≡ Menu

How to Fix Duplicate Data from Hibernate Queries

Java-PodcastWe’ve covered a lot of topics on Hibernate already on this blog in a nice sequential format. If you haven’t had the pleasure of going through all the tutorials, I’d suggest going to this page and having a flip through all the topics.

How to Fix Duplicate Data from Hibernate Queries

This problem was the bane of my existence when I first started using Hibernate because I had no idea where the problem was coming from.

If you’re executing a query and finding that you have a bunch of duplicate records and haven’t a clue why, then you’re in the right place.

You see the problem is typically caused by having left joins (or optional joins) in your objects. When you have a base object, like say User and it joins to another table/object in an optional One-to-Many or optional Many-to-Many format, then you may get duplicates.

Consider this scenario… A User objects joins to the LoginHistory object, which keeps track of all the times a particular User has logged into the system. And let’s say our user has logged in many times. You’ll have a situation where you have many records in the LoginHistory table.

So what happens when you run a query that joins to the LoginHistory table? Well it will return as many rows as there are entries for that User in the LoginHistory table.

So because of this, Hibernate doesn’t massage the data for you, it just returns exactly what it got from the database. The ball is in your court to tell Hibernate what to do with records it has retrieved.

There are two solutions to this problem:

  1. Declare your joining object as a Set
  2. Make use of Distinct Root Entity Results Transformer

The Problem at a Glance

So here’s an example of the problem in action. Below you’ll see an outline of the optional one-to-many join between the User class and the LoginHistory class.

User.java

import java.util.List;

import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import javax.persistence.Table;

@Entity
@Table(name="users")
public class User
{
  private Long userId;
  private String username;
  private String password;
  private List<LoginHistory> loginHistory;

  @Id
  @GeneratedValue(strategy=GenerationType.AUTO)
  @Column(name="user_id")
  public Long getUserId()
	{
		return userId;
	}
	public void setUserId(Long userId)
	{
		this.userId = userId;
	}
	public String getUsername()
	{
		return username;
	}
	public void setUsername(String username)
	{
		this.username = username;
	}
	public String getPassword()
	{
		return password;
	}
	public void setPassword(String password)
	{
		this.password = password;
	}
	@OneToMany(cascade=CascadeType.ALL, mappedBy="user", fetch=FetchType.EAGER)
	public List<LoginHistory> getLoginHistory()
	{
		return loginHistory;
	}
	public void setLoginHistory(List<LoginHistory> loginHistory)
	{
		this.loginHistory = loginHistory;
	}
	
  
}

LoginHistory.java

package com.howtoprogramwithjava.example.persistence;

import java.util.Date;

import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.Table;

@Entity
@Table(name="login_history")
public class LoginHistory
{
  private Long loginHistoryId;
  private Date loggedIn;
  private User user;
  
  @Id
  @GeneratedValue(strategy=GenerationType.AUTO)
  @Column(name="login_history_id")
	public Long getLoginHistoryId()
	{
		return loginHistoryId;
	}
	public void setLoginHistoryId(Long loginHistoryId)
	{
		this.loginHistoryId = loginHistoryId;
	}
	
	@Column(name="logged_in")
	public Date getLoggedIn()
	{
		return loggedIn;
	}
	public void setLoggedIn(Date loggedIn)
	{
		this.loggedIn = loggedIn;
	}
	
	@ManyToOne(cascade=CascadeType.ALL, optional=true, fetch=FetchType.EAGER)
	@JoinColumn(name="user_id")
	public User getUser()
	{
		return user;
	}
	public void setUser(User user)
	{
		this.user = user;
	}
}

And then we have the entry in a DAO class that will query for Users by passing in a username.

  public List<User> getUserByUsername (String username)
  {
  	Session session = sessionFactory.getCurrentSession();
  	return session.createCriteria(User.class).add(Restrictions.eq("username", username)).list();
  }

So in the code directly above, we are querying the database for Users that have a username matching the String that’s being passed in.

The problem with this is that once it runs this query, it will return multiple rows if the User it finds has logged into the system more than once.

Again, it does this because it’s an optional (left) join. If you were to take a look at the query built by Hibernate, it could look something like this (assuming the username being passed in is tpage):

select * from users
left join login_history on login_history.user_id = users.user_id
where users.username = 'tpage';

This query returns multiple results (three results to be exact, in my database) as I’ve inserted three separate rows into the Login_history table that point back to the tpage user.

Alright, so hopefully you fully understand the problem, now let’s talk about some solutions to this problem.

Declaring the Join as a Set

On the parent side of the relationship, you’ll be declaring a collection of objects that embodies your *-to-Many relationship. What you’ll need to do here is to use a Set as the backing collection as opposed to something like a List (which is what we were using in the User class above).

Here’s an example of how to set it up correctly:

import java.util.Set;

import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import javax.persistence.Table;

@Entity
@Table(name="users")
public class User
{
  private Long userId;
  private String username;
  private String password;
  private Set<LoginHistory> loginHistory;

  @Id
  @GeneratedValue(strategy=GenerationType.AUTO)
  @Column(name="user_id")
  public Long getUserId()
	{
		return userId;
	}
	public void setUserId(Long userId)
	{
		this.userId = userId;
	}
	public String getUsername()
	{
		return username;
	}
	public void setUsername(String username)
	{
		this.username = username;
	}
	public String getPassword()
	{
		return password;
	}
	public void setPassword(String password)
	{
		this.password = password;
	}
	@OneToMany(cascade=CascadeType.ALL, mappedBy="user")
	public Set<LoginHistory> getLoginHistory()
	{
		return loginHistory;
	}
	public void setLoginHistory(Set<LoginHistory> loginHistory)
	{
		this.loginHistory = loginHistory;
	}
}

Again, the key is to use a Set as the collection of child entities. So in our example above we used Set<LoginHistory>

Distinct Root Entity Results Transformer

This sounds like pure gibberish, but thankfully it’s not hard to implement. All you need to do is make sure that you set a ResultsTransformer when you’re building your query.

Remember that DAO query we talked about near the beginning of this article? I’ll show you that query again for the sake of completion:

  public List<User> getUserByUsername (String username)
  {
  	Session session = sessionFactory.getCurrentSession();
  	return session.createCriteria(User.class)
        .add(Restrictions.eq("username", username))
        .list();
  }

This query is alright, but as we’ve seen, it doesn’t work well when we have an option join and we’re dealing with a List of entities.

In the event that we NEED to use a List of entities, we can alter this query to use the ResultsTransformer. Let’s see what the corrected code looks like:

  public List<User> getUserByUsername (String username)
  {
  	Session session = sessionFactory.getCurrentSession();
  	return session.createCriteria(User.class)
        .add(Restrictions.eq("username", username))
        .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)
        .list();
  }

You see the difference? We just added one additional line to the query: setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY) and voila! You will no longer have duplicates in your returned result set.

{ 29 comments… add one }
  • Cheng September 24, 2014, 3:20 pm

    As for the query, it only queries the users table, why does it has to join login_history?

    • Trevor Page September 29, 2014, 9:01 am

      It joins to the login_history table because both are using the Eager Fetch type and there’s a relationship defined between the two tables in the class files.

  • Heaven42 January 21, 2015, 9:25 am

    Thank you so much.

  • Leandro Bortolotto September 21, 2015, 3:30 pm

    Awesome! Thanks for sharing. It worked for me.

  • Mounir BOUSSETTA November 20, 2015, 11:12 am

    Thanks for this helpful post.

  • David Kihato January 12, 2016, 7:24 am

    What happens if say we have millions of LoginHistory data in our database, using setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY), won’t it result to an OutOfMemoryException since all the data is loaded to memory?

    • Trevor Page January 18, 2016, 9:06 am

      This is why I prefer to use the method of declaring my Hibernate Collections as Sets instead of Lists.

  • Praveen March 14, 2016, 2:11 pm

    wow. It worked. thanks for the post.

  • ano March 31, 2016, 12:02 pm

    Your 2nd comment was the solution for me: changing from Eager Fetch -> Lazy Fetch ?

  • Biagio April 29, 2016, 11:35 am

    Hi,
    thanks for your fantastic tutorial. I want to ask you a question.
    I didn’t understand the first solution about the use of List and Set.
    In your first example you use List and you use a OneToMany relation with fetch option set to EAGER. So when you make a query to User will be loaded the LoginHistory as well. In my opinion is this the reason why we get duplicated.
    Then you change the relation to LAZY so when we load User object, LoginHistory won’t be loaded. Could you give me more explanation about it!
    Thanks!

  • David June 23, 2016, 6:54 am

    Thanks for the post! Very useful!

  • Chris August 4, 2016, 4:07 pm

    Thanks for this article. Very helpful. I couldn’t work out why my spring boot app was returning dupe objects. Using the set worked nicely.

  • cslotty August 16, 2016, 11:45 am

    You never want to load millions of records into your application! That’s why you’ll have to use sensible criteria conditions, in order to limit the resulting data.

  • Vishwajeet November 25, 2016, 4:49 am

    Thank you so much…
    I was fed up with finding this issue.

  • Anonymous March 20, 2017, 3:16 am

    jkl;’akjkf’a;kff;;lda

  • Ganga Tamang May 21, 2017, 9:06 pm

    How do we return the List of entire table object instead of List of table column/s when using Projection or multiple projections which also involves (lets say three tables) ??

  • Khomeni June 7, 2017, 11:09 am

    Thanks for useful post. It works form me.

  • music promote June 29, 2017, 9:39 pm

    Hiуa! Quick question that’ѕ completely off
    topic. Do you know how to make yoսr site mobile friendly?
    My weƄsite looks weird when browsing from my iphone 4.
    I’m trүing to fіnd a template or plugin that miǥht be able to resolve
    this issue. If уou have any suɡgestions, pleɑsᥱ share.
    Thank you!

  • Fernando July 1, 2017, 11:55 pm

    I dont know who you are, but I FUCKING LOVE YOU.

  • Garry September 15, 2017, 3:20 pm

    Okay. Your example is fine for a simple case. Now, let us up the ante. Let’s say that I add a column called ‘source’ to the login_history table. I want the source of the login because I want to see how often users log in from certain devices with different IP addresses. Now, let us take the scenario of a data breach. I know the IP of 10.2.14.243 was used to breach my system. I want to find out who was using it and when. So I search the user table, but again, I am only interested in the users who had devices with an IP address of 10.2.14.243. Using the join, I get a query something like “from user left join fetch user.userLogins where user.id in (select userId from userLogin where device.ipAddress = ‘10.2.14.243’)” which gives a list of all user data for any user who ever used 10.2.14.243 as his IP (including data from other IP addresses he or she may have used) which is not what we want. We only want those logins where the specified IP address was used. How do we restrict our results?

  • Binh Thanh Nguyen March 23, 2018, 2:44 am

    Thanks, nice tips

  • susan April 20, 2018, 4:07 pm

    Really helpful

  • Mohamed May 8, 2018, 7:27 am

    Very helpful post, thank you.
    An other solution was to put this annotation the jointable
    @Fetch(value = FetchMode.SUBSELECT)

  • santosh November 12, 2018, 5:58 am

    Very helpful ,
    No need to List to Set of collection for duplicate if in criteria put .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY) it remove duplicates.

  • RJ January 28, 2019, 11:18 am

    What about when you are not performing any joins/do not have any many-to-one or many-to-many relationships?

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.