Hibernate – Creating Data Access Objects (DAOs)

by Trevor Page on July 15, 2014


Data Access Objects – What are they?

Data Access Objects (or DAOs for short) are used as a direct line of connection and communication with our database. DAOs are used when the actual CRUD (CRUD = Create, Read, Update, Delete) operations are needed and invoked in our Java code. These data access objects also represent the “data layer” of our application.

These objects are still just plain old Java objects that incorporate the use of some Hibernate annotations to give them the functionality we need from them. Again, that functionality being the communication with the database.

Also, believe it or not, the concept of creating a file specifically for accessing the database is a design pattern. It’s called the Data Access Object Pattern and you can read more about it from this nice short Wiki article.

What Hibernate Annotations Should We Use?

Okay, so now that you’re somewhat familiar with DAOs, it’s time to learn how to integrate these plain old Java objects with our Hibernate framework (so they actually do the work we need them to do).

There are two main annotations that you need to be familiar with:

  • @Repository
  • @Transactional

The first of the two annotations @Repository is one from Spring and it’s pretty straight-forward. It’s used to mark the Java file as something Spring calls a “Component”, which enables it to be scanned and incorporated into Spring’s code. You really just need to remember to put this annotation in all of your DAOs (on the class level) and you’ll be good to go.

The second (and more complicated) annotation is the @Transactional annotation. This annotation is used as a means to enable transaction management within your Java DAO file.

Transaction management is a vast (and complex) topic. I’ll give you a brief overview of what it is and why it’s useful.

Transaction management is all about data integrity… which means it’s all about keeping your data valid, non-corrupt and error free: three things that are very handy! Let’s think of the opposite scenario, having our data be invalid, corrupt and full of errors… does that last sentence make you wince? Well, if it does, then transaction management is the vaccine you need.

What is a Transaction?

A transaction (as it pertains to databases) is the trick we use to be able to roll back (or revert) a database operation. Consider the following database operation:

insert into users (username, password) values ('trevor_page', 'password**!');

This database operation has to do a few things:

  1. Insert a new row of data into the users table
  2. Assign a primary key to this new row
  3. Populate the new row with the provided data

Fairly straight forward stuff, but what happens if your database server crashed after step #1? Or what happens if someone yanks the power cord out of the server by accident after step #1? What happens if the disk drive becomes full after step #2?

All of these scenarios would leave the database operation only “partly committed”. If we were to leave the data as is, in its partially committed state, then our data would be corrupt and could lead to errors in our application. NOT COOL!

Okay, so now let’s consider what I’ve said about a database transaction. Its job is to “roll back” the database operation if it doesn’t complete correctly. So let’s assume that we open a transaction, start our database operation, get to step #2 and then there’s a failure. The database keeps track of the “state” its in and (once restored) sees that there’s a transaction that was not fully committed, so it will roll back to the previously known “stable state”… thus eliminating the partially committed (and invalid) data.


So you see why transactions are so useful? They will save your butt when the unexpected becomes a reality… now there’s a whole lot more to understand about transactions, but that’s outside of the scope of this conversation, so I’ll get back to the topic at hand.

Example of a Real Data Access Object in Java

Okay, so now it’s time for the big reveal! Let’s take a look at an example DAO class that I’ve created.

We’ve been talking about using the example of an Address Book program, so let’s take a look at what the beginning of our Address Book DAO would look like:

import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

public class AddressBookDao
  // empty class... for now!

As you can see above, we have put both of the annotations that we’ve just learned about at the class level for the DAO.

Pretty straight forward right?

Now let’s talk about the SessionFactory

What the heck is a SessionFactory?

A SessionFactory is what we use to grab individual Sessions so that we can talk to our database. It uses the Factory design pattern, which essentially means it’s sole purpose is to dish our Sessions whenever you ask for them.

So the real question here is, what the heck is a Session?!

A Session is what we use to create open our initial transaction, perform some kind of database operation, and then commit the transaction (or roll it back if there was a failure).

Let’s take a look at how we get a Session in Java:

import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

public class AddressBookDao
  SessionFactory sessionFactory;
   * Save an AddressBook object to the database
  public void save(AddressBook addressBook)
  	Session currentSession = sessionFactory.getCurrentSession();

Now things are starting to get a little more interesting!

You see that we used the @Autowired annotation to automatically populate the sessionFactory with an actual real instantiated instance. Spring can do this because we’ve already configured the SessionFactory in our PersistenceConfig class. If you don’t remember what that file is, you can read about it in this post.

Once we have a real instance of the SessionFactory we can use it to grab a real Session. We do so by invoking the getCurrentSession() method and assign it to a variable.

Now that we have a Session we can invoke one of our CRUD operations. In the example above, I am invoking a saveOrUpdate(), which will either create a new row of data or update an existing row.

Hibernate’s saveOrUpdate Method

This is a very important method to understand, as it incorporates one of the fundamental rules by which Hibernate operates.

You see, how the heck is Hibernate supposed to know whether or not you wish to create a new row of data with the object you’ve given to it, or if it should update an existing row of data?

Let’s assume you’ve created a brand new AddressBook object and populated it with some relevant information. Now you wish to save it to the database… YOU know that it’s a new piece of data, and therefore it should have a new row created in the database… but how does Hibernate know?

Hibernate looks at the @Id variable that you’ve defined in the AddressBook class! If there’s an actual ID value assigned to your object, then that means it must already exist in the database and therefore this operation should be an UPDATE… if there’s no ID valued populated in the AddressBook object (i.e. it’s null), then it should be a new row and it performs an INSERT.

So let’s say we have the following AddressBook scenario:

AddressBook ab = new AddressBook();
ab.setName("Trevor Page");
ab.setStreetAddress("123 Fake St");
ab.setZipCode("M1M 3M2);

What do you think will happen when we invoke the save(ab) method?

If you look carefully, you’ll see that I’ve assigned a value to the ID field… so given my explanation above, that means that Hibernate will perform an UPDATE.

Now, what happens if we get rid of the line of code that assigns the ID?

AddressBook ab = new AddressBook();
ab.setName("Trevor Page");
ab.setStreetAddress("123 Fake St");
ab.setZipCode("M1M 3M2);

Now when we run the save(ab) method, hibernate sees that there’s no ID set, so it will perform an INSERT.

The functionality described above is critical to understand!

How do we Read from the Database with Hibernate?

Another great question, here we will make use of something called Criteria. Criteria is the object that we will use to create SQL queries in Java… if you like you can also create native SQL queries with Hibernate, but it’s actually a LOT easier to use Criterias.

In the example below, you will see how we create the Criteria to search the address_book table for the name that we will pass in:

public List getAddressBookByName (String name)
  	Session currentSession = sessionFactory.getCurrentSession();
  	List list = currentSession.createCriteria(AddressBook.class)
                             .add(Restrictions.eq("name", name))
  	return list;

We use a Session to create the Criteria and we add Restrictions to filter our results.

To illustrate how this works, let’s assume we just use this Criteria:

List list = currentSession.createCriteria(AddressBook.class).list();

This will create the following SQL query:

select * from address_book;

This is a very broad search, and thus we need to add a filter on it to make it more narrow. Remember that our current mission is to retrieve a particular person’s address information. We do this by adding Restrictions, and in our example we restrict the broad results by searching for a name.

We then invoke the list() method which will actually execute the query and return a list of results. If there are no matching entries then it will be an empty list, if there are one or more entries, then they will be populated in our list.

Easy peasy.

There is lots more we can do with the Criteria object in Hibernate, such as grouping and ordering our data… but that’s a topic for discussion later.

How do we Delete Rows from the Database?

Our final topic will be about deleting rows.

The only thing you need to know here is that you actually need to LOAD the row from the database before you can delete it. This is something built into the Hibernate framework and it’s another concept that is important to understand. If you simply just create your own object manually and then pass it in to a delete method, hibernate won’t be able to carry out the operation as the object you’ve asked to be deleted is “detached”.

There’s some great information on the “state” of objects as they pertain to the Hibernate framework in this tutorial from JBoss. It’s a bit of a heavy read, but it’s definitely good to understand how this stuff works.

In any case, the point here is that you need to make sure the object that you wish to delete is the actual persistent version of the object. So this means you’ll need to first load the object from the database before invoking the delete. Here’s how to do it:

public void deleteById (Long id)
  	Session currentSession = sessionFactory.getCurrentSession();
  	AddressBook result = (AddressBook) currentSession.createCriteria(AddressBook.class)
  	if (result != null)

The only curve-ball in this code is the use of the uniqueResult() method. Previously when we read data from the database, we used the list() method to invoke our query, but here we have a slightly different scenario. We are loading the object by the ID, so we know there is only going to be ONE result back from the database… since we know we are only getting one object back, we can ask for a uniqueResult as opposed to getting list.

And that’s really all there is to it in the simple cases.

You now know how to Create, Read, Update and Delete data using Hibernate! Congrats!

If you liked this tutorial, I’d highly recommend jumping on my mailing list. I will keep you up to date with all the need to know information related to Java programming, plus you’ll get access to a list of the top 7 tools that Java professionals use every day, as well as invitations to all of my free webinars that I host where I sit down with you and teach important subjects in the Java world.

To join the mailing list just fill out your email in the popup below or navigate to this sign up page!


Hibernate @Entity – Mapping Your Java Beans

by Trevor Page on July 8, 2014

Java-PodcastNow that you’ve learned how to set up your first Hibernate Java project it’s time to put that fresh new configuration to work!

What You’ll Learn

The focus of this podcast / blog post is to teach you how to create the connection between your Java objects and the SQL Database tables. Remember, the whole point of the Hibernate framework is for you to be able to write Java code that allows you to communicate with your database.

When I say that the goal is to communicate with your database, what I mean is:

  1. Create the tables (if they’re not already there) that will represent your Java objects
  2. Give Java the ability to send queries and execute scripts on your database
  3. Establish a transactional means by which to perform CRUD operations

What we’re going to be tackling in this post will the step #1, how to get Hibernate to create our tables for us.

How to get Hibernate to Create a Table

So the key to getting Hibernate to create a table for us is to understand how Hibernate makes the decision to create a table. It works by scanning a particular package that you’ve specified in your configuration file (via sessionFactory.setPackagesToScan(). Hibernate will scan that package for any Java objects annotated with the @Entity annotation. If it finds any, then it will begin the process of looking through that particular Java object to recreate it as a table in your database!

Before Hibernate will start it’s scan, not only do we need to invoke the sessionFactory.setPackagesToScan code, but we also need to enable the scanning process that the Spring framework handles. We’ll need to make a quick tweak in our applicationConfig.xml file. Take a look at mine here:



The magic line of code above is the <context:component-scan base-package="com.howtoprogramwithjava.example"> line. This is what Spring uses to know where to scan for annotations related to both the Spring framework and the Hibernate framework.

Unfortunately that’s not the only step (if only it were that easy!). We still need to specify a couple of things:

  1. The primary key for the table
  2. How to generate the primary keys

For both items above, we’ll use annotations.

The first annotation (@Id) is simple and it’s used to “mark” the primary key for the table, the second annotation is @GeneratedValue and it will be used to specify HOW to generate primary keys. Let’s take a look at an actual example of all these annotations in action:

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

public class AddressBook
  private Long id;
  private String name;
  private String phoneNumber;
  private String streetAddress;
  private String zipCode;
  private String city;
  private String region;
  private String country;
	public Long getId()
		return id;
	public void setId(Long id)
		this.id = id;
	public String getName()
		return name;
	public void setName(String name)
		this.name = name;
	public String getPhoneNumber()
		return phoneNumber;
	public void setPhoneNumber(String phoneNumber)
		this.phoneNumber = phoneNumber;
	public String getStreetAddress()
		return streetAddress;
	public void setStreetAddress(String streetAddress)
		this.streetAddress = streetAddress;
	public String getZipCode()
		return zipCode;
	public void setZipCode(String zipCode)
		this.zipCode = zipCode;
	public String getCity()
		return city;
	public void setCity(String city)
		this.city = city;
	public String getRegion()
		return region;
	public void setRegion(String region)
		this.region = region;
	public String getCountry()
		return country;
	public void setCountry(String country)
		this.country = country;

As you can see above, the @Entity annotation is used at the class level, but the @Id and @GeneratedValue annotations are used at the method level.

It’s important to note the exact location of the @Id and @GeneratedValue annotations: just above the getter method for the id instance variable. If you were to put these annotations on the setter method, it won’t work as Hibernate only looks at the getter methods for these annotations.

Final Touches

The last thing I should mention is the naming convention for tables and columns with Hibernate. By default Hibernate doesn’t know how to insert underscores (_) into the names of tables and columns. So we’ll need to help Hibernate out by telling it how we want to name our tables and columns.

When we have any tables or columns names that are more than one word in length, we’ll need to use one of two annotations:

  1. @Table(name="")
  2. @Column(name="")

For table names, just use the @Table annotation and specify the actual name that you’d like to use for the table inside of it. This annotation should only be used at the class level.

For column names, just use the @Column annotation and specify the actual column name that you’d like to use inside of it. This annotation should only be used at method level (again, just above the getter method).

For those of you who want to see this entire process in action, just check out the video below which will take you step by step through everything you’ve just learned:


Hibernate Persistence for Beginners

by Trevor Page on July 3, 2014

Java-PodcastLadies and Gentlemen! We have finally hit a milestone in our journey of learning Java, we are now done talking specifically about SQL and are moving on to talk about persistence frameworks.

Let’s start with the most obvious question first…

What is a Persistence Framework?

As the name implies, it has something to do with persisting things… this means that we’re still talking about databases. But what is it that we are persisting with this framework?

Objects (of course)

A persistence framework is something we use in Java to bridge the gap between Java and SQL.

Hopefully we’ve gone through all of my previous tutorials and have learned a thing or two about Java and SQL. But one thing I haven’t taught you yet, is how to put your knowledge of SQL into Java.

That’s what I’ll be teaching you throughout these Hibernate (persistence framework) tutorials.

And if you haven’t guessed it already, Hibernate is a persistence framework that you can use in Java. It’s what allows you to write Java code (staying true to Object Oriented programming practices) and yet still be able to communicate with your database. Cool eh?

Why use Hibernate?

Overall, Hibernate is widely adopted and is open source software. This means that you’ll have very readily accessible help documentation across the internet (just like the one you’re reading right now). The more users a particular project/framework has, the more overall support you’ll be able to receive.

Hibernate is no exception to this rule, there are hundreds (if not thousands) of blogs dedicated to Hibernate and Hibernate tutorials.

Not only that, Hibernate is a fairly simple and flexible framework to use when compared to other alternatives out there. You’ll be able to use Hibernate with pretty much any database system you plan on using (including the big ones like MySQL, MS SQL Server, Oracle, PostgreSQL).

Hibernate also integrates and plays nice with the Spring Framework. which is another framework that you’re hopefully already familiar with. If you’re not, then I’d suggest reading up on it.

What you need to know when using Hibernate

There are some pre-requisites for learning Hibernate. I’m going to be progressing through these Hibernate tutorials assuming that you already have at least a beginners grasp on the following languages / technologies:

  1. Java
  2. Spring Framework
  3. SQL

If you don’t already understand the things listed above, then you’re going to have to pause your learning on Hibernate, because you’ll likely get lost in all the details.

If you have a good understanding of those concepts / technologies, then you’re almost good to go. We’ll need to talk a little bit about a technology called Maven before we can continue.

You see, Maven was created by Apache with the goal of simplifying the annoying project management aspect of programming (namely the build process and library management). As a software project grows, you’re more likely to need to depend on 3rd party software / library (like Spring and Hibernate) to get the job done right.

Getting Spring and Hibernate set up correctly and running is actually a VERY annoying and somewhat complex task… especially since there are so many bloody versions of each framework. This is where Maven comes in, it allows you to MUCH more easily point to the version of each framework that you want to use and then “install it”.

Maven uses something called a “project object model” to manage your projects… this is just a fancy way of saying, they have a single XML file where you will specify all the information about what frameworks you’d like your project to use. It’s not too difficult to use once you get used to it.

The goal of these tutorials, however, is not to teach you the ins and outs of Maven, but rather to just show you the basics of what you need to know about Maven to get up and running smoothly with Hibernate (and Spring).

Okay, Let’s Setup Hibernate

The easiest way for me to show you how to go about setting up Hibernate is to do it in a video, so I’ve recorded one and have included it below for your viewing enjoyment!

Please watch the video above for a full explanation of how to set up your project.

Below I’ve included the source code I used to create the two Hibernate specific set up files.

Hibernate Configuration Files

There are two files that you’ll need to create to get the ball rolling with your Hibernate set up.

  1. PersistenceConfig.java
  2. persistence-mysql.properties

Note: The actual names of these files doesn’t actually matter, what matters is that one is a Java file that will be used to configure your Hibernate framework and a properties file.


import java.util.Properties;

import javax.sql.DataSource;

import org.apache.tomcat.dbcp.dbcp.BasicDataSource;
import org.hibernate.SessionFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.PropertySource;
import org.springframework.core.env.Environment;
import org.springframework.dao.annotation.PersistenceExceptionTranslationPostProcessor;
import org.springframework.orm.hibernate4.HibernateTransactionManager;
import org.springframework.orm.hibernate4.LocalSessionFactoryBean;
import org.springframework.transaction.annotation.EnableTransactionManagement;

@PropertySource({ "classpath:persistence-mysql.properties" })
@ComponentScan({ "com.howtoprogramwithjava.example" })
public class PersistenceConfig
  private Environment env;

  public LocalSessionFactoryBean sessionFactory() {
     LocalSessionFactoryBean sessionFactory = new LocalSessionFactoryBean();
     sessionFactory.setPackagesToScan(new String[] { "com.howtoprogramwithjava.example" });

     return sessionFactory;

  public DataSource restDataSource() {
     BasicDataSource dataSource = new BasicDataSource();

     return dataSource;

  public HibernateTransactionManager transactionManager(SessionFactory sessionFactory) {
     HibernateTransactionManager txManager = new HibernateTransactionManager();

     return txManager;

  public PersistenceExceptionTranslationPostProcessor exceptionTranslation() {
     return new PersistenceExceptionTranslationPostProcessor();

  Properties hibernateProperties() {
     return new Properties() {
           setProperty("hibernate.hbm2ddl.auto", env.getProperty("hibernate.hbm2ddl.auto"));
           setProperty("hibernate.dialect", env.getProperty("hibernate.dialect"));
           setProperty("hibernate.globally_quoted_identifiers", "true");


# jdbc.X
# hibernate.X

If you’re curious as to where you should create these files, you can just watch the video above and you’ll see exactly where they get created.


SQL Subquery

by Trevor Page on June 25, 2014


The SQL Subquery

Now that you’ve learned about SQL Joins, aggregate functions and the group by keyword, it’s time we moved on to our final topic in our SQL tutorial series. Today you’ll be learning all about SQL Subqueries, how to use them and when you should use a SQL subquery.

So, what is a subquery?

First and foremost, let’s get the jargon out of the way. A subquery can also be referred to as a nested query. It’s just like having a nested if statement in your Java code. Essentially what you’re doing with a subquery is you are constructing a regular old query (select statement) which could be run all by itself if you wanted to, but instead of running it all by itself, you’re jamming it into another query (select statement) to give you more specific (filtered) results.

What’s very important to note here is that the SQL subquery can almost always be re-written as a join with a where clause attached to it.

The advantage to using a subquery is that they tend to be easier to write and thus easier to understand if you’re coming in as a new coder looking at an existing system. Of course, as is always universally true, when you have a piece of code that’s easier to write and understand, it means that it’s likely also less efficient. Unfortunately that statement holds true here as well. You can easily get carried away with SQL subqueries and start to write very inefficient scripts.

The rule I always live by is to write it so it works first, then refactor it to run faster and be more efficient. This is the test driven development mantra :)

What does a SQL Subquery look like?

I love to teach by example, so let’s take a look at one.

Let’s say we’re working with a database table that deals with people and addresses. A Person has a name and an Address has all the pertinent fields associated with it like so:

drop table if exists address;
drop table if exists person;

create table person
  person_id int(11) auto_increment primary key,
  first_name varchar(20),
  last_name varchar(20)

create table address
  address_id int(11) auto_increment primary key,
  person_id int(11),
  address_line_1 varchar(100),
  address_line_2 varchar(50),
  city varchar(50),
  region varchar(50),
  zipcode varchar(10),
  country varchar(50),
  foreign key (person_id) references person (person_id)

insert into person (first_name, last_name) values ('Trevor', 'Page');
insert into person (first_name, last_name) values ('Jane', 'Doe');
insert into person (first_name, last_name) values ('Jack', 'Johnson');
insert into person (first_name, last_name) values ('Jack', 'Black');
insert into person (first_name, last_name) values ('Dave', 'Matthews');
insert into person (first_name, last_name) values ('John', 'Doe');

insert into address (person_id, address_line_1, city, zipcode, region, country) values (1, '16 Dunwatson Dr', 'Toronto', 'Ontario', 'M1C 3L8' ,'Canada');
insert into address (person_id, address_line_1, city, zipcode, region, country) values (2, '1129 Bannock St', 'Denver', 'CO', '80204' ,'USA');
insert into address (person_id, address_line_1, city, region, country) values (3, 'Serna 307', 'Santa Ana', 'SON' ,'Mexico');
insert into address (person_id, address_line_1, city, zipcode, region, country) values (4, '462 Duke St', 'Glasgow City', 'G31 1QN', 'Glasgow','UK');
insert into address (person_id, address_line_1, address_line_2, city, zipcode, region, country) values (5, '1584 Fischer Hallman St', 'Kitchener', 'Unit 301', 'N2R 1C1', 'Ontario' ,'Canada');
insert into address (person_id, address_line_1, city, zipcode, region, country) values (6, '155 Northampton St', 'Rochester', 'New York', '14606' ,'USA');

Okay, so that script will create our database tables and populate it with some random information… Now onto the fun stuff, the subqueries!

From the information stored in our database, let’s say we want to isolate the people that live in certain countries. Let’s say that we only want to see people who live in Canada, what would that query look like?

Here’s the big reveal!

select * From person where person_id in (select person_id from address where country = 'Canada');

Fairly unexciting stuff. What you should note here is the use of the in keyword.

The in keyword is one of the main keywords that we can use to create our subqueries.

How does the SQL Subquery work?

Let’s breakdown the query above: we are searching for any Person who lives in Canada. This is accomplished by writing two individual select statements:

  1. select * from person
  2. select person_id from address where country = 'Canada'

We then “join” these two queries together using the in keyword (thus utilizing the subquery structure).

Again, we want to know WHO lives in WHICH country. So that’s why were have one query looking at the Person table and one query looking at the Address table. Make sense?

The next important thing to note about this subquery is the specific use of the person_id key. You’ll notice that we said where person_id in (select person_id from ...), when you’re constructing subqueries and using the in keyword, you need to make sure that you’re matching IDs together (just like if you were doing a join statement).

Actually, speaking of joins, how about I show you what the equivalent query would look like as a join instead of a subquery?

select person.* from person
join address on address.person_id = person.person_id
where address.country = 'Canada';

As I mentioned at the beginning of this post, you’re usually able to write out a typical subquery as a join too, but sometimes the ‘join’ query can get a bit confusing. If you were to run the script directly above this paragraph, you’ll see that the results are the same as the subquery script we were looking at earlier.

You should see this as a good thing, because if you can understand the ‘join’ script above, then you should understand how the subquery works. The ‘join’ statement invokes a join based on the person_id from both the Person and the Address tables… which is exactly what our subquery does using the in keyword… ie. select * from person where person_id in (select person_id from address)

Comparing SQL Subqueries to Joins

For the sake of clarity, how about I write the same query in both formats (as a join and as a subquery) and highlight which parts are functionally equivalent?

Sounds like a great idea, here you go:

select * from person
where person_id in
(select person_id from address

where country = ‘Canada’);

select person.* from person
join address on address.person_id = person.person_id
where address.country = ‘Canada’;

So as you can see above, the black parts are where we choose what we actually want to see once all the filtering is done. The blue parts are used to join the two tables together, and the red parts are what we use to filter our result set.

Multiple Subqueries

Alright, so hopefully you get the basic idea of how subqueries work… now how to we use multiple subqueries in the same query?

Well, let’s say we want to know all the people who live in North America (i.e. USA and Canada combined). How would we go about doing this? We could just use a subquery inside of a subquery like so:

-- people who live in North America
select * From person where person_id in 
  (select person_id from address where country in ('Canada', 'USA'));

Note the use of two different in keywords. This means that we have two subqueries running in one script, they are just nested within each other. You can nest your subqueries many times over (I think up to 16 or something ridiculous like that)… but with great power comes great responsibility. The deeper you go with nested subqueries, the more strain you could be putting on the database system. Of course, this kind of strain is all relative, because if you have a very small dataset, then you can likely get away with inefficient scripts and not have to pay the price. Typically what happens is that you do your best to write great code/scripts, and as your system grows, the inefficient pieces will make themselves known and you’ll fix them as they come up… not the greatest approach, but that’s just how the real world works.

Other Types of Subqueries

So we’ve talked a lot about using the in keyword, but there are others that can be used… for example, the not in keyword can be helpful.

What if you want to find all the people who live outside of North America?

-- people who live outside of North America
select * From person where person_id in 
  (select person_id from address where country not in ('Canada', 'USA'));

Not the use of the not in keyword in this query… this will find all address rows that are not associated with Canada or USA, then it will join with the Person table via the person_id and return their names.

Easy peasy!

There are a few other examples of subqueries, but I don’t tend to use them very often (if at all), but if you’re interested in learning more, there’s a great article on subqueries here for your reading pleasure. This article focuses on MySQL (just like mine does) and will give you a brief overview on all the different ways to write subqueries in MySQL.

And finally, as always, I would love for you to leave a rating and review for this podcast (it’s one of the best ways to help spread the word about this great content) by visiting this link: howtoprogramwithjava.com/love

Java Podcast


SQL Group By

by Trevor Page on June 16, 2014


SQL Group By

After having talked about all the SQL Aggregate functions, there’s one more topic that goes hand in hand with what we’ve already learned… The group by keyword.

This particular keyword allows us to take a bunch of data and mash it all together into matching groups and then perform aggregate functions on those groups (like sum and avg).

You might ask yourself why you’d want to “mash together” a bunch of data. The answer to this is best explained with an example, but let me try to put it in regular words before we jump into our example. Grouping data together allows us to look at aggregate data in relation to unique piece of data (or rows), a typical use case would be to group all the matching data together so you can get a count of the number of occurrences of specific data. An example related to grouping and counting could be a presidential election, you’ll have all the votes in a database and you’ll want to group that data together to get the total votes for each unique candidate.

Grouping and counting is a very simple use case for the group by keyword, so the example that we’ll be looking at in this post will be related to bank accounts and transactions. With bank accounts and transactions within those bank accounts, we can start to use the more complex grouping methodologies (which we’ll be talking about later in this post).

Why use SQL Group By?

Grouping data together is extremely valuable in many ways. You’ll see a few examples here that relate to bank accounts and transactions within those bank accounts.

Let’s assume we have three different bank accounts and each has both credit and debit transactions. We would also like to see a quick summary of all the credits and all the debits for each bank account. This kind of “requirement” lends itself perfectly for grouping.

So let’s first set up our example case in our database. We’ll create the tables and columns as follows:

create table bank_account
  bank_account_id int(11) auto_increment primary key,
  bank_account_number varchar(20)

create table transaction
  transaction_id int(11) auto_increment primary key,
  bank_account_id int(11),
  transaction_date date,
  debit_amount numeric(10,2) default 0.0,
  credit_amount numeric(10,2) default 0.0

Okay, so with our two tables created, now we need to input some test data to start demonstrating how the SQL group by keyword works.

insert into bank_account (bank_account_number) values ('A-283748293'); -- this will become bank_account_id 1
insert into bank_account (bank_account_number) values ('B-174984638'); -- this will become bank_account_id 2
insert into bank_account (bank_account_number) values ('C-927461738'); -- this will become bank_account_id 3

insert into transaction (bank_account_id, credit_amount, transaction_date) values (1, 250.12, '2014-06-10');
insert into transaction (bank_account_id, debit_amount, transaction_date) values (2, -123.93, '2014-06-10');
insert into transaction (bank_account_id, credit_amount, transaction_date) values (3, 832.11, '2014-06-11');
insert into transaction (bank_account_id, debit_amount, transaction_date) values (1, -100.32, '2014-06-10');
insert into transaction (bank_account_id, credit_amount, transaction_date) values (2, 322.33, '2014-06-11');
insert into transaction (bank_account_id, credit_amount, transaction_date) values (3, 131.92, '2014-06-11');
insert into transaction (bank_account_id, credit_amount, transaction_date) values (1, 142.50, '2014-06-12');

What the sql script above has done is just inserted three bank accounts into the bank_account table, and then it inserted a few transactions (both debit and credit) into each of the three bank accounts.

Alright, so with all of this data, it’s easy to get a bit lost in the details of what’s going on, right? Wouldn’t it be nice if we could just get a breakdown of the balance of each account, based on all the transactions in our database table?

Well, that’s actually pretty simple to do, as you would just perform a SUM operation on the credit_amount and debit_amount columns. You’ve already learned how to do this in the last SQL tutorial lesson on aggregate functions. But we also need to add the grouping logic in there to see the balance of each account.

Before we do that, let’s take a look at how the data looks “as is” without grouping anything together. We’ll execute a simple select statement on the transaction table:

select * from transaction;

After running this query, here’s what the resulting values should look like in the transaction table:

Sql Group By 1

Do you see how the data is stored in the transaction table? Each individual transaction is given a date, it’s associated with a specific bank account, and it’s either marked as a debit or a credit based on which of the two columns are filled in. If it’s a credit transaction, then the corresponding credit_amount column will have a positive dollar amount inserted, also the debit_amount column is set to 0.0 by default (as is seen in the table’s definition where we set a default value for both the credit_amount and debit_amount)

But, like I said, this data is a bit too granular for our liking, so let’s take a look at how we can group it together. Let’s group this data by bank account number by using the group by keyword:

select b.bank_account_number, sum(t.debit_amount), sum(t.credit_amount) from bank_account b
join transaction t on t.bank_account_id = b.bank_account_id
group by b.bank_account_number;

After running this SQL script, here’s what the resulting data looks like from the select:

SQL Group By Data

***CRITICALLY IMPORTANT THINGS to note about the SQL script above are***:

  1. Since we are grouping by bank_account_number we are allowed to put it in the list of columns to select from
  2. Since we are using the group by keyword, we’re also allowed to use the aggregate functions in our list of columns to select from
  3. We should not select any columns that aren’t in the group by list of columns

What this means is that because I’ve stated to group by bank_account_number, this means I should not also show values for something like the transaction_date as the value displayed won’t necessarily make sense.

Grouping By Other Columns

Let’s say that we’re not interested in knowing the debits and credits for each bank account, let’s say that we’re only interested in knowing the total of both the credits and debits for any given date. How would we accomplish this?

Well, there are two requirements that need to be address in that previous statement. We’ll need to:

  1. Group by transaction_date
  2. Add up the credits with the debits for each individual transaction_date

So, let’s follow through on our new requirements with this sql script:

select t.transaction_date, sum(t.credit_amount + t.debit_amount) from bank_account b
join transaction t on t.bank_account_id = b.bank_account_id
group by t.transaction_date;

You should note that we changed the group by to be t.transaction_date (instead of b.bank_account_number) and we’ve also used the plus (+) symbol inside of our SUM aggregate function to add up the debits and credits.

Okay, so what do the results of this query look like?

SQL Group By Example 3

So as you can see here, our database has taken all the data in question, grouped it together by the transaction dates that match, and then once it had them all grouped into their proper “buckets” it performs the aggregate SUM function to give us our final result.

How to Visualize the SQL Group By Command

So to summarize this tutorial, I’d like to leave you with the trick I use to help me understand what’s going on when grouping data. Whenever you have a column listed in the group by area, just think of having buckets for that particular group of data. You’ll have a new “bucket” for each unique piece of data inside of the column you’re grouping on.

So if we’re grouping by bank_account_id, then I would picture three different buckets (since we have three different bank accounts). Then for every transaction that belongs to the first bank account, we’ll “throw” that transaction into the first bucket. For every transaction that belongs to the second bank account, we’ll “throw” it into the second bucket, etc.

Once we’re done putting our transactions into metaphorical buckets, we look at all the transactions in each bucket and perform any aggregate functions on each… and voila! That’s really all there is to grouping. For the sake of completion, I’d like to note that grouping can get more complex when grouping by multiple columns, so for example group by b.bank_account_number, t.transaction_date. But to figure this out, I’d just picture a bucket inside of a bucket. So first you determine which bank account a transaction belongs to and place it over the appropriate bank account bucket, but then look at the transaction date for it and place it into the appropriate transaction_date bucket that’s within the back_account_number bucket.


SQL Aggregate Functions

June 5, 2014

Podcast: Play in new window | Download SQL Aggregate Functions In today’s podcast episode (which can be played by clicking the PLAY button above) you’ll be learning all about the aggregate functions that exist in SQL. What the heck is an aggregate function? Well that’s what I’m going to try and teach you today, and […]

Read the full article →

SQL Join

May 28, 2014

Podcast: Play in new window | Download The SQL Join There are three categories of joins that you can make use of in SQL: Inner Join Outer Join Cross Join But before we dive into the categories of joins, we first need to have an understanding of what a join really is. Joins are used […]

Read the full article →

SQL Tutorial

May 23, 2014

Podcast: Play in new window | Download Learn SQL – Enforce Relationships Part II Be sure to listen to the embedded podcast episode that goes along with this blog post. You can click the PLAY button on the player above. In this SQL tutorial episode/post we’re going to learn how to enforce our SQL relationships […]

Read the full article →

How to Create a Table in Mysql

May 13, 2014

Podcast: Play in new window | Download SQL Queries You’ve learned all about how to create sql queries to read, write, update and delete data… but you haven’t yet learned how to create the tables where you’ll be doing the reading, writing, updating and deleting. So that’s what today’s podcast is all about, be sure […]

Read the full article →

Database Relationships – Many to Many / One to One

January 7, 2014

Podcast: Play in new window | Download In a previous post we learned a whole bunch about the most common database relationship that exists, the one-to-many relationship. In this post we will be expanding on the topic of database relationships and touch on two that are less common but just as useful. Many-to-Many Relationship The […]

Read the full article →