How HashSet works in Java

In this post, we will look at HashSet in Java.

Java HashSet:

Part-1: HashSet in Java
Part-2: Difference between HashMap and HashSet
Part-3: hashCode and equals methods in Java


This is one of the most frequently asked questions in core Java interviews, so in this post we will see how HashSet works in Java. We have already seen how HashMap works in Java and the difference between HashMap and HashSet.


Let's start with an introduction to HashSet, and then go through its internals.

HashSet:

HashSet implements the Set interface, which does not allow duplicate values. It is not synchronized and therefore not thread-safe.
The definition of a duplicate can be quite tricky sometimes. Let's consider two cases here.
  1. Built-in types (such as Integer and String)
  2. Custom-defined objects
In case of built-in types:
For built-in types such as Integer and String, it is very straightforward, because these classes already implement equals and hashCode. Let's see this with the help of an example.
Let's create a Java program:
package org.arpit.java2blog;
import java.util.HashSet;

public class HashSetMain {

 public static void main(String[] args) {
  HashSet<String> nameSet=new HashSet<String>();
  nameSet.add("Arpit");
  nameSet.add("Arpit");
  nameSet.add("john");
  System.out.println("size of nameSet="+nameSet.size());
  System.out.println(nameSet);
 }

}
When you run the above program, you will get output similar to the following (HashSet does not guarantee any iteration order):
size of nameSet=2
[Arpit, john]
So we tried to add the String "Arpit" twice, but as HashSet does not allow duplicate values, "Arpit" is stored only once in the HashSet.
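The return value of add() also tells you whether the element was actually inserted. The following is a minimal sketch of that behaviour; the class name HashSetAddResultDemo is made up for this illustration and is not part of the original example.

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetAddResultDemo {

    public static void main(String[] args) {
        Set<String> nameSet = new HashSet<>();

        // add() returns true when the element was not yet in the set
        boolean firstAdd = nameSet.add("Arpit");
        // add() returns false when an equal element is already present
        boolean secondAdd = nameSet.add("Arpit");

        System.out.println("first add returned " + firstAdd);   // true
        System.out.println("second add returned " + secondAdd); // false
        System.out.println("size of nameSet=" + nameSet.size()); // 1
    }
}
```

Checking the boolean result is often simpler than calling contains() before add().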

In case of Custom Objects:
To understand how HashSet works with custom objects, you need to understand the hashCode and equals methods in Java. Let's create a class called Country and implement only the equals method in it.
package org.arpit.java2blog;

public class Country {

 String name;
 long population;
 public String getName() {
  return name;
 }
 public void setName(String name){
  this.name = name;
 }
 public long getPopulation() {
  return population;
 }
 public void setPopulation(long population) {
  this.population = population;
 }
 
 @Override
 public String toString()
 {
  return name;
 }
 @Override
 public boolean equals(Object obj) {
  if (this == obj)
   return true;
  if (obj == null)
   return false;
  if (getClass() != obj.getClass())
   return false;
  Country other = (Country) obj;
  if (name == null) {
   if (other.name != null)
    return false;
  } else if (!name.equals(other.name))
   return false;
  return true;
 }
 
}
Create the main class:
package org.arpit.java2blog;

import java.util.HashSet;

public class HashSetCountryMain {

 public static void main(String[] args)
 {
  HashSet<Country> countrySet=new HashSet<Country>();
  Country india1=new Country();
  india1.setName("India");
 
  Country india2=new Country();
  india2.setName("India");
  
  countrySet.add(india1);
  countrySet.add(india2);
   
  System.out.println("size of nameSet="+countrySet.size());
  System.out.println(countrySet);
  
 }
}

When you run the above program, you will get the following output:
size of nameSet=2
[India, India]

Now you must be wondering why the HashSet contains two values instead of one, even though the two objects are equal. This is because HashSet first computes the hash code of the object being added, and calls equals only on existing elements whose hash code is the same. The Country class above does not override hashCode, so both objects use the default Object.hashCode() implementation, which is based on object identity rather than on the name field. The two objects therefore almost always get different hash codes, and equals is never consulted.
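To see the default behaviour for yourself, here is a small sketch. It assumes a cut-down Country class with a constructor (not the exact class from this post) and uses System.identityHashCode, which returns the hash code that Object.hashCode() would return for an object. The printed hash code values vary from run to run, so no values are shown.

```java
public class DefaultHashCodeDemo {

    // Hypothetical cut-down Country: overrides equals() but NOT hashCode()
    static class Country {
        final String name;

        Country(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false;
            return name.equals(((Country) obj).name);
        }
    }

    public static void main(String[] args) {
        Country india1 = new Country("India");
        Country india2 = new Country("India");

        // The two objects are equal according to equals()
        System.out.println("equals: " + india1.equals(india2));

        // Without an override, hashCode() is the identity hash code
        System.out.println("india1 uses identity hash: "
                + (india1.hashCode() == System.identityHashCode(india1)));

        // These two values usually differ, which is why HashSet keeps both objects
        System.out.println("india1.hashCode()=" + india1.hashCode());
        System.out.println("india2.hashCode()=" + india2.hashCode());
    }
}
```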
Now let's add a hashCode method to the above Country class:
@Override
 public int hashCode() {
  final int prime = 31;
  int result = 1;
  result = prime * result + ((name == null) ? 0 : name.hashCode());
  return result;
 }
Run the above main program again and you will get the following output:
size of nameSet=1
[India]
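The same idea can be written more compactly with the java.util.Objects helpers (available since Java 7). This is only a sketch: the nested Country class with a constructor is an assumption made for brevity, not the class used earlier in this post. The important point is that equals and hashCode are based on the same field.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class EqualsHashCodeDemo {

    // Hypothetical Country variant: equals() and hashCode() both use only name
    static class Country {
        final String name;

        Country(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false;
            return Objects.equals(name, ((Country) obj).name); // null-safe comparison
        }

        @Override
        public int hashCode() {
            return Objects.hash(name); // equal names give equal hash codes
        }
    }

    public static void main(String[] args) {
        Set<Country> countrySet = new HashSet<>();
        countrySet.add(new Country("India"));
        countrySet.add(new Country("India")); // rejected as a duplicate

        System.out.println("size of countrySet=" + countrySet.size()); // 1
        // Lookups also work with a different but equal instance
        System.out.println(countrySet.contains(new Country("India"))); // true
    }
}
```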

Now that we have a good understanding of HashSet, let's look at its internal representation:

Internal working of HashSet:

When you add a duplicate element to a HashSet, the add() method returns false and does not add the duplicate element.
How does the add method return false? To answer this, we need to look at HashSet's add method in the Java API source (simplified here):
public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    
    private transient HashMap<E,Object> map;
 
    // PRESENT is dummy value which will be used as value in map
    private static final Object PRESENT = new Object();
    
    /**
     * Constructs a new, empty set backed by an empty HashMap.
     */
    public HashSet() {
     map = new HashMap<E,Object>();
    }
    
    // return false if e is already present in HashSet
    public boolean add(E e) {
     return map.put(e, PRESENT)==null;
    }
    
    // other HashSet methods
}
From the above code, it is clear that HashSet uses a HashMap to check for duplicate elements. As we know, keys in a HashMap must be unique. HashSet relies on this: when an element is added to the HashSet, it is stored as a key in the internal HashMap. Since the HashMap requires a value for every key, a shared dummy object (PRESENT) is used as the value for all entries.
Let's look at the add method again:
    // return false if e is already present in HashSet
    public boolean add(E e) {
     return map.put(e, PRESENT)==null;
    }
So there are two cases here:
  • map.put(e, PRESENT) returns null if the element is not yet present in the map. So map.put(e, PRESENT) == null evaluates to true, hence the add method returns true and the element is added to the HashSet.
  • map.put(e, PRESENT) returns the old value (PRESENT) if the element is already present in the map. So map.put(e, PRESENT) == null evaluates to false, hence the add method returns false and the HashSet is left unchanged.
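The two cases above can be reproduced with a plain HashMap and a tiny map-backed set. The class below is a simplified sketch for illustration only: SimpleSet and MapBackedSetDemo are made-up names, and the real java.util.HashSet has many more methods and details.

```java
import java.util.HashMap;
import java.util.Map;

public class MapBackedSetDemo {

    // Hypothetical, stripped-down set that stores its elements as HashMap keys
    static class SimpleSet<E> {
        // Shared dummy value stored against every key
        private static final Object PRESENT = new Object();
        private final Map<E, Object> map = new HashMap<>();

        // put() returns null only when the key was absent, so this is true for new elements
        boolean add(E e) {
            return map.put(e, PRESENT) == null;
        }

        boolean contains(Object o) {
            return map.containsKey(o);
        }

        int size() {
            return map.size();
        }
    }

    public static void main(String[] args) {
        // Case 1 and case 2 directly on a HashMap
        Map<String, Object> map = new HashMap<>();
        Object marker = new Object();
        System.out.println(map.put("India", marker) == null); // true: key was absent
        System.out.println(map.put("India", marker) == null); // false: old value returned

        // The same behaviour through the set wrapper
        SimpleSet<String> set = new SimpleSet<>();
        System.out.println(set.add("India")); // true
        System.out.println(set.add("India")); // false
        System.out.println(set.size());       // 1
    }
}
```

Reusing one shared PRESENT object for every entry means the set adds no extra per-element value objects on top of what the HashMap already allocates.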

For more interview questions, please go through core Java interview questions for beginners.

Written by Arpit

Java tutorial for beginners Copyright © 2012