Thursday, December 2, 2010

That pesky Hibernate

Yesterday a colleague of mine asked if I could build some functionality to physically delete records in our database.
Since deleting things is always a good idea, I immediately started to work on his request.
As we are using an ORM (Hibernate) to manage our relational persistence, this would just be a matter of adding a single line of code:
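
The code block appears to have gone missing here; it was presumably a single Hibernate call along the lines of this sketch (assuming an open Session and an attached Person instance):

```java
// delete the parent; the cascade settings should take care of the children
session.delete(person);
```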


And all the Hibernate magic combined with some cascading would take care of the deletion.
So I concluded my assignment fast, and got the well-known warm cosy feeling: 'job well done'.
However, I suddenly realised that the combination "Hibernate, magic, job well done, warm cosy feeling" has put me into problems before.
In fact, I remember it well, since the last time I was in that position I ended up feeling exactly like this man did.

Just to be sure I went back to the code and configured my logging to print out what Hibernate was doing.
As it turned out, something bad was going on. It seems that Hibernate, too, has difficulties deleting relationships (that's a line to think about).

The problem is that each child is deleted one by one.
Let's say you have two entities: Person and Product. There is a one-to-many relationship between Person and Product.
The relationship is uni-directional from Person to Product, mapped as a Set, managed by Person (as there is only one side), and the cascading is set to all-delete-orphan.

<hibernate-mapping default-access="field">
   <class name="entities.Person" table="person">
      <id name="id" column="id" access="property">
         <generator class="native"/>
      </id>
      <set name="products" cascade="all-delete-orphan">
         <key column="person_id" not-null="true" update="false" foreign-key="person_fk"/>
         <one-to-many class="entities.Product"/>
      </set>
   </class>
</hibernate-mapping>

<hibernate-mapping default-access="field">
   <class name="entities.Product" table="product">
      <id name="id" column="id" access="property">
         <generator class="native"/>
      </id>
      <property name="name" column="name"/>
   </class>
</hibernate-mapping>

If you delete a Person object which has 5 products, you will see that pesky Hibernate doing this:

31398 [main] DEBUG org.hibernate.SQL  - delete from product where id=?
31399 [main] DEBUG org.hibernate.SQL  - delete from product where id=?
31399 [main] DEBUG org.hibernate.SQL  - delete from product where id=?
31399 [main] DEBUG org.hibernate.SQL  - delete from product where id=?
31399 [main] DEBUG org.hibernate.SQL  - delete from product where id=?
31400 [main] DEBUG org.hibernate.SQL  - delete from person where id=?

In some conditions this might be acceptable; however, in our case there could be thousands of child records.
I do know that with bulk operations and large datasets Hibernate is maybe not the best choice.
However, we configured Hibernate so that it just worked for our case.

For example, in our real scenario the "lazy" attribute on the Set has been set to "extra".
This allows getting a count on the child collection without Hibernate loading every child.
Other bulk operations we managed by writing HQL. So in the end it worked out very well (and fast), and we were happy with the ORM functionality without having to pay performance costs.
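
For reference, that tweak is just an attribute on the collection mapping; a sketch of what the Set looks like with it (the rest as in the mapping above):

```xml
<set name="products" cascade="all-delete-orphan" lazy="extra">
   <key column="person_id" not-null="true" update="false" foreign-key="person_fk"/>
   <one-to-many class="entities.Product"/>
</set>
```

With lazy="extra", a call like person.getProducts().size() issues a select count(...) instead of initializing the whole collection.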
So I was not planning to let this problem become a performance bottleneck.

As it turns out this 'problem' is a very well-known one and there is lots to be found about it on the internet. I learned that Hibernate should normally be able to do a 'single shot delete'.
It is even mentioned in the manual! So I tried every possible combination, but could not force Hibernate to do that 'single shot delete'.

Secondly, I tried to do it the 'bulk' way by just issuing a delete in HQL.
In our case it's not a problem that it bypasses the Session, since the transaction consists only of the delete.
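
A sketch of what that looks like, assuming the bidirectional real-world variant where the child holds a reference to its parent (the property name p.person is illustrative; in the uni-directional Person/Product example above there is no such property to filter on):

```java
// bulk HQL deletes bypass the Session cache: children first, then the parent
session.createQuery("delete from Product p where p.person.id = :personId")
       .setParameter("personId", personId)
       .executeUpdate();
session.createQuery("delete from Person p where p.id = :personId")
       .setParameter("personId", personId)
       .executeUpdate();
```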

YAP! As it turned out, in my real scenario the Child has a Hibernate <map> relation to a properties table.
When doing an HQL delete, the cascading on the <map> is not respected. This seems to be a known issue as well (here).
I also tried to delete the <map> manually using HQL, but that doesn't work since there is no entity to 'grab' in the query.

Thirdly, I learned about the "on-delete" attribute which you can set on the <key> element inside the <set>.
The only requirement is that the Set must be the non-managed side of the relation (so inverse must be 'true').
In my Person/Product example this is not the case (as there is only one side).
But in my real scenario, which is slightly different, this is the case: the association is mapped bi-directionally and the Child is owner of the relationship.

If you enable on-delete="cascade", Hibernate will leave the deletion to the database.
In other words, it will depend on the database cascading the delete, so your table DDL should have 'on delete cascade' in its FK constraint.
A nice note: if you use hbm2ddl, Hibernate generates the correct DDL, including the on delete cascade.
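
In mapping terms the change is small; a sketch of what the inverse side could look like (assuming the bidirectional real-world variant, since on-delete requires inverse="true"):

```xml
<set name="products" inverse="true" cascade="all-delete-orphan">
   <key column="person_id" on-delete="cascade"/>
   <one-to-many class="entities.Product"/>
</set>
```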

Good, good! The only thing I now had to do was to tell our db administrator to add the 'on delete cascade'.

But wait, I also still have the <map>, so I tried to add on-delete="cascade" there as well.
BUT! on-delete="cascade" can only be added on inverse associations!
The <map> is mapped uni-directionally and is therefore always the owner. So, bad luck once again: this did not work.

Agreed: at this point I could refactor the <map> construct to a normal stand-alone Entity and make a bidirectional one-to-many/many-to-one.
But that would imply many hours of refactoring for something that was working just fine, so I felt like not doing this.

Fourthly (and finally), I decided to just bypass Hibernate completely for this delete.
I created a named query:

<sql-query name="deletePerson">delete from person where id = :personId</sql-query>
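
Executing it then looks something like this sketch (assuming an open Session and the Hibernate 3 API):

```java
session.getNamedQuery("deletePerson")
       .setParameter("personId", personId)
       .executeUpdate();
```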

I asked the database administrators to add 'on delete cascade' to the FK's of Product and the map table.
To be able to test this in our in-memory database, I added some extra DDL in the Person mapping:

  alter table product drop constraint person_fk
  alter table product add constraint person_fk foreign key (person_id) references person(id) on delete cascade
  <!-- same here for the <map> relation-->

This allows me to simulate the database cascading in my tests (which run against an in-memory database).
So to conclude my journey, the steps you can try if you want a fast delete:

1. Do not use Hibernate
2. Just use the not working one-shot-delete, so you don't have to complain about Hibernate in some blog entry

Now as for your real options:

1. Map your 'one' side of the association with on-delete="cascade" and alter the FK constraints in the database with cascading
2. Use HQL to delete the entities yourself (do not forget this is a bulk operation and the delete is not reflected in the Session!)
3. Bypass Hibernate and use plain SQL + database cascading (same remark about the session)

Oh, by the way, if you use the first suggestion, be aware that (even if you are using lazy true/extra on the association) Hibernate always loads the complete association before issuing the delete. Ain't that cute or what?
So if you have large datasets and you want performance, as far as I know you are limited to options 2 and 3.

Saturday, November 20, 2010


Java Puzzlers, the last presentation on Thursday at Devoxx, came with some new, or at least refurbished, puzzlers to melt our minds.
One, however, did melt something away in my mind. It had to do with object immutability.

The puzzler goes like this:

Suppose you have a map, let's say an EnumMap or ConcurrentHashMap.

You put some values in that map, and next you obtain the entry set.
Now, instead of just doing something with the entries, you add this set to another Collection of type Set using the addAll() method (or the constructor, which in fact calls addAll() for you).

In code this might look like this:

Map<String, String> map = new ConcurrentHashMap<String, String>(); 
map.put("one", "two"); 
map.put("two", "one"); 
Set<Entry<String, String>> set = new HashSet<Entry<String, String>>(map.entrySet()); 

Note: the actual presented puzzler was in a more entertaining form.

You will notice that set.size() will return 1, and not 2.
The reason behind it is a lack of clarity in the API and an attempted optimization by the implementation.

The API does not say anything specific about the mutability of the Entry objects returned in the Set.
Some Map implementations made the choice of a 'singleton' Entry object instead of creating a new immutable Entry object for each value in the map.

What actually happens is that they change the content of the one and only created Entry for each map value, by calling setters on that sole Entry object.

So if a map has 1000 elements, only one Entry object will exist.
When the map is dumped to a set of Entries, the map will have called setKey and setValue 1000 times on that single Entry object, once for each element in the map.
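
The effect is easy to reproduce without any particular Map implementation; here is a minimal sketch (MutableEntry is my own illustrative class, not JDK code) of what one reused, mutated Entry does to addAll-style insertion into a HashSet:

```java
import java.util.*;

// Sketch: one mutable Entry object reused for several logical entries,
// mimicking the 'singleton Entry' optimization described above.
public class ReusedEntryDemo {

    static final class MutableEntry implements Map.Entry<String, String> {
        String key, value;
        MutableEntry(String k, String v) { key = k; value = v; }
        public String getKey()   { return key; }
        public String getValue() { return value; }
        public String setValue(String v) { String old = value; value = v; return old; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Map.Entry)) return false;
            Map.Entry<?, ?> e = (Map.Entry<?, ?>) o;
            return Objects.equals(key, e.getKey()) && Objects.equals(value, e.getValue());
        }
        // the Map.Entry hashCode contract: key hash XORed with value hash
        @Override public int hashCode() { return Objects.hashCode(key) ^ Objects.hashCode(value); }
    }

    static int run() {
        Set<Map.Entry<String, String>> set = new HashSet<>();
        MutableEntry shared = new MutableEntry("one", "two");
        set.add(shared);                          // first logical entry
        shared.key = "two"; shared.value = "one"; // the single Entry is mutated in place
        set.add(shared);                          // same reference, same hash: rejected
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 1, not 2
    }
}
```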

A funny side effect: if you change the values in the map to something like this:

map.put("keyOne", "valueOne"); 
map.put("keyTwo", "valueTwo");

set.size() DOES return 2 :-)
What happens is that the map implementation uses this construct to detect whether an entry already exists:

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))

So, the hashCode must be the same AND they must be equal, either through reference or object equality.

In the first case, the hashes were equal and the reference of the key was also the same.
The hashCode on an Entry object is calculated by taking the hashCode of the key XORed with the hashCode of the value.
In the first case the hashCode of 'one' XORed with the hashCode of 'two' is the same as the hashCode of 'two' XORed with the hashCode of 'one'.
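
Since XOR is commutative, this is easy to verify:

```java
public class XorHashDemo {
    public static void main(String[] args) {
        int h1 = "one".hashCode() ^ "two".hashCode();
        int h2 = "two".hashCode() ^ "one".hashCode();
        System.out.println(h1 == h2); // prints true: a ^ b == b ^ a
    }
}
```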

Although they are distinct Entries (they will not be object-equal), the hashCode is the same, which is also allowed by the hashCode contract.
Remember that two objects that are equal must produce the same hashCode, but two non-equal objects are not required to produce different hashCodes.

For the second case, the hash actually differs, so the implementation doesn't bother looking further and the entry is considered 'distinct'.
Of course, if you were to iterate the set, you would find the same Entry object twice, with the same key and value both times.

It's unlikely that you'll run into this problem in your code base, but if you do, it will guarantee long cosy hours of debugging.
The reasons why I find this particular 'bug' so interesting are its diverse lessons that we can learn from it:
1. Premature micro-optimization comes at a cost. You could argue that any optimization is good at this level. Maybe, but I know for sure that we as application developers should not attempt such optimizations on our own. You risk creating pitfalls beyond your knowledge while (as in most cases) the gains of such optimizations are very hard to prove.

2. Immutability plays an important role in your code. It can avoid 'bugs' such as this one. It gives you a guarantee when handling an object that it is in the same state as it was created with.

3. Basic design and code rules apply to everyone. It also shows that they can be forgotten in any project. In this case both the API designers and the people who implemented it did not question this.