On Choosing an Identity (type!)

In this comment to one of my posts, Ryan asks…

Can you give any insight as to what life would be like, if I decided to migrate away from Guids as my IDs to Ints?

Sure: it would look exactly the same except your entities would derive from IntIdentityPersistenceBase instead of GuidIdentityPersistenceBase :)

Seriously though, this is a great question and it goes to the heart of something that I may not have been completely clear on in the screencasts: why do I choose Guids for identity values over the probably more common Int32 data-type in the first place?

Followers of my Autumn of Agile screencast series will probably have taken note of the fact that I have chosen Guids as the data type for the identity value in the persistent objects I am using to model my domain.  Although I had thought that I was clear in explaining the reasoning behind my choice, I went back and skimmed over the screencast wherein I showed this choice and I have to say that my memory of how clear I was and how clear I actually was are apparently two somewhat different things :)

Understanding Choices

With NHibernate, you have great freedom to choose whatever data type you want to use to represent your identity value in your objects.  This is one of NHibernate's great strengths: its ability to adapt to the needs of your application rather than force you to do one thing or another in a specific way.  But one challenge with all this flexibility is to avoid confusing ‘flexibility’ with ‘equivalence’: picking any one of the wide array of choices about how to do something isn’t without its pros and cons, which is partly why there is a somewhat dizzying array of choices about how to do things in NH in the first place!

What’s Wrong With Ints as IDs?

The trouble with integers as identity values is that they have to be assigned by the database.  Technically of course this isn’t entirely true: there is nothing that requires an integer primary key to be assigned by the DB, but it’s certainly the 95% case (if not the 99% case!) that when you use Ints as identity values, you also ask the DB (usually MSSQL) to auto-increment and assign the identity value for you.

NH actually supports either approach quite capably: your identity (no matter the data type chosen) can be an auto-increment value assigned by the DB or a value assigned by your application.  But if you choose to take responsibility for assigning it within your application, it becomes really difficult to ensure properly unique identities in a multi-user environment.  When any number of users might be attempting INSERTs into the DB at the same time, each copy of the application (or each session, if it’s a web app) would need some way to ensure it was assigning truly unique Int values for primary keys.  This issue (how to ‘coordinate’ unique Ids across multi-user applications) is, after all, why DBs offer auto-increment PKs as an option in the first place.
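
To make this coordination problem concrete, here is a deliberately naive sketch of what application-assigned Int identities often devolve into.  (The Employees table, column name, and helper class are made up purely for illustration; this is not code from the screencast project.)

        using System.Data.SqlClient;

        public static class NaiveIntIdAssigner
        {
            // Two copies of the application (or two web sessions) running this at the
            // same time can easily read the same MAX(EmployeeId), hand out the same
            // "next" value, and the second INSERT will then blow up on the primary key.
            // This is exactly the coordination problem that DB-assigned auto-increment
            // values exist to solve.
            public static int GetNextEmployeeId(SqlConnection connection)
            {
                using (var command = new SqlCommand(
                    "SELECT ISNULL(MAX(EmployeeId), 0) + 1 FROM Employees", connection))
                {
                    return (int)command.ExecuteScalar();
                }
            }
        }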

Example

To better understand the issue, let’s consider the following code snippet (adapted — somewhat — from the project in the screencasts):

        using (UnitOfWork.Start())
        {
            // create a brand-new (transient) Employee...
            Employee emp = new Employee();

            // ...ask the repository to save it as part of the current unit of work...
            _employeeRepository.SaveOrUpdate(emp);

            // ...possibly change our mind and delete it again before committing...
            if (CheckSomethingImportant() == true)
                _employeeRepository.DeleteEmployee(emp);

            // ...and finally commit the unit of work
            UnitOfWork.Current.Flush();
        }

Now, this snippet is obviously contrived (it’s hard to imagine needing to do exactly this sequence in code), but let’s pretend it’s real.  The pattern (at least) is completely real: do something to one or more objects, save them, delete one or more of them, add one or more new ones, and then Flush() the unit of work when you’re done.  This is the whole point of the Unit of Work pattern in the first place: do anything I want to with any of the objects and then (if I’m completely happy with the results) commit the unit-of-work as one atomic unit.
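
If the Unit of Work pattern is new to you, its essence (stripped of all of NHibernate's actual machinery; this is just a conceptual sketch, not the UnitOfWork class from the screencasts) boils down to something like this:

        using System.Collections.Generic;

        // Conceptual sketch only: remember what needs saving and deleting,
        // and only talk to the database when Flush() is called.
        public class ConceptualUnitOfWork
        {
            private readonly List<object> _newObjects = new List<object>();
            private readonly List<object> _deletedObjects = new List<object>();

            public void RegisterSave(object entity)
            {
                _newObjects.Add(entity);
            }

            public void RegisterDelete(object entity)
            {
                // Deleting something that was only ever saved inside this unit of work
                // simply cancels the pending save; it never reaches the DB at all.
                if (!_newObjects.Remove(entity))
                    _deletedObjects.Add(entity);
            }

            public void Flush()
            {
                // The only place any SQL gets issued: INSERTs for _newObjects and
                // DELETEs for _deletedObjects, committed as one atomic unit.
            }
        }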

What’s Wrong with our Code Snippet?

The $64,000 question (the 64kb question????) is: when in the above example does the first DB communication happen?

Well, what we know about the Unit-Of-Work pattern suggests that the semantically-correct answer is in the line…

            UnitOfWork.Current.Flush();

…since that’s the whole point of the Unit-of-Work pattern: nothing actually happens until I commit the UoW.  And if you were using application-assigned identity values, you would be correct!  If the call to CheckSomethingImportant() returns false, then emp will be saved to the DB when I call UnitOfWork.Current.Flush().  And if the call to CheckSomethingImportant() returns true, then absolutely nothing will happen when I call UnitOfWork.Current.Flush(): the unit-of-work will be empty, since I effectively undid the save by deleting the Employee from the unit-of-work before committing it.

But if you step through this very same code in the debugger and watch SQL Profiler while using auto-increment identity values, you will see something a little unexpected (and disconcerting, at least to me): the emp instance of Employee actually gets saved to the DB in the line…

            _employeeRepository.SaveOrUpdate(emp);

before I commit the Unit-of-Work!

Why?

If you start to think for a moment about what component in your system is responsible for assigning an auto-increment identity value, then you realize why the call to

            _employeeRepository.SaveOrUpdate(emp);

results in an INSERT against the database.  Once you call this method, NHibernate needs to change your object from transient (not part of the unit-of-work) to persistable-but-not-yet-persisted, and to do this it needs to assign the object a valid identity value.  And with an auto-increment column, the only place to get this value is the database, by issuing an INSERT and letting the DB hand back the next auto-increment identity value to assign to the object.

NHibernate needs this object to have a valid identity once you call

            _employeeRepository.SaveOrUpdate(emp);

so that any other object you might attach to the session can reference it (like children in a collection that need to store the identity value of their parent).  Since the only way that NH can get an identity value for this object is to INSERT it into the DB and have the DB assign it a Primary Key, NH has to issue an INSERT outside of the actual Unit-of-Work (i.e. before you commit the UoW) to get this value.
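
To picture the kind of situation NH is guarding against, imagine the session also contains other objects that reference emp.  (The Manager property below is hypothetical; it isn't in the screencast project and is only here to illustrate the reference.)

        using (UnitOfWork.Start())
        {
            Employee manager = new Employee();
            _employeeRepository.SaveOrUpdate(manager);

            Employee report = new Employee();
            report.Manager = manager;   // hypothetical association, for illustration only
            _employeeRepository.SaveOrUpdate(report);

            // The row for 'report' will eventually need to store the PK of 'manager'
            // as a foreign key, so 'manager' has to end up with a real identity value.
            // With a DB-assigned Int the only way to get one is an immediate INSERT;
            // with a client-generated Guid it can simply be assigned in memory.
            UnitOfWork.Current.Flush();
        }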

GUIDs can be Generated Without Talking to the Database

This simple fact ("GUIDs can be generated without talking to the database") is the core reasoning behind my selecting them as identity values in the screencasts and my preference for them in professional practice.  They allow my application (and NHibernate, to which I delegate the responsibility) to generate and assign identity values without needing to communicate with the database to find out the next auto-increment identity value to assign to the object.
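
In .NET terms, generating such an identity is a one-liner with no connection string in sight.  The Employee class below is just a sketch of the idea, not the actual entity or base class from the screencasts; and whether you assign the value yourself like this, or let one of NHibernate's guid-style generators do it for you at save time, the important point is the same: no database call is required.

        using System;

        public class Employee
        {
            // Assigned entirely within the application's process: no round-trip to
            // the database is needed to come up with a unique identity value.
            public virtual Guid Id { get; protected set; }

            public Employee()
            {
                Id = Guid.NewGuid();
            }
        }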

This makes for a cleaner, less-chatty, more predictable application whose Unit-Of-Work behaves as expected.  It also makes for an application that isn’t dependent on what is really a database-specific implementation detail to assign identity to my objects (hint: not every database platform supported by NH supports auto-increment PK values).

For Clarity

In the interests of completeness and 100% accuracy, this capability (to assign identity without asking the database for the next valid value) has nothing inherently to do with using Guids as a datatype.  You can accomplish this with any datatype that NH supports for an identity.  But since Guids are, for all practical purposes, guaranteed unique by construction, they make an excellent choice for an identity datatype: my application can assign them without having to keep track anywhere of which values have already been handed out.
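
Said another way: the win is in how the identity gets generated, and the difference between the datatypes is what your application has to do to keep the values unique.  A quick sketch of the contrast:

        // With an application-assigned Int, *you* own uniqueness: you need some shared,
        // multi-user-safe source of "next" values (and somewhere durable to keep it).
        //
        // With a Guid, uniqueness comes along with the datatype: any process, on any
        // machine, can mint one with no coordination whatsoever.
        Guid id = Guid.NewGuid();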

I hope this helps clarify a bit better why I made the choice I did.  And thanks again for the great question!