The Continuing Quest for DEATH of string-literals in my Code…

Warning: this post won’t entirely make sense unless you first read the two other posts whose links are provided in the body of this one.

As mentioned in detail in the T4 Templating Engine to the Rescue post, I am not a great big fan of literal strings littered all over my code (data-access or otherwise).  If we as developers can agree (for all kinds of good reasons I won’t bother to repeat here) that literal SQL strings are generally BAD, then I think we have to agree that literal HQL (hibernate-query-language) strings are generally BAD as well (for all the same reasons).  It’s not the language that is good or bad, it’s the literal string representation of what the language is stating that makes it brittle, resistant to refactoring, and impervious to the compile-time validity-checking that is one of the cornerstones of why we tend to select a strongly-typed language in the first place.

Since the beginning of my seemingly unending interest in making data-access with NHibernate simpler, more reliable, and easier for the end-developer to not worry about, I have pursued various different strategies to weed out as much dependence on string-literal representation of values as I can for all those reasons.  IMHO, the ‘default’ implementation pattern of NHibernate’s dependence on string-literals in its query code (both HQL and the Criteria API) is one of its least-attractive aspects.

Different Ways to Address the Challenges

In the Four Stages of Object/Relational Mapping post, I illustrated in some detail the journey that I personally undertook (and consider to be very important for most others to also travel down in order that the eventual destination actually make some sense to the reader).  At each stage in this journey, I thought hard about how best to address ‘the evils of string-literals’ in my selected approach and in each stage I had a pretty well-working (for me!) implementation of a strategy to address the problem.

With the Code-Generation-based approach (Stage II from the Four Stages), I had adjusted the code-gen template to provide for strongly-typed property-accessors that make it pretty trivial to write code like…

Employee.Properties.Firstname

…that would return the string “Firstname” as needed for your use in query construction, etc.
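For readers who haven’t seen that style of generated code, here is a minimal sketch of the kind of output I mean.  The Employee class and its two properties are invented purely for illustration (this is not the actual output of my code-gen template); the point is only that each property gets a matching string constant that the compiler can check.

```csharp
using System;

// Hypothetical entity, for illustration only.
public class Employee
{
    public string Firstname { get; set; }
    public string Lastname { get; set; }

    // The kind of companion class a code generator could emit:
    // one string constant per property on the entity.
    public static class Properties
    {
        public const string Firstname = "Firstname";
        public const string Lastname = "Lastname";
    }
}

public static class PropertiesDemo
{
    public static void Main()
    {
        // A typo such as Employee.Properties.Firstnam fails at compile-time.
        Console.WriteLine(Employee.Properties.Firstname); // prints "Firstname"
    }
}
```

The obvious weakness is that the constants only stay correct as long as something regenerates them whenever the class changes.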

When I later progressed on the journey to Stage IV, readers will recall that I abandoned the code-generation approach in favor of constructing my objects ‘by hand’ so as to have much greater control over their shape, form, and relationships but in the process I also lost one of the side-benefits of the code-generator: the support for generating the strongly-typed instances of the strings that something like NHibernate requires in order to perform its queries etc.  As a point of order here, readers should note that the value of these strongly-typed instances of string representations actually goes entirely beyond NHibernate and is incredibly valuable also for things like strongly-typed databinding statements, etc., but the original impetus for it (in my mind at least) was to better support query construction in NHibernate.

As mentioned in my prior post, I came upon the T4 templating engine in a number of blog posts and decided that it could provide me a way to both have my cake and eat it too.  As mentioned in the T4 Templating Engine to the Rescue post, the approach illustrated there would allow me to both hand-develop my object model and yet still get the benefit of the code that provided me strongly-typed access to the strings that correlate to the properties of my objects when I need them (for query statements, etc.).

The Trouble with the T4 Approach

But there were problems with that approach as well…

  • your classes had to be declared partial in order for the generated code to properly merge into them at compile-time
  • all of your classes that had this template run against them had to be in the same VS project (and all classes in that VS project got the template run against them whether you wanted it or not); this essentially meant that every class in any template-targeted VS project had to be declared partial
  • you had to set up the T4 toolkit dependencies properly on your system (as the methods in my template had dependencies on this other OSS template add-on for the basic T4 libraries)
  • actually iterating through the code files in any template-targeted project required a somewhat awkward (and brittle) method of ‘reflecting’ against the VS IDE codedom model in order to discover what classes needed to have these extensions added to them
  • the approach turned out to be very brittle and susceptible to breaking bugs when new Visual Studio releases were deployed (e.g., VS 2008 SP1 actually broke my template that worked just fine under VS 2008 non-SP1 — somehow, I don’t really care about the specifics)
  • contrary to what I originally expected (hoped!), the T4 template does not automatically run as part of the VS build process every time I compile (which would have ensured that the T4-generated content always remained ‘in-sync’ with the classes the templates were extending); instead, I discovered that the T4 template apparently needs to be manually ‘executed’ each time you want it re-run (i.e., it’s not invoked on each build, but has to be invoked by the developer each time they KNOW, or are able to remember, that they changed any of the other classes in the template-targeted project)

In short, this approach was actually a gross violation of the principle of Separation of Concerns (SoC) and I was asking my objects to do more than I probably should have: both represent the domain model for my application and also provide strongly-typed strings for its members for consumption in the rest of my application.

So I again began a quest for a solution to my problem that would be a bit less-brittle, a bit less-invasive, a bit more automatic, and a bit more flexible.

The Answer Has Been in Front of Me All Along

It eventually dawned on me that there are several OSS projects that also suffer from the same ‘string-literal-based-dependency’ problem but that have already solved it using several clever C# 3.0 techniques.  My favorite mock objects framework,  Rhino Mocks, for example allows me to specify the name of a method using the new C# 3.0 lambda syntax like….

<...(x => x.Print())

…and there is even a whole OSS project called Fluent NHibernate that uses similar techniques to (attempt) to completely eliminate any need whatsoever for any XML mapping files to define your object-to-database mappings for NHibernate.  I’m not anywhere near ready to sacrifice the flexibility that XML mapping files provide me in defining my mappings by adopting the Fluent NHibernate approach for mappings, but the project presently does show tremendous potential and I’m watching it closely to see where it goes.

It came to me one day while walking my dog (yes, again, another very good reason to own a dog as I solve a number of thorny software design issues while wandering aimlessly through the park waiting for the dog to do his business and I’m alone with my thoughts and minimal distractions) that I could leverage the same kind of approach as in these two (and more) OSS projects to provide access to the properties of my classes with a little bit of the same C# 3.0 lambda expression goodness.  So I came up with (as a prototype) the following syntax…

CriteriaHelper<Employee>.Properties(x => x.Firstname)

…which also returns the same “Firstname” string that I’m looking for in my code.  The CriteriaHelper<T> class is a static helper class (so I don’t need to instantiate one to use its methods) that uses the T argument to tell it what class to evaluate in the subsequent lambda expression to return the string representation of the property that is referenced.  For access to the actual name of the class itself in code (also useful both in query statements as well as databinding statements), the prototype syntax works out even simpler and doesn’t need an actual lambda expression as in…

CriteriaHelper<Employee>.ClassName()

…which returns the string “Employee” in the code as needed.
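To make the idea a bit more concrete, here is a rough sketch of one way a helper with this surface could be built on C# 3.0 expression trees.  Treat it as an assumption-laden illustration rather than the actual prototype: the method bodies, the Employee class, and its Age property are all made up for the example, and only the CriteriaHelper<T>, Properties, and ClassName names come from the syntax shown above.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity, for illustration only.
public class Employee
{
    public string Firstname { get; set; }
    public int Age { get; set; }
}

public static class CriteriaHelper<T>
{
    // Returns the name of the property that the lambda points at.
    public static string Properties(Expression<Func<T, object>> accessor)
    {
        Expression body = accessor.Body;

        // A value-type property is boxed to object, which wraps the
        // member access in a Convert node; unwrap it first.
        UnaryExpression unary = body as UnaryExpression;
        if (unary != null && unary.NodeType == ExpressionType.Convert)
        {
            body = unary.Operand;
        }

        MemberExpression member = body as MemberExpression;
        if (member == null)
        {
            throw new ArgumentException(
                "Expected a simple member access such as x => x.Firstname.",
                "accessor");
        }

        return member.Member.Name;
    }

    // Returns the simple (non-namespace-qualified) name of T.
    public static string ClassName()
    {
        return typeof(T).Name;
    }
}

public static class CriteriaHelperDemo
{
    public static void Main()
    {
        Console.WriteLine(CriteriaHelper<Employee>.Properties(x => x.Firstname)); // prints "Firstname"
        Console.WriteLine(CriteriaHelper<Employee>.Properties(x => x.Age));       // prints "Age"
        Console.WriteLine(CriteriaHelper<Employee>.ClassName());                  // prints "Employee"
    }
}
```

Because the lambda is compiled against the real Employee type, renaming or removing the property breaks the build instead of silently breaking a query at runtime.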

Reflection Is NOT (always!) Evil

As a .NET developer, there is (seemingly) one universal truth that everyone seems to agree on: Reflection is universally BAD.

Nobody ever seems to (completely) be able to tell me where this conception came from, but nearly all .NET developers have it.  To the point (for example) where entire frameworks that do even a little bit of work via reflection are dismissed by many out of hand as “well, it uses reflection in there so it must be incredibly slow to use”.  This is a recurring refrain that I have heard over and over again in the context of Rocky Lhotka’s CSLA framework (“it can’t be good, he’s using reflection in there”).  I have other reasons why I don’t prefer to use Rocky’s business object framework, but the fact that it leverages reflection in certain places just isn’t one of them.  But this anti-reflection perspective seems to be deeply ingrained in the DNA of every .NET developer.

It’s certainly true that reflection is slower than accomplishing the same thing NOT using reflection, but some things just need to be done via reflection.  That’s why the Reflection API is there in .NET in the first place (and I’m glad it’s in there!).  And while reflection may be comparatively slow, in absolute performance numbers it’s (generally) a trivial component of an overall real-world performance benchmark.  Which is to say that in any system doing any real work, it’s highly unlikely that reflection is going to be your actual bottleneck.  It’s much more likely that RPC communication, a web-service, a database query, or other behavior is really your limiting factor in re: performance of your application.

And performance optimization is the art of speeding up your current ‘slowest element’ only to next be faced with the need to optimize the next ‘slowest element’.  If you can get to the point in your application where the overhead of the .NET reflection API is your limiting performance bottleneck, then congratulations — you have a very speedy application indeed!  Sure you can demonstrate that creating 1 million objects in a loop via reflection is slower than just new-ing them up in the same loop, but very few real-world applications have a need to do that.
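If you want to see that gap for yourself, a crude timing sketch along the following lines will do it.  The Widget class and the ReflectionTiming helper are invented for the example, and this is nowhere near a rigorous benchmark (no warm-up, no repeated runs), so treat whatever numbers it prints on your machine as rough indications only.

```csharp
using System;
using System.Diagnostics;

// Hypothetical empty class, for illustration only.
public class Widget { }

public static class ReflectionTiming
{
    // Times 'count' direct constructor calls, in milliseconds.
    public static long TimeDirect(int count)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            Widget w = new Widget();
            GC.KeepAlive(w); // keep a reference so the allocation is used
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // Times 'count' instantiations via the reflection API, in milliseconds.
    public static long TimeReflected(int count)
    {
        Type type = typeof(Widget);
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            object w = Activator.CreateInstance(type);
            GC.KeepAlive(w);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int count = 1000000;
        Console.WriteLine("new:        " + TimeDirect(count) + " ms");
        Console.WriteLine("reflection: " + TimeReflected(count) + " ms");
    }
}
```

Whatever the two numbers turn out to be, the useful comparison is against the cost of a single round-trip to your database.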

All of this is important because at least part of the ‘secret sauce’ that makes my CriteriaHelper<T> class work is indeed reflection under-the-hood.  But that’s a trivially small price to pay for the flexibility this gains me in being able to get compile-time validation on my query syntax.  And after all, in the context of running a database query, my slowest part of that isn’t going to be the small amount of reflection that occurs to construct the query objects that will eventually become SQL that’s sent to the database.

A Little Polishing, then Ready

I have a bit more polishing to do on the thing to get it from rough prototype to usable library, but the proof-of-concept is done and definitely worth the effort (for me).  For example, given that all it does is provide strongly-typed access to the names of properties, methods, etc. on a class, I’m thinking that the name CriteriaHelper<T> isn’t really the most expressive of what it’s providing (as you can use it in other cases in code having nothing to do with Criteria-writing) so there may be a name-change coming when I sit down to really consider the details of this.

Viewers of the Autumn of Agile screencasts will be seeing this approach in action as the project progresses over the coming weeks and the binaries for this will be similarly included in the code downloads for the screencasts should anyone be interested in using the same approach in their own work.

I would be interested in feedback from others re: opinions of the validity of this approach to solving the problem; thanks in advance for any comments!