T4 Templating Engine to the Rescue (sort of!)

It’s somewhat funny to me how the usual ‘knock’ against Open Source Software is the poor quality of the documentation that often accompanies it.  “Why can’t we have professional quality, comprehensive documentation for these OSS projects just like the really solid, professional quality documentation that comes from (oh, I don’t know) Microsoft for example?” is something that I hear frequently on message boards, forums, etc. from OSS adopters struggling to implement this, that, or the other OSS product into their projects.

I won’t bother to delve into the often-debated reasons for the lack of interest in producing documentation for OSS projects because that’s not what this post is about and others have long since beaten that topic to death.  I will however talk a bit here about the fact that I’m no longer really sure that Microsoft actually provides good documentation for all of their commercial offerings.

The Naked Emperor

I think it’s time that we acknowledge that in a lot of ways, the Emperor has no clothes (at least in some cases).  To be fair, Microsoft has really excellent documentation for many of their products.  But it’s apparent to me that not all Microsoft products have the same requirement for comprehensive documentation.

I just spent perhaps the most frustrating 6-8 hours of my recent memory attempting to implement something in a technology that Microsoft has not only developed and officially released, but provided in a commercial offering: the Text Template Transformation Toolkit (T4) that shipped as an add-on to Visual Studio 2005 (via the DSL SDK add-in) and is ‘fully-integrated’ into Visual Studio 2008.

When it was released for VS 2005, I must have been one of the five people who downloaded it and tried to do something useful with it.  After about an hour of struggling with it and discovering the near-perfect lack of any resources (online or otherwise) for doing anything with it, I put it to the side and went back to my existing code-generating toolset (MyGeneration at the time).

But just the other day I noticed a blog post from Scott Hanselman where he pointed out that the same T4 engine is now built into Visual Studio 2008 and also offered several helpful resources for getting started with it.  I always felt awkward about the lack of simple integration that I could achieve between Visual Studio and the MyGeneration toolset and so I thought to myself: “Hey, if Scott thinks it’s a mature enough technology to bother blogging about it, I guess the least I can do is give it another look.”

Now I will be the first to say that I’m not the best software developer I’ve ever met.  To get really good at anything (and stay there) you have to pretty much spend your whole life doing just that.  And it’s been a (frighteningly large) number of years since my primary job responsibility was actually writing code for 8 or more hours a day.  But I have mastered many significantly different programming languages in my career ranging from C++ to LISP to VB to FORTRAN (and of course VB.NET and C#) and so I consider myself pretty adept at becoming at least capable in just about any new technology out there after not a huge amount of exposure to it.

But I have to say that my experience with the T4 engine has been truly confounding to me.  There is very little (if any) formal documentation out there on this technology and what few MSDN-sanctioned resources (screencasts, etc.) are out there are focused on “intro to T4” and all use the same two examples over and over again: either it’s a for-each loop in the body of your template that counts from 1 to 10 and spits out generated Console.WriteLine(…) calls or it’s the ‘reverse-engineer the database schema’ example where we auto-generate a DAL in three clicks.  Neither of these was even close to my use-case, so they really weren’t helpful.

My search for any other documentation on this technology proves that just because Microsoft can write comprehensive, professional, complete documentation for some of their products, that certainly doesn’t mean that they actually will do so for all of them.  For the record, I will point out that in the end I managed to achieve what I was trying to accomplish with the T4 engine, but only after pulling my hair out over some of the quirks of this technology (and those of you who have met me know that I have very few extra hairs to spare for such things).

Parting is Such Sweet Sorrow

In this recent post of mine, I documented some of my professional journey that I have taken in search of ever-better (for me) data-access approaches.  At the stage I have now arrived at, code-generation is actually no longer part of my preferred methodology.  As I had mentioned, I prefer now to be in more fine-grained control of my code than any ‘reverse-engineer-the-database-schema-to-create-classes’ approach can afford me.  For the Domain Models that I want to build, slaving my class model to my database just doesn’t make sense to me for most of the projects I am faced with in my software engineering practice these days.

But when I left code-generated data-access-layers behind, I also had to forego a number of pleasant side-effects that code-generation offered me.  I’ve never been completely satisfied with that trade-off (I would really rather have my cake and eat it too if I could get away with that).  So over the course of the past many months since I gave up the crack-like high that comes from auto-generated code, I have mulled over several different ways to get back what I was missing.  When Scott blogged about T4 it piqued my interest and I decided to try to use this technology to solve my problem (or bake my cake, as the case may be).

My Use Case: String-Literals Suck

I don’t think that there is a (professional) developer alive worth his or her salt that can honestly say “I love string-literals in my code”.  The so-called ‘magic-string’ problem results in code that is hard to maintain, resistant to refactoring using modern tools like Refactor Pro! (my favorite) and ReSharper, and leads to difficult-to-track-down run-time bugs that never surface at compile-time and are even hard to catch with fairly high code-coverage unit tests.

In fact, the ‘fear’ (or at least loathing) of string-literals has actually spurred the creation or at least the redesign of entire sets of software whose primary or at least secondary purpose is to allow developers to avoid literal strings in their code.  Some obvious examples from the .NET OSS ecosystem are things like…

  • Fluent NHibernate
  • Rhino Mocks
  • StructureMap
  • Moq

As anyone who knows me knows quite well, I am an avid proponent of the use of NHibernate for O/RM services in my applications.  I love the power, the flexibility, the extensibility, and the comprehensive nature of the tool.  But I hate its dependence on string-literals in its querying infrastructure.  Although I love the power that HQL gives me, I’ve never been able to reconcile the following conflict in my mind: if we all agree that literal SQL strings in my code are bad, how can it be that we consider literal HQL strings in my code to be good?  I get that HQL is db-agnostic and object-based whereas SQL is platform-specific and set-based, but the downsides to maintaining HQL are the same as the downsides to maintaining literal SQL.

So instead, I tend to prefer the NHibernate Criteria API where I can get away with it in place of HQL (not every valid HQL statement can be expressed in the Criteria API).  This tends to reduce my need for string-literals in my code, but not completely eliminate it.

Some Examples

The easiest way to see the dilemma is by way of an example.  Here is a method from my Repository that returns Customer entities ordered by their Lastname property (as excerpted from a code sample from my recent Summer of NHibernate screencast series):

public IList<Customer> GetCustomersOrderedByLastnames()
{
    return _session.CreateQuery("select from Customer c order by c.Name.Lastname")
        .List<Customer>();
}

You can of course see the string-literals in there; since it’s HQL nearly the entire thing is string-literal.  If (for example) the Lastname property were renamed in my Name class, this query would cease to function properly.  If I’m doing my job right, then I will catch this issue during an integration test where I exercise my DAL against a real database, but who wants to run 100 unit tests to find 73 of them failing just because I changed the name of a property on a class?  Even after my unit (integration) tests find this problem, I still have to go fix it in the 73 places it’s now wrong.

This is the same query expressed via the Criteria API:

public IList<Customer> CRIT_GetCustomersOrderedByLastnames()
{
    return _session.CreateCriteria(typeof(Customer))
        .AddOrder(new NHibernate.Criterion.Order("Name.Lastname", true))
        .List<Customer>();
}

Better, but still not great.  I have reduced the number of literal strings in my code (the class name is now a compiler-checked typeof(Customer)), but the darned property names are still in there as string-literals.  And let’s face it, these are actually the things in either query syntax that are most likely to change so there really isn’t much improvement in the Criteria API version in re: protecting me during a refactoring exercise.

The Ideal Answer

What I’d really desperately like to be able to do is something like the following:

public IList<Customer> CRIT_GetCustomersOrderedByLastnames()
{
    return _session.CreateCriteria(typeof(Customer))
        .AddOrder(new NHibernate.Criterion.Order
            (Customer.PropertyNames.Name.Dot(Name.PropertyNames.Lastname), true))
        .List<Customer>();
}

Note that in the above version of this query syntax there are no (zero, nada, zip, nothing) string-literals needed to express it.  I have achieved developer nirvana, code with no magic-strings in it that are resistant to refactoring and will stymie me (or some other sucker) during any subsequent maintenance phase of this project.
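The Dot(…) call in that query is not part of NHibernate or the .NET Framework, so it has to come from somewhere.  A minimal sketch of how such a helper might look, assuming it is nothing more than a string extension method that joins two property-name segments (the StringExtensions class name and the cut-down Customer and Name classes here are hypothetical stand-ins for illustration only):

```csharp
using System;

// Hypothetical helper: an extension method that joins two
// property-name segments into a dotted path.
public static class StringExtensions
{
    public static string Dot(this string left, string right)
    {
        return left + "." + right;
    }
}

// Cut-down stand-in for the Name class, carrying only the
// nested constants needed for this illustration.
public class Name
{
    public static class PropertyNames
    {
        public const string Firstname = "Firstname";
        public const string Lastname = "Lastname";
    }
}

// Cut-down stand-in for the Customer class.
public class Customer
{
    public static class PropertyNames
    {
        public const string Name = "Name";
    }
}
```

With these in place, Customer.PropertyNames.Name.Dot(Name.PropertyNames.Lastname) evaluates to the string "Name.Lastname", which is exactly what the Criteria API wants for a nested property path.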

So How do We reach Nirvana?

So how does this happen?  Well, this question actually loops us all the way back to the original topic for this post: using the T4 template engine to get us some code-gen goodness where it’s most needed.  Back when I used to use code generation to create my entire class model for me, I used to have some output from the MyGeneration template I extended that looked something like this…

public static class ClassName
{
    public const string Customer = "Customer";
}
public static class PropertyNames
{
    public const string CustomerId = "CustomerId";
    public const string Name = "Name";
    public const string Orders = "Orders";
    public const string Version = "Version";
}
public static class FieldNames
{
    public const string _customerId = "_customerId";
    public const string _name = "_name";
    public const string _orders = "_orders";
    public const string _version = "_version";
}

Defined as nested types within the generated Customer class that was reverse-engineered from the db schema, these static nested classes provided strongly-typed access to the underlying string values that related to the Customer class.  When you wanted the string for the CustomerId property, you could simply reference it in code as Customer.PropertyNames.CustomerId and this would return the string-literal “CustomerId” that you really wanted to use.  Worked great, but when I abandoned code-generation as a methodology, I also lost the benefit of generated code like that above that helped save my data-access-layer from string-literal-dependency-hell (SLDH?).

T4 To the Rescue!

For some time now I have mulled over several strategies to get this capability back again in my arsenal.  One leading candidate was actually to build a plugin for CodeRush/Refactor Pro! using the DXCore framework but that approach always seemed to me to be a) too proprietary and b) too heavy-handed (e.g., a whole DLL just to do this code-gen?????).

Another idea was to leverage the VS macro infrastructure to code a VB.NET macro to do the work.  This too seemed annoyingly complex and pretty non-portable (e.g., I could just see one of our devs reporting “can’t refactor the class properties because I don’t have the macro installed on my VS instance”).

So in the end, the answer was to spend 6-8 hours fighting through the undocumented quirks of the T4 engine and syntax to get it to do what I needed.  The results look like the following….

public partial class Customer
{
    ///<summary>
    ///Provides a static string-literal for the name of the Customer class
    ///</summary>
    public static class ClassName
    {
        public const string Customer = "Customer";
    }

    ///<summary>
    ///Provides static string-literals for the properties of the Customer class
    ///</summary>
    public static class PropertyNames
    {
        public const string CustomerId = "CustomerId";
        public const string Name = "Name";
        public const string Orders = "Orders";
        public const string Version = "Version";
    }

    ///<summary>
    ///Provides static string-literals for the fields of the Customer class
    ///</summary>
    public static class FieldNames
    {
        public const string _customerId = "_customerId";
        public const string _name = "_name";
        public const string _orders = "_orders";
        public const string _version = "_version";
    }
}

This code-generated class output from T4 uses the partial keyword so that its content can fold into the hand-written Customer domain object that I would have written ‘manually’ (the old-fashioned way?).  Unlike the code-generation from the MyGeneration template that interrogates the database schema to ascertain what to generate, the T4 template interrogates my own class code to determine what to generate so there is no dependency on the database schema, just on my own hand-written class definitions.
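To make the idea of ‘interrogate the class, emit the constants’ a bit more concrete, here is a rough sketch of the same technique done with plain .NET reflection.  To be clear, this is not how my template works internally (it is not a T4 template at all and does not use the T4Toolbox); PropertyNamesGenerator and SampleCustomer are hypothetical names made up for this illustration, and it only handles public properties declared directly on the type:

```csharp
using System;
using System.Reflection;
using System.Text;

// Hypothetical sample type to generate constants for.
public class SampleCustomer
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
}

// Illustrative only: builds the source text of a partial class holding
// one string constant per public instance property of the given type.
public static class PropertyNamesGenerator
{
    public static string Generate(Type type)
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendLine("public partial class " + type.Name);
        sb.AppendLine("{");
        sb.AppendLine("    public static class PropertyNames");
        sb.AppendLine("    {");

        // Only properties declared on this type itself, not inherited ones.
        BindingFlags flags = BindingFlags.Public | BindingFlags.Instance
                             | BindingFlags.DeclaredOnly;
        foreach (PropertyInfo property in type.GetProperties(flags))
        {
            sb.AppendLine("        public const string " + property.Name
                          + " = \"" + property.Name + "\";");
        }

        sb.AppendLine("    }");
        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

Calling PropertyNamesGenerator.Generate(typeof(SampleCustomer)) returns the text of a partial class with a nested PropertyNames class, in the same shape as the generated output shown above.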

What you Need

If you are interested in trying this out for yourself, here’s what you need…

  • Either Visual Studio 2005 with the optional DSL SDK installed (for the T4 engine), or Visual Studio 2008 (which already has the T4 engine integrated)
  • The T4Toolbox ‘library’ of useful utilities (upon which my template depends so you have to install this first)
  • My template here

Install the DSL SDK if you have VS 2005 (no need if you have VS 2008).

Install the T4Toolbox from CodePlex.  I use several of the helper methods in that toolkit in my own template, so you have to install this first.

Copy the .tt template file from my download zip file into the directory for your project and add it to any existing project in VS (note that you may have to toggle ‘show all files’ in order for you to see the file to add it to the project).

You’re done.  When you right-click on the .tt file in solution explorer and select ‘run custom tool’, a file called ClassStaticStringExtensions.cs will be generated.  This file will contain generated partial classes for every class that is in the project you just added the template into.

Some Parting Gotchas and Other thoughts

Consider the following as you proceed:

  • I recommend you put this into the project that contains your DOMAIN MODEL classes since that’s more or less what it was intended for, but you can put the template in any project in your solution and it will faithfully produce code-gen output for whatever classes are in the project you add it to

  • You could have copies of the template in more than one project in your solution if you needed to (nothing prevents this)

  • Don’t edit the generated .cs file; anything you do there will be lost as soon as the template is re-run.  If you find you need to further extend anything that is generated from the template, here’s a little trick: nothing says that a class is limited to just two partial declarations, so long as they are all marked with the partial keyword.  So feel free to mark your domain classes partial and add any additional class content you need in other partial class files; your hand-written files and the code-generated output can all live in peace and be assembled by the compiler into one class from all the partial bits

  • Note that the template will produce partial classes for every class in the project, whether or not the classes already in there are in fact partial.  This effectively means that every class in the project you place the template into must be declared partial.  If you forget this, you will get a compiler error and will need to go back to your ‘hand-written’ classes in the project and declare them partial.  Note that there is a slight overhead added to the compile process if you declare a class ‘needlessly partial’ but there is no run-time performance impact, so I nearly always recommend that you write classes ‘partial by default’ in much the same way I recommend ‘virtual by default’ for methods and properties (even though ‘virtual’ has a run-time performance penalty and ‘partial’ doesn’t)

  • By way of a gotcha, I noticed that the template does NOT necessarily run against the project it’s added to; it turns out that it actually runs against whatever project is currently ‘active’ in solution explorer when you run it.  This means that if you add the .tt file to Project1 and then click on something in Project2 in the solution tree and then right-click on the .tt file to say ‘run’, you will actually get output based on the classes in Project2 that was ‘active’ rather than Project1 where the .tt file is located.  In almost every case, this will result in a compile-time error for you since all partial ‘parts’ of a class must be in the same assembly in .NET (which in practice means the same project, since all the parts have to be compiled together).  If/when this happens to you, just click on something in the solution explorer that’s in the ‘right’ project and then execute the template again.
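For anyone who hasn’t leaned on partial classes much, here is a small sketch of what the compiler ends up stitching together.  The Product class, its members, and the file names in the comments are hypothetical and purely for illustration; the point is only that every declaration carries the partial keyword and all of them are compiled into the same assembly:

```csharp
using System;

// Product.cs: the hand-written part of the (hypothetical) domain class.
public partial class Product
{
    public string Sku { get; set; }
    public decimal Price { get; set; }
}

// Product.Extras.cs: a second hand-written part; nothing limits a class
// to two partial declarations.
public partial class Product
{
    public bool IsFree
    {
        get { return Price == 0m; }
    }
}

// Stand-in for the part that the template would generate.
public partial class Product
{
    public static class PropertyNames
    {
        public const string Sku = "Sku";
        public const string Price = "Price";
    }
}
```

The compiler merges all three declarations into a single Product class, so Product.PropertyNames.Sku and the hand-written IsFree property are both available on the same type.  Drop the partial keyword from any one of the declarations and the build fails, which is the compiler error described in the list above.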

As usual, I hope you find use in this template and of course let me know if anyone has any questions, comments, or other feedback.

UPDATE: see this post for a discussion of (what I think) is a better way to achieve the same goal without involving the complexity of the T4 template engine or any code-gen at all: http://unhandled-exceptions.com/blog/index.php/2008/11/22/the-continuing-quest-for-death-of-string-literals-in-my-code/