Inside the Sausage Factory: PART 20 (Detour to fix NDbUnit Foreign Keys)

In the last post, we completed our data-dependent unit tests for the Data Access Layer of our application using the NDbUnit software to assist us in ensuring our database is in a known-good state before our tests begin so we can accurately evaluate the results of our tests.

In an earlier post, I alluded to the the fact that there is a bug (or two) in the functional behavior of NDbUnit that prevents it from correctly performing several of its functions when there are foreign-key constraints in the database and that within my company we had developed a set of add-on helper utilities to address some of these shortcomings.  I also mentioned that I had always planned to go back and try to ‘fix’ the problems with the NDbUnit code itself but could never quite find the time.

As I re-read that post I began to think: how do I really know how much effort would be required to fix this unless I at least take an hour or two to determine exactly what’s wrong?  I might find that it can be fixed with a simple addition of an extra if (..) statement for example.  So I decided to take a detour to find out.

The Problem with NDbUnit and Foreign Keys

Before we go too much further, its probably worth it to just summarize the problem we are trying to solve.  NDbUnit (the latest v1.2) exposes several methods you can call against your database that are listed here as they come from the NDbUnit source code along with their comments (sort of) explaining their use:

    /// <summary>
    ///    The database operation to perform.
    /// </summary>
    public enum DbOperationFlag
    {
        /// <summary>No operation.</summary>
        None, 
        /// <summary>Insert rows into a set of database tables.</summary>
        Insert, 
        /// <summary>Insert rows into a set of database tables.  Allow identity 
        /// inserts to occur.</summary>
        InsertIdentity, 
        /// <summary>Delete rows from a set of database tables.</summary>
        Delete, 
        /// <summary>Delete all rows from a set of database tables.</summary>
        DeleteAll, 
        /// <summary>Update rows in a set of database tables.</summary>
        Update, 
        /// <summary>Refresh rows in a set of database tables.  Rows that exist 
        /// in the database are updated.  Rows that don't exist are inserted.</summary>
        Refresh,
        /// <summary>Composite operation of DeleteAll and Insert.</summary>
        CleanInsert,    
        /// <summary>Composite operation of DeleteAll and InsertIdentity.</summary>
        CleanInsertIdentity
    }

If you look at this list, just about all of them imply that NDbUnit will be executing one or more INSERT and/or DELETE statements against your database and just about all of them also use the word tables (plural) to describe the scope of their actions.  Obviously, anything that is going to perform any insert or delete actions against a database with foreign-key relations between tables needs to do so in a particular order that respects (and takes into account) the foreign-key relations so as not to violate any foreign-key constraints during the operations.

As a simple example, let’s consider two tables, EMPLOYEE and EMPLOYEETYPE where a foreign-key relationship is established between the tables so that EMPLOYEETYPE acts as a ‘lookup’ list to be used to categorize EMPLOYEE records:

image

If a record in the EMPLOYEE table references a record in the EMPLOYEETYPE table, then any attempt to DELETE that record from the EMPLOYEETYPE table will fail of course because its referenced in the EMPLOYEE table; the database won’t allow such a delete because it would violate the referential integrity of the data.  The same is true of attempting an INSERT operation to get data into these two tables — a record cannot be inserted into the EMPLOYEE table if it would refer to a record in the EMPLOYEETYPE table that isn’t inserted yet.

All of this leads us to one rather significant conclusion about the required behavior of NDbUnit: the order of INSERTS and DELETES needs to be such that FK constraints are not violated.  Unfortunately, as stepping through the NDbUnit code in the debugger and inspecting its behavior has shown, NDbUnit apparently tries to perform INSERTs and DELETEs on one ore more tables in alphabetical order based on the names of the tables instead of respecting FK relationships.

I’m not as unique as I think I am smile_embaressed

Not surprisingly, I am not the only one in the great big world who has noticed this shortcoming in NDbUnit — anyone trying to use NDbUnit against a database with foreign keys between tables would likely have this problem brought to their attention as soon as they tried any DB operation that tries to delete (or insert) data from more than one table (or, frankly, even from a single table under certain conditions).

In fact, the NDbUnit forum shows a post from someone in mid-2006 who has noticed this limitation in the context of the NDbUnit REFRESH database operation.  This same post has a response from someone else positing a solution in the form of a class he has written, DataSetTableIterator, that is designed to inspect a dataset, interrogate it to determine its FK relations, and build an ordered list of tables in a sequence that respects their FK relations.  While a great start, this approach also has a problem that needs to be overcome before it really solves the problem with NDbUnit.

Forward and Reverse Iterators

The proffered DataSetTableIterator class from this post uses a mostly brute-force approach to puzzling out the ‘correct’ way to order tables so that their FK relations are respected.  Basically in a very broad sense it does the following set of steps:

  1. loop through each table in the dataset
  2. if the current table has no foreign keys, just add it to the list
  3. if the current table does have a foreign key, check to see if the table referred to by that FK is already in the list
  4. if yes, add the current table
  5. if not, process the table pointed to by the FK relation using a recursive call to the same main processing function
  6. keep looping until all input tables from the dataset end up in the result list

In this manner, the tables are added ‘properly’ to the list of tables, each one correctly coming in sequence before the tables upon which it foreign keys are dependent.

If we analyze and list the tables that are in our database (and by extension the dataset that was code-generated from it as mentioned in this post) in the order that they can have data safely deleted from them, we get something like this…

  1. PersonSkill
  2. Skill
  3. SkillType
  4. SkillLevel
  5. Person
  6. Office
  7. Region
  8. PersonType
  9. PersonPhoto
  10. BusinessUnit

If we think this through logically, this list (and the order of it) makes perfect sense.  Let’s return to the Visual Studio representation of our Dataset for a moment:

_temp

As we can see, to safely delete the contents of the [Skill] table, we need to first delete the  contents of the [PersonSkill] table since [PersonSkill] references values in [Skill].  And to delete the contents of the [SKillType] table, we need to delete the contents of the [SKill] table.  So the start of the sequence…

  1. PersonSkill
  2. Skill
  3. SkillType

…makes logical sense.  Following the rest of the table list in order makes similar sense and as we can see from the diagram the last tables in the sequence (Region, PersonType, etc.) are those tables with no relations to any other tables at all.  So this is definitely the sequence that we need to ensure that we are interacting with our tables in an FK-safe manner.  Unfortunately, its not the list that is returned by the DataSetTableIterator class when used as-is from the forum post in which I found it smile_sad

Instead, this is the list returned from the DataSetTableIterator class when leveraged in a foreach(…) loop as recommended in the post:

  1. BusinessUnit
  2. PersonPhoto
  3. PersonType
  4. Region
  5. Office
  6. Person
  7. SkillLevel
  8. SkillType
  9. Skill
  10. PersonSkill

Even the casual reader will notice at once that this is not the same as the sequence we actually need; the astute reader will notice however that this is actually exactly the reverse of the ordering that we need for our tables!  The list is right, but is the reverse of what we actually need.

So now the challenge is how to iterate through a collection backwards in the context of a foreach(…) loop.  Fortunately its relatively trivial to adapt the DataSetTableIterator class to offer the option of functioning as either a forward-iterator or a reverse-iterator on an as-needed basis.

Conceptually this is quite simple: we provide an override to the constructor signature for the DataSetTableIterator so that the caller has the chance to specify that they are interested in a reverse-iterator as so…

    public DataSetTableIterator(DataSet dataSet, bool iterateInReverse)

…and then after we are done with the processing logic that builds the list of tables and orders them in forward order, we simple reverse the order of the underlying list right before we return from the constructor.  The final result from such a call to the DataSetTableIterator is (now) in the order desired when called through a foreach(…) loop:

  1. PersonSkill
  2. Skill
  3. SkillType
  4. SkillLevel
  5. Person
  6. Office
  7. Region
  8. PersonType
  9. PersonPhoto
  10. BusinessUnit

All we need to do now is replace all the calls in the NDbUnit source code that simply iterate through tables in alphabetical order in v1.2 to instead use this approach in either a forward or reverse direction as appropriate.

Perhaps we should call this NDbUnit 1.3 (1.2.1?).  If there is anyone interested in this fixed version of NDbUnit that correctly respects FK relations in datasets, please just drop me an e-mail and I can send you the updated NDbUnit source code with both the ‘corrected’ DataSetTableIterator class and the changes to the original NDbUnit 1.2 code.  Eventually I may get around to posting this set of recommendations and updates on the NDbUnit forum.

Update our TestFixtureSetUp Method

With that set of bugs in NDbUnit ‘corrected’ we can now do two things:

  1. the first is to refactor our existing NDbUnitUtility class so that it no longer exposes the DeleteDataFromTables( ) method and instead can perform some other useful functions like storing the connection string, the path to the XML dataset file, and the path to the XSD schema file (and also testing to ensure they exist)
  2. the second thing we can do is modify our TestFixtureSetUp method in our data access layer test class to both work with the newly-redesigned NDbUnitUtility class and also to rely solely on the CleanInsertIdentity( ) method to successfully get the database into the pre-testing state for us as in…
        [TestFixtureSetUp]
        public void FixtureSetUp()
        {
            NDbUnitUtility utility = new NDbUnitUtility(CONN, SCHEMA, DATA, NDbUnitUtility.DatabaseClientType.SqlClient);

            _sql = new NDbUnit.Core.SqlClient.SqlDbUnitTest(utility.ConnectionString);
            _sql.ReadXmlSchema(utility.SchemaFilePathName);
            _sql.ReadXml(utility.DatasetFilePathName);

            //completely zeros-out the existing database and loads it with the XML version of the data
            _sql.PerformDbOperation(NDbUnit.Core.DbOperationFlag.CleanInsertIdentity);
        }

This approach is much more succinct than the prior method of relying on a separate utility class to ‘correct’ for the shortcomings of the NDbUnit 1.2 release.

With that detour completed, we can return next to continuing to develop our original project~!