Your mission is to test a new data warehouse table before its ETL process is even set up. Here are some ideas:
Start with the table structure. You can do this before the table even has data.
- If you’ve got design specs, start with the basics:
- Do the expected columns exist?
- Are the column names correct?
- Are the data types correct?
- Is the nullability (NULL vs. NOT NULL) of each column correct?
- Do the columns logically fit the business needs? This was discussed during design, but even if you attended those discussions, you may know more now. Look at each column again and visualize the data values it will hold, asking yourself whether they seem appropriate. You’ll need business domain knowledge for this.
- Build your own query that creates the exact same table and populates it with correct values. Don’t look at the data warehouse’s source queries! If you do, you may trick yourself into believing the programmer must be correct.
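The structure checks above can be scripted before any data arrives. Here is a minimal sketch using Python and SQLite; the table name `dim_customer`, its columns, and the expected schema are all invented for illustration, and a real warehouse would use its platform's equivalent of `information_schema` instead of `PRAGMA table_info`:

```python
import sqlite3

# Hypothetical expected schema: column name -> (declared type, NOT NULL flag).
EXPECTED = {
    "customer_key": ("INTEGER", 1),
    "customer_name": ("TEXT", 1),
    "signup_date": ("TEXT", 0),
}

conn = sqlite3.connect(":memory:")
# Stand-in for the warehouse table under test.
conn.execute(
    "CREATE TABLE dim_customer ("
    "customer_key INTEGER NOT NULL, "
    "customer_name TEXT NOT NULL, "
    "signup_date TEXT)"
)

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
actual = {
    row[1]: (row[2], row[3])
    for row in conn.execute("PRAGMA table_info(dim_customer)")
}

missing = set(EXPECTED) - set(actual)        # expected columns that don't exist
unexpected = set(actual) - set(EXPECTED)     # columns nobody designed
mismatched = {
    name for name in EXPECTED
    if name in actual and actual[name] != EXPECTED[name]
}

assert not missing, f"missing columns: {missing}"
assert not unexpected, f"unexpected columns: {unexpected}"
assert not mismatched, f"type/nullability mismatches: {mismatched}"
```

Because the script only reads metadata, it can run on an empty table the day the DDL is deployed.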
Once you have data in your table, you can really get going.
- Compare the record set your query produced with that of the data warehouse table. This is where 70% of your bugs are discovered.
- Are the row counts the same?
- Are the data values the same? This is your bread-and-butter test. The comparison should be done programmatically via an automated test so you can check millions of rows and columns (see my Automating Data Warehouse Tests post). Another option is a diff tool like DiffMerge. A third option is to spot-check each table manually.
- Are there any interesting columns? If so, examine them closely. This is where 20% of your bugs are hiding. This testing cannot be automated. Look at each column and think about the variety of record scenarios that may occur in the source applications; ask yourself whether the fields in the target make sense and support those scenarios.
- Columns that just display text strings, like names, are not all that interesting because they are difficult to screw up. Calculated columns are more interesting: are the calculations correct?
- Did the design specify a data type change between the source and the target? Maybe an integer needed to be changed to a bit to simplify the data. Was it converted properly? Do the new values make sense to the business?
- How is corrupt source data handled? Does the source DB have orphaned records or referential integrity problems? Is this handled gracefully? Maybe the data warehouse needs to say “Not Found” for some values.
- Build a user report based on the new data warehouse table. Do you have a trusted production report from the transactional DB? Rebuild it using the data warehouse and run a diff between the two.
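The row-count and value comparison described above can be automated with a pair of `EXCEPT` queries: rows in the warehouse table but not in your hand-built expected set, and vice versa. A minimal sketch with SQLite (table and column names are made up; `expected` stands for the result of your independently written query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# "expected" holds the rows your own query produced;
# "dim_customer" stands in for the warehouse table under test.
cur.executescript("""
    CREATE TABLE expected (customer_key INTEGER, customer_name TEXT);
    CREATE TABLE dim_customer (customer_key INTEGER, customer_name TEXT);
    INSERT INTO expected VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO dim_customer VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Orphan');
""")

# Row counts first -- the cheapest check.
(exp_count,) = cur.execute("SELECT COUNT(*) FROM expected").fetchone()
(act_count,) = cur.execute("SELECT COUNT(*) FROM dim_customer").fetchone()

# Then the symmetric difference: anything here is a bug in one direction or the other.
extra = cur.execute(
    "SELECT * FROM dim_customer EXCEPT SELECT * FROM expected"
).fetchall()
missing = cur.execute(
    "SELECT * FROM expected EXCEPT SELECT * FROM dim_customer"
).fetchall()

print(exp_count, act_count)   # 2 3
print(extra)                  # [(3, 'Orphan')]
print(missing)                # []
```

An empty result from both `EXCEPT` queries, plus matching counts, is a pass; either list being non-empty tells you exactly which rows disagree, which beats eyeballing a diff tool at millions of rows.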
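For the corrupt-source-data case, one common pattern (not necessarily what any given ETL uses) is a LEFT JOIN with a COALESCE fallback, so orphaned records surface as "Not Found" instead of leaking NULLs into the warehouse. A sketch with an invented orders/customers schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
    INSERT INTO orders VALUES (100, 1), (101, 99);   -- customer 99 is an orphan
    INSERT INTO customers VALUES (1, 'Ada');
""")

# The LEFT JOIN keeps the orphaned order; COALESCE replaces the
# unmatched (NULL) customer name with the sentinel value.
rows = cur.execute("""
    SELECT o.order_id,
           COALESCE(c.customer_name, 'Not Found') AS customer_name
    FROM orders o
    LEFT JOIN customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

print(rows)  # [(100, 'Ada'), (101, 'Not Found')]
```

When testing, deliberately seed the source with an orphaned record like order 101 and confirm the target shows the sentinel rather than dropping the row or failing the load.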
What am I missing?