My data warehouse project team is configuring one of our QA environments to be a dynamic read-only copy of production.  I’m salivating as I try to wrap my head around the testing possibilities.

We are taking about 10 transactional databases from one of our QA environments and replacing them with 10 databases replicated from their production counterparts.  This means that when any of our users performs a transaction in production, the data change will be reflected in our QA environment instantly.

Expected Advantages:

  • Excellent Soak Testing – We’ll be able to deploy a pre-production build of our product to our Prod-replicated-QA-environment and see how it handles actual production data updates.  This is huge because we have been unable to find some bugs until our product builds experience real live usage.
  • Use real live user scenarios to drive tests – We have a suite of automated checks that invoke fake updates in our transactional databases, then expect data warehouse updates within certain time spans.  Until now, those checks have relied entirely on fake updates.  With the Prod-replicated-QA-environment, we are attempting to programmatically detect real live data updates via logging and measure them against expected results.
  • Comparing reports – A new flavor of automated checks is now possible.  With the Prod-replicated-QA-environment, we are attempting to use production report results as a golden master and compare them to the QA report results generated from the data warehouse on the pre-production QA build.  Since the data warehouse data supporting the reports should be the same, we can expect the report results to match.
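
To make the golden master idea concrete, here is a minimal Python sketch. It assumes each report can be exported as a list of rows; the function name diff_reports and the sample rows are hypothetical, made up purely for illustration, and not taken from our actual checks.

```python
from collections import Counter


def diff_reports(golden_rows, candidate_rows):
    """Compare two report result sets as multisets of rows, ignoring row order.

    Returns (missing, extra):
      missing - rows in the golden master (production) absent from the candidate (QA)
      extra   - rows in the candidate (QA) absent from the golden master
    """
    golden = Counter(map(tuple, golden_rows))
    candidate = Counter(map(tuple, candidate_rows))
    missing = golden - candidate
    extra = candidate - golden
    return sorted(missing.elements()), sorted(extra.elements())


# Hypothetical report rows: (region, product, units sold)
prod_report = [("east", "widgets", 120), ("west", "gadgets", 75)]
qa_report = [("west", "gadgets", 75), ("east", "widgets", 121)]

missing, extra = diff_reports(prod_report, qa_report)
print("In production only:", missing)
print("In QA only:", extra)
```

Treating the rows as a multiset means a difference in sort order alone does not fail the check, while a duplicated or dropped row still does.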

Expected Challenges:

  • The Prod-replicated-QA-environment will be read-only.  This means instead of creating fake user actions whenever we want, we will need to wait until they occur.  What if some don’t occur…within the soak test window?
  • No more data comparing? – Comparing transactional data to data warehouse data has always been a bread-and-butter automated check for us.  These checks verify data integrity and data loading.  Comparing a real, live, quickly changing source to a slowly updating target will be difficult at best.
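
One possible way to keep some form of data comparing is to compare "as of" the last data warehouse load and ignore anything that changed afterward. The Python sketch below is only an idea under stated assumptions, not something we have built: it assumes every source row carries a last-modified timestamp and that the warehouse exposes the time of its last load (the watermark). The name compare_as_of and the sample data are hypothetical.

```python
from datetime import datetime


def compare_as_of(source_rows, warehouse_rows, watermark):
    """Compare source rows to warehouse rows, skipping rows changed after the watermark.

    source_rows:    dict of key -> (value, modified_at)
    warehouse_rows: dict of key -> value
    watermark:      datetime of the last warehouse load (assumed to be available)
    Returns (mismatches, skipped).
    """
    mismatches = {}
    skipped = []
    for key, (value, modified_at) in source_rows.items():
        if modified_at > watermark:
            # Changed after the last load, so the warehouse cannot have it yet.
            skipped.append(key)
            continue
        loaded = warehouse_rows.get(key)
        if loaded != value:
            mismatches[key] = (value, loaded)
    return mismatches, skipped


# Hypothetical data for illustration only.
watermark = datetime(2020, 1, 1, 12, 0)
source = {
    1: ("A", datetime(2020, 1, 1, 11, 0)),
    2: ("B", datetime(2020, 1, 1, 12, 30)),  # modified after the last load
    3: ("C", datetime(2020, 1, 1, 10, 0)),
}
warehouse = {1: "A", 2: "old", 3: "X"}

mismatches, skipped = compare_as_of(source, warehouse, watermark)
print("Mismatches:", mismatches)
print("Skipped (too recent):", skipped)
```

The trade-off is that rows updated very frequently may be skipped on every run, so a check like this would narrow the comparison rather than fully replace the one we run today.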


  1. Tadas said...

    Thanks for sharing the concept, that's quite interesting. It would be nice to have hands-on experience with such an environment. I also wonder about the results; waiting for another post on how it actually went and what kind of underwater rocks were hit :) Good luck and have fun!

  2. Anonymous said...

    This certainly does open up exciting testing possibilities!

    Where I work, we manually transfer database backups from the production environment to our QA environment. We then restore those backups as live read-write databases. You might say that our databases are "branched" instead of "replicated" from production data.

    You pointed out several benefits of the "replicated" approach which our "branched" approach doesn't provide, like how live updates constitute a very realistic soak test. On the flip side, the benefit of our "branched" approach is that we are able to create fake user actions on top of what's already there.

    One thing to consider is the sensitivity of the data; for example, credit card numbers or health information. The database backups we obtain from production contain sensitive data, and we need to be sure that we don't accidentally include any of it in our incident reports. We don't currently have any automated test scripts, but if we did, we would probably need to scrub any results logged from those as well.

  3. Joe said...

    Congratulations! Having a copy of Production is a very useful tool.

    That said, I'm hoping that this isn't the only data you are using now?

    Using "what is at this instant" doesn't mean you are testing "what could be at some point in the future".
