If you read part 1, you may be wondering how my automated check performed…

The programmer deployed the seeded bug and I’m happy to report that my automated check found it in 28 seconds!

Afterwards, he seeded two additional bugs.  The automated check found those as well.  I had to temporarily modify the automated check code to ignore the first bug in order to find the second.  This is because the check stops checking as soon as it finds one problem.  I could tweak the code to collect problems and keep checking but I prefer the current design.

Here is the high-level, generic design of said check:

Build the golden masters:

  • Make scalable checks - Before test execution, build multiple golden masters per coverage ambition.  This is a one-time-only task (until the golden masters need to be updated per expected changes).
  • Bypass GUI when possible – Each of my golden masters consists of the response XML from a web service call, saved to a file.  Each XML response has over half a million nodes, which are mapped to a complex GUI.  In my case, my automated check bypasses the GUI.  GUI automation could never have found the above seeded bug in 28 seconds.  My product-under-test takes about 1.5 minutes just to log in and navigate to the module being tested.  Waiting for the GUI to refresh after the countless service calls made in the automated check would have taken hours.
  • Golden masters must be golden! Use a known good source for the service call.  I used Production because my downstream environments are populated with data restored from production.  You could use a test environment as long as it was in a known good state.
  • Use static data - Build the golden masters using service request parameters that return a static response.  In other words, when I call said service in the future, I want the same data returned.  I used service request parameters to pull historical data because I expect it to be the same data next week, month, year, etc.
  • Automate golden master building - I wrote a utility method to build my golden masters.  This is basically re-used code from the test method, which builds the new objects to compare to the golden masters.

Do some testing:

  • Compare - This is the test method.  It calls the code-under-test using the same service request parameters used to build the golden masters.  The XML service response from the code-under-test is then compared to that of the archived golden masters, line-by-line.
  • Ignore expected changes - In my case there are some XML nodes the check ignores.  These are nodes with values I expect to differ.  For example, the CreatedDate node of the service response object will always be different from that of the golden master.
  • Report - If any non-ignored XML line is different, it’s probably a bug: fail the automated check, report the differences with line-number and file references (see below), and investigate.
  • Write Files - For my goals, I have 11 different golden masters (to compare with 11 distinct service response objects).  The automated check loops through all 11 golden master scenarios, writing each service response XML to a file.  The automated check doesn’t use the files; they are there for me.  This gives me the option to manually compare suspect new files to golden masters with a diff tool, an effective way of investigating bugs and determining patterns.
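The compare, ignore, report, and write-files steps above can be sketched in a few lines.  This is a minimal illustration, not the author’s actual code: the file names and the ignored-node list are hypothetical, and a real check would walk XML nodes from its own configuration.

```python
from pathlib import Path

# Node names whose values are expected to differ run-to-run
# (hypothetical example; build this list from your own expected changes)
IGNORED_NODES = ("<CreatedDate>",)

def compare_to_golden_master(new_xml: str, master_path: Path, out_path: Path):
    """Line-by-line diff of a new service response against an archived golden master.

    Returns a list of (line_number, master_line, new_line) differences,
    skipping lines that contain an ignored node.  Also writes the new
    response to a file so suspect results can be diffed manually later.
    """
    out_path.write_text(new_xml)  # the check doesn't use this file; it's for the tester
    master_lines = master_path.read_text().splitlines()
    new_lines = new_xml.splitlines()
    diffs = []
    for i, (old, new) in enumerate(zip(master_lines, new_lines), start=1):
        if any(node in old or node in new for node in IGNORED_NODES):
            continue  # expected change, e.g. CreatedDate
        if old != new:
            diffs.append((i, old, new))
    if len(master_lines) != len(new_lines):
        diffs.append((0, f"{len(master_lines)} lines", f"{len(new_lines)} lines"))
    return diffs
```

Failing the check then reduces to asserting that the returned list is empty; anything else gets reported with its line number and file reference.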

I’m feeling quite cocky at the moment.  So cocky that I just asked my lead programmers to secretly insert a bug into the most complex area of the system under test.

Having just finished another epic automated check based on the Golden Master approach I discussed earlier, and seeing that most of the team is on pre-Thanksgiving vacation, this is the perfect time to “seed a bug”.  Theoretically, this new automated check should help put our largest source of regression bugs to rest and I am going to test it.

The programmer says he will hide the needle in the haystack by tomorrow.

I’m waiting…

You’ve got this new thing to test. 

You just read about a tester who used Selenium and he looked pretty cool in his bio picture.  Come on, you could do that.  You could write an automated check for this.  As you start coding, you realize your initial vision was too ambitious so you revise it.  Even with the revised design you’re running into problems with the test stack.  You may not be able to automate the initial checks you wanted, but you can automate this other thing.  That’s something.  Besides, this is fun.  The end is in sight.  It will be so satisfying to solve this.  You need some tasks with closure in your job, right?  This automated check has a clear output.  You’ve almost cracked this thing…cut another corner and it just might work.  Success!  The test passes!  You see green!  You rule!  You’re the Henry Ford of testing!  You should wear a cape to work!

Now that your automated thingamajig is working and bug free, you can finally get back to what you were going to test.  Now what was it?

I’m not hating on test automation.  I’m just reminding myself of its intoxicating trap.  Keep your eyes open.

You must really be a sucky tester.

I’m kidding, of course.  There may be several explanations as to why an excellent tester like yourself is not finding bugs.  Here are four:

  • There aren’t any bugs!  Duh.  If your programmers are good coders and testers, and perhaps writing a very simple Feature in a closely controlled environment, it’s possible there are no bugs.
  • There are bugs but the testing mission is not to find them.  If the mission is to do performance testing, survey the product, determine under which conditions the product might work, smoke test, etc., it is likely we will not find bugs.
  • A rose by any other name… Maybe you are finding bugs in parallel with the coding and they are fixed before ever becoming “bug reports”.  In that case, you did find bugs but are not giving yourself credit.
  • You are not as excellent as you think.  Sorry.  Finding bugs might require skills you don’t have.  Are you attempting to test data integrity without understanding the domain?  Are you testing network transmissions without reading pcaps?

As testers, we often feel expendable when we don’t find bugs.  We like to rally around battle cries like:

“If it ain’t broke, you’re not trying hard enough!”

“I don’t make software, I break it!”

“There’s always one more bug.”

But consider this: a skilled tester can do much more than merely find a bug.  A skilled tester can also tell us what appears to work, what hasn’t broken in the latest version, what unanticipated changes have occurred in our product, how it might work better, how it might solve additional problems, etc.

And that may be just as important as finding bugs.

Hey testers, don’t say:

“yesterday I tested a story.  Today I’m going to test another story.  No impediments”

Per Scrum co-creator Jeff Sutherland, daily standups should not be “I did this…”, “I’ll do that…”.  Instead, share things that affect others, with an emphasis on impediments.  The team should leave the meeting with a sense of energy and urgency to rally around the solutions of the day.  When the meeting ends, the team should be saying, “Let’s go do this!”.

Here are some helpful things a tester might say in a daily standup:

  • Let’s figure out the repro steps for production Bug40011 today, who can help me?
  • I found three bugs yesterday, please fix the product screen bug first because it is blocking further testing.
  • Sean, I know you’re waiting on my feedback on your new service, I’ll get that to you first thing today.
  • Yesterday I executed all the tests we discussed for Story102, unless someone can think of more, I am done with that testing.  Carl, please drop by to review the results.
  • I’m getting out of memory errors on some test automation, can someone stop by to help?
  • If I had a script to identify data corruption, it would save hours.
  • Paul, I understand data models, I’ll test that for you and let you know something by noon.
  • The QA data seems stale.  Don’t investigate any errors yet.  I’m going to refresh data and retest it today.  I’ll let you know when I’m done.
  • Jolie, if you can answer my question on expected behavior, I can finish testing that Story this afternoon.

Your role as a tester affects so many people. Think about what they might be interested in and where your service might be most valuable today.

“Golden Master”, it sounds like the bad guy in a James Bond movie.  I first heard the term used by Doug Hoffman at STPCon Spring 2012 during his Exploratory Test Automation workshop.  Lately, I’ve been writing automated golden master tests that check hundreds of things with very little test code.

I think Golden-Master-Based testing is super powerful, especially when paired with automation.

A golden master is simply a known good version of something from your product-under-test.  It might be a:

  • web page
  • reference table
  • grid populated with values
  • report
  • or some other file output by your product

Production is an excellent place to find golden masters because if users are using it, it’s probably correct.  But golden masters can also be fabricated by a tester.

Let’s say your product outputs an invoice file.  Here’s a powerful regression test in three steps:

  1. Capture a known good invoice file from production (or a QA environment).  This file is your golden master. 
  2. Using the same parameters that were used to create the golden master, re-create the invoice file on the new code under test. 
  3. Programmatically compare the new invoice to the golden master using your favorite diff tool or code.
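Step 3 can be as small as a helper around a standard diff library.  A sketch, assuming the invoice is a plain text file (the file name and line format here are invented for illustration):

```python
import difflib
from pathlib import Path

def diff_against_golden(golden_path: str, new_invoice_text: str) -> list:
    """Return unified-diff lines between the golden master file and the new output.

    An empty list means the new code reproduced the invoice exactly.
    """
    golden = Path(golden_path).read_text().splitlines()
    new = new_invoice_text.splitlines()
    return list(difflib.unified_diff(
        golden, new, fromfile="golden_master", tofile="new_invoice", lineterm=""))
```

The unified-diff output doubles as the failure report: it shows exactly which invoice lines changed, which is usually enough to start investigating.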

Tips and Ideas:

  • Make sure the risky business logic code you want to test is being exercised.
  • If you expand on this test, and fully automate it, account for differences you don’t care about (e.g., the invoice generated date in the footer, new features you are expecting to not yet be in production). 
  • Make it a data-driven test. Pass in a list of orders and customers, retrieve production golden masters and compare them to dynamically generated versions based on the new code.
  • Use interesting dates and customers.  Iterate through thousands of scenarios using that same automation code.
  • Use examples from the past that may not be subject to changes after capturing the golden master.
  • Structure your test assertions to help interpret failures.  The first assertion on the invoice file might be: does the item line count match?  The second might be: do the values on each line match?
  • Get creative.  Golden masters can be nearly anything.
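The tiered-assertion tip can be sketched like this (a hypothetical check over invoice lines; the point is that a coarse structural failure reads differently from a single changed value, so the failure message immediately tells you which kind of change you are looking at):

```python
def check_invoice(golden_lines, new_lines):
    """Tiered assertions: coarse structure first, fine detail second.

    A line-count mismatch and a single changed value produce different
    failure messages, which makes failures faster to interpret.
    """
    # First assertion: does the item line count match?
    assert len(new_lines) == len(golden_lines), (
        f"line count changed: expected {len(golden_lines)}, got {len(new_lines)}")
    # Second assertion: do the values on each line match?
    for i, (golden, new) in enumerate(zip(golden_lines, new_lines), start=1):
        assert new == golden, f"line {i} changed: expected {golden!r}, got {new!r}"
```

In a data-driven harness, this same function would run once per order/customer scenario, against the golden master captured for that scenario.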

Who else uses this approach?  I would love to hear your examples.

Invigorated by the comments in my last post, I’ll revisit the topic.

I don’t think we can increase our tester reputations by sticking to the credo:

“Raise every bug, no matter how trivial”

Notice, I’m using the language “raise” instead of “log”.  This is an effort to include teams that have matured to the point of replacing bug reports with conversations.  I used the term “share” in my previous post but I like “raise” better.  I think Michael Bolton uses it.

Here are a few problems with said credo:

  1. Identifying bugs is so complex that one cannot commit to raising them all.  As we test, our brains are making countless evaluations: “That screen seems slow today, that control might be better a hair to the right, why isn’t there a flag in the DB to persist that data?”.  We are constantly deciding which observations are worth spending time on.  The counterargument to my previous post seems to be: just raise everything and let the stakeholders decide.  I argue that everything is too much.  Instead, the more experience and skill a tester gains, the better she will know what to raise.  And yes, she should be raising a lot, documenting bugs/issues as quickly as she can.  I still think, with skill, she can skip the trivial ones.
  2. Raising trivial bugs hurts your reputation as a tester.  I facilitate bug triage meetings with product owners.  Trivial bugs are often mocked before being rejected:  “Ha! Does this need to be fixed because it’s bugging the tester or the user?  Reject it!  Why would anyone log that?”.  Important bugs have the opposite reaction.  Sorry.  That’s the way it is.
  3. Time is finite.  If I’m testing something where bugs are rare, I’ll be more inclined to raise trivial bugs.  If I’m testing something where bugs are common, I’ll be more inclined to spend my time on (what I think) are the most important bugs.

It’s not the tester’s job to decide what is important.  Yes, in general I agree.  But I’m not dogmatic about this.  Maybe if I share some examples of trivial bugs (IMO), it will help:

  • Your product has an administrative screen that can only be used by a handful of tech support people.  They use it once a year.  As a tester, you notice the admin screen does not scroll with your scroll wheel.  Instead, one must use the scroll bar.  Trivial bug.
  • Your product includes a screen with two radio buttons.  You notice that if you toggle between the radio buttons 10 times and then try to close the screen less than a second later, a system error gets logged behind the scenes. Trivial bug.
  • Your product includes 100 different reports users can generate.  These have been in production for 5 years without user complaints.  You notice some of these reports include a horizontal line above the footer while others do not.  Trivial bug.
  • The stakeholders have given your development team 1 million dollars to build a new module.  They have expressed their expectations that all energy be spent on the new module and they do not want you working on any bugs in the legacy module unless they report the bug themselves and specifically request its fix.  You find a bug in the legacy module and can’t help but raise it…

You laugh, but the drive to raise bugs is stronger than you may think.  I would like to think there is more to our jobs than “Raise every bug, no matter how trivial”.

(Edit on 10/1/2014) Although too long, a better title would have been “You May Not Want To Tell Anyone About That Trivial Bug”.  Thanks, dear readers, for your comments.

It’s a bug, no doubt.  Yes, you are a super tester for finding it.  Pat yourself on the back.

Now come down off that pedestal and think about this.  By any stretch of the imagination, could that bug ever threaten the value of the product-under-test?  Could it threaten the value of your testing?  No?  Then swallow your pride and keep it to yourself. 

My thinking used to be: “I’ll just log it as low priority so we at least know it exists”.  As a manager, when testers came to me with trivial bugs, I used to give the easy answer, “Sure, they probably won’t fix it but log it anyway”.

Now I see things differently.  If a trivial bug gets logged, often…

  • a programmer sees the bug report and fixes it
  • a programmer sees the bug report and wonders why the tester is not testing more important things
  • a team member stumbles upon the bug report and has to spend 4 minutes reading it and understanding it before assigning some other attribute to it (like “deferred” or “rejected”)
  • a team member argues that it’s not worth fixing
  • a tester has spent 15 minutes documenting a trivial bug.

It seems to me, reporting trivial bugs tends to waste everybody’s time.  Time that may be better spent adding value to your product.  If you don’t buy that argument, how about this one:  Tester credibility is built on finding good bugs, not trivial ones.

About five years ago, my tester friend, Alex Kell, blew my mind by cockily declaring, “Why would you ever log a bug?  Just send the Story back.”


My dev team uses a Kanban board that includes “In Testing” and “In Development” columns.  Sometimes bug reports are created against Stories.  But other times Stories are just sent left; for example, a Story “In Testing” may have its status changed to “In Development”, like Alex Kell’s maneuver above.  This is normally done using the Dead Horse When-To-Stop-A-Test Heuristic.  We could also send an “In Development” story left if we decide the business rules need to be firmed up before coding can continue.

So how does one know when to log a bug report vs. send it left?

I proposed the following heuristic to my team today:

If the Acceptance Test Criteria (listed on the Story card) are violated, send it left.  It seems to me, logging a bug report for something already stated in the Story (e.g., Feature, Work Item, Spec) is mostly a waste of time.


While reading Duncan Nisbet’s TDD For Testers article, I stumbled on a neat term he used, “follow-on journey”.

For me, the follow-on journey is a test idea trigger for something I otherwise would have just called regression testing.  I guess “Follow-on journey” would fall under the umbrella of regression testing but it’s more specific and helps me quickly consider the next best tests I might execute.

Here is a generic example:

Your e-commerce product-under-test has a new interface that allows users to enter sales items into inventory by scanning their barcode.  Detailed specs provide us with lots of business logic that must take place to populate each sales item upon scanning its barcode.  After testing the new sales item input process, we should consider testing the follow-on journey; what happens if we order sales items ingested via the new barcode scanner?

I used said term to discuss test planning with another tester earlier today.  The mental image of an affected object’s potential journeys helped us leap to some cool tests.

This efficiency didn’t occur to me until recently.  I was doing an exploratory test session and documenting my tests via Rapid Reporter.  My normal process had always been to document the test I was about to execute…

TEST: Edit element with unlinked parent

…execute the test.  Then write “PASS” or “FAIL” after it like this…

TEST: Edit element with unlinked parent – PASS

But it occurred to me that if a test appears to fail, I tag said failure as a “Bug”, “Issue”, “Question”, or “Next Time”.  As long as I do that consistently, there is no need to add “PASS” or “FAIL” to the documented tests.  While debriefing about my tests post session, the assumption will be that the test passed unless indicated otherwise.

Even though it felt like going to work without pants, after a few more sessions it turned out that not resolving tests to “PASS” or “FAIL” reduces administrative time and causes no ambiguity during test reviews.  Cool!

Wait. It gets better.

On further analysis, resolving all my tests to “PASS” or “FAIL” may have prevented me from actual testing.  It was influencing me to frame everything as a check.  Real testing does not have to result in “PASS” or “FAIL”.  If I didn’t know what was supposed to happen after editing an element with an unlinked parent (as in the above example), well then it didn’t really “PASS” or “FAIL”, right?  However, I may have learned something important nevertheless, which made the test worth doing…I’m rambling.

The bottom line is, maybe you don’t need to indicate “PASS” or “FAIL”.  Try it.

Which of the above questions is more important for testers to ask?

Let’s say you are an is-there-a-problem-here tester: 

  • This calculator app works flawlessly as far as I can tell.  We’ve tested everything we can think of that might not work and everything we can think of that might work.  There appear to be no bugs.  Is there a problem here?  No.
  • This mileage tracker app crashes under a load of 1000 users.  Is there a problem here?  Yes.

But might the is-there-a-problem-here question get us into trouble sometimes?

  • This calculator app works flawlessly…but we actually needed a contact list app.
  • This mileage tracker app crashes under a load of 1000 users but only 1 user will use it.

Or perhaps the is-there-a-problem-here question only fails us when we use too narrow an interpretation:

  • Not meeting our needs is a problem.  Is there a problem here?  Yes.  We developed the wrong product, a big problem.
  • A product that crashes under a load of 1000 users may actually not be a problem if we only need to support 1 user.  Is there a problem here?  No.

Both are excellent questions.  For me, the will-it-meet-our-needs question is easier to apply and I have a slight bias towards it.  I’ll use them both for balance.

Note: The “Will it meet our needs?” question came to me from a nice Pete Walen article.  The “Is there a problem here?” came to me via Michael Bolton.

I often hear people describe their automated test approach by naming the tool, framework, harness, technology, test runner, or structure/format.  I’ve described mine the same way.  It’s safe.  It’s simple.  It’s established.  “We use Cucumber”.

Lately, I’ve seen things differently.

Instead of trying to pigeonhole each automated check into a tightly controlled format for an entire project, why not design automated checks for each Story, based on their best fit for that story?

I think this notion comes from my context-driven test schooling.  Here’s an example:

On my current project, we said “let’s write BDD-style automated checks”.  We found it awkward to pigeonhole many of our checks into Given, When, Then.  After eventually dropping the mandate for BDD-style, I discovered the not-as-natural-language style to be easier to read, more flexible, and quicker to author…for some Stories.  Some Stories are good candidates for data-driven checks authored via Excel.  Some might require manual testing with a mocked product…computer-assisted-exploratory-testing…another use of automation.  Other Stories might test better using non-deterministic automated diffs.

Sandboxing all your automated checks into FitNesse might make test execution easier.  But it might stifle test innovation.

…may not be a good way to start testing.

I heard a programmer use this metaphor to describe the testing habits of a tester he had worked with. 

As a tester, taking all test input variables to their extreme may be an effective way to find bugs.  However, it may not be an effective way to report bugs.  Skilled testers will repeat the same test until they isolate the minimum variable(s) that cause the bug.  Or, using this metaphor, they may repeat the same test with all levels on the mixing board pulled down, except the one they are interested in observing.

Once identified, the skilled tester will repeat the test only changing the isolated variable, and accurately predict a pass or fail result.

Dear Test Automators,

The next time you discuss automation results, please consider qualifying the context of the word “bug”.

If automation fails, it means one of two things:

  1. There is a bug in the product-under-test.
  2. There is a bug in the automation.

The former is waaaaaay more important than the latter.  Maybe not to you, but certainly for your audience.

Instead of saying,

“This automated check failed”,

consider saying,

“This automated check failed because of a bug in the product-under-test”.


Instead of saying,

“I’m working on a bug”,

consider saying,

“I’m working on a bug in the automation”.


Your world is arguably more complex than that of testers who don’t use automation.  You must test twice as many programs (the automation and the product-under-test).  Please consider being precise when you communicate.

So, you’ve got a green thumb.  You’ve been growing houseplants your whole life.  Now try to grow an orchid.  What you’ve learned about houseplants has taught you very little about orchids.

  • Put one in soil and you’ll kill it (orchids grow on rocks or bark). 
  • Orchids need about 20 degrees Fahrenheit difference between day and night.
  • Orchids need wind and humidity to thrive.
  • Orchids need indirect sunlight.  Lots of it.  But put them in the sun and they’ll burn.
  • Fading flowers do not mean your orchid is dying (orchids bloom in cycles).

So, you’re a skilled tester.  You’ve been testing functional applications with user interfaces your whole career.  Now try to test a data warehouse.  What you’ve learned about functionality testing has taught you very little about data testing.

  • “Acting like a user” will not get you far.  Efficient data testing does not involve a UI and depends little on other interfaces.  There are no buttons to click or text boxes to interrogate during a massive data quality investigation.
  • Lack of technical skills will kill you.  Interacting with a DB requires DB Language skills (e.g., TSQL).  Testing millions of lines of data requires coding skills to enlist the help of machine-aided-exploratory-testing.
  • Checking the health of your data warehouse prior to deployments probably requires automated checks.
  • For functional testing, executing shallow tests first to cover breadth, then deep tests later is normally a good approach.  In data testing, the opposite may be true.
  • If you are skilled at writing bug reports with detailed repro steps, this skill may hinder your effectiveness at communicating data warehouse bugs, where repro steps may not be important.
  • If you are used to getting by as a tester, not reading books about the architecture or technology of your system-under-test, you may fail at data warehouse testing.  In order to design valuable tests, a tester will need to study data warehouses until they grok concepts like Inferred Members, Junk Dimensions, Partitioning, Null handling, 3NF, grain, and Rapidly Changing Monster Dimensions.
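As an example of the kind of machine-aided check this implies, consider a grain check: verify that a fact table’s declared grain really is unique, since duplicates usually indicate a bad join or a double-load upstream.  A sketch against an in-memory SQLite database; the table and column names are invented for illustration, and on a real warehouse you would run the equivalent T-SQL:

```python
import sqlite3

def grain_violations(conn, table, grain_cols):
    """Return grain-key combinations that appear more than once.

    If a fact table's declared grain is (customer_id, order_date), every
    combination of those columns should occur exactly once.  An empty
    result means the grain holds.
    """
    cols = ", ".join(grain_cols)
    sql = (f"SELECT {cols}, COUNT(*) AS n FROM {table} "
           f"GROUP BY {cols} HAVING COUNT(*) > 1 ORDER BY n DESC")
    return conn.execute(sql).fetchall()
```

Checks like this scale to millions of rows because the database does the work; the tester’s job is knowing which invariants (grain, null handling, referential integrity) are worth asserting.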

Testers, let’s respect the differences in the projects we test, and grow our skills accordingly.  Please don’t use a one-size-fits-all approach.

I think it’s only people who experience bugs.

Sadly, devs, BAs, other testers, stakeholders, QA managers, directors, etc. seldom appear interested in the fruits of our labor.  The big exception is when any of these people experience a bug, downstream of our test efforts.

“Hey, did you test this?  Did it pass?  It’s not working when I try it.”

Despite the disinterest, we testers spend a lot of effort standing up ways to report test results.  Whether it be elaborate pass/fail charts or low-tech information radiators on public whiteboards, we do our best.  I’ve put lots of energy into coaching my testers to give better test reports, but I often second-guess this…wondering how beneficial the skill is.

Why isn’t anyone listening?  These are some reasons I can think of:

  • Testers have done such a poor job of communicating test results, in the past, that people don’t find the results valuable.
  • Testers have done such a poor job of testing, that people don’t find the results valuable.
  • People are mainly interested in completing their own work.  They assume all is well with their product until a bug report shows up.
  • Testing is really difficult to summarize.  Testers haven't found an effective way of doing this.
  • Testing is really difficult to summarize.  Potentially interested parties don’t want to take the time to understand the results.
  • People think testers are quality cops instead of quality investigators; People will wait for the cops to knock on their door to deliver bad news.
  • Everyone else did their own testing and already know the results.
  • Test results aren’t important.  They have no apparent bearing on success or failure of a product.

We had a relatively disastrous prod deployment last week.  Four bugs, caused by a large refactor, were missed in test.  But here’s the weirder part: along with those four bugs, the users started reporting previously existing functionality as new bugs and, in some cases, convinced us to do emergency patches to change said previously existing functionality.

It seems bugs beget bugs.

Apparently the shock of these initial four bugs created a priming effect, which resulted in overly-critical user perceptions:

“I’ve never noticed that before…must be something else those clowns broke.”

I’ve heard people are more likely to tidy up if they smell a faint scent of cleaning liquid.  Same thing occurs with bugs I guess. 

What’s the lesson here?  Releasing four bugs might be more expensive than fixing four bugs.  It might mean fixing seven and dealing with extra support calls until the priming effect wears off.

What if the test documentation is written per the detail level in this post…

Document enough detail such that you can explain the testing you did, in-person, up to a year later.

…and the tester who wrote it left the company?

Problem #1:

How are new testers going to use the old test documentation for new testing?  They aren’t, and that’s a good thing.

Add test detail as late as possible because things change.  If you agree with that heuristic, you can probably see how a tester leaving the company would not cause problems for new testers.

  • If test documentation was written some time ago, and our system-under-test (SUT) changed, that documentation might be wrong.
  • Let’s suppose things didn’t change.  In that case, it doesn’t matter if the tester left because we don’t have to test anything.
  • What if the SUT didn’t change but an interfacing system did?  In this case, we may feel comfortable using the old test documentation (in a regression test capacity).  In other words, this is the case where the write-tests-as-late-as-possible heuristic is wrong and the test documentation author left the company.  If you agree that a test is something that happens in one’s brain and not a document, wouldn’t we be better off asking our testers to learn the SUT instead of copying someone else’s detailed steps?  Documentation at this level of detail might be a helpful training aid, but it will not allow an unskilled tester to turn off their brain.  Get it?
Problem #2:

How are people going to understand old test documentation if the tester who left must be available to interpret it?  They won’t understand it in full.  They will probably understand parts of it.  An organization with high tester turnover and lots of audit-type inquiries into past testing may need more than this level of detail.

But consider this: planning all activities around the risk that an employee might leave is expensive.  A major trade-off of writing detailed test documentation is lower quality.

(and one of Michael Bolton’s)
One of my testers took James Bach’s 3-day online Rapid Testing Intensive class.  I poked my head in from time to time, couldn’t help it.  What struck me is how metaphor after metaphor dripped from Bach’s mouth like poetry.  I’ve heard him speak more times than I can count but I’ve never heard such a spontaneous panoply of beautiful metaphors.  Michael Bolton, acting as assistant, chimed in periodically with his own metaphors.  Here are some from the portions I observed:

  • A tester is like a smoke alarm, their job is to tell people when and where a fire is.  However, they are not responsible for telling people how to evacuate the building or put out the fire.
  • (Michael Bolton) Testers are like scientists.  But scientists have it easy;  They only get one build.  Testers get new builds daily so all bets are off on yesterday’s test results.
  • Buying test tools is like buying a sweater for someone.  The problem is, they feel obligated to wear the sweater, even if it’s not a good fit.
  • Testers need to make a choice; either learn to write code or learn to be charming.  If you’re charming, perhaps you can get a programmer to write code for you.   It’s like having a friend who owns a boat.
  • Deep vs. Shallow testing.  Some testers only do Shallow testing.  That is like driving a car with a rattle in the door…”I hear a rattle in the door but it seems to stay shut when I drive so…who cares?”.
  • Asking a tester how long it will take to test is like being diagnosed with cancer and asking the doctor how long you have to live.
  • Asking a tester how long the testing phase will last is like asking a flight attendant how long the flight attendant service will last.
  • Complaining to the programmers about how bad their code looks is like being a patron at a restaurant and walking back into the kitchen to complain about the food to the chefs.  How do you think they’re going to take it?
  • Too many testers and test managers want to rush to formality (e.g., test scripts, test plans).  It’s like wanting to teleport yourself home from the gym.  Take the stairs!
Thanks, James.  Keep them coming.

Well, that depends on what your clients need.  How do you know what your clients need?  I can think of two ways:

  1. Ask them.  Be careful: in my experience, clients inflate their test documentation needs when asked.  Maybe they’re afraid they’ll insult the tester if they don’t ask for lots of test documentation.  Maybe their intent to review test cases is stronger than their follow-through.
  2. Observe them.  Do they ever ask for test case reviews?  If you are transparent with your test documentation, do your clients ever give you feedback?  Mine don’t (unless I initiate it).

Here is what I ask of my testers:

Document enough detail such that you can explain the testing you did, in-person, up to a year later.

Before I came up with the above, I started with this:  The tester should be present to explain their testing.  Otherwise, we risk incorrect information. 

If the tester will be present, why would we sacrifice test time to write details that can easily be explained by the tester?  In my case, the documentation serves to remind the tester.  When we review it with programmers, BAs, users, other testers, or auditors, or if I review it myself, the tester should always be present to interpret.

What if the tester leaves?

I’ll talk about that in the next post.

Last week, Alex Kell (Atlanta-based tester and my former boss) gave a fun talk at Software Testing Club Atlanta, “The Oracle is Fallible: Recognizing, Understanding, and Evaluating the Assumptions that Testers Make”.

[Image: John William Waterhouse, Consulting the Oracle, 1884]

Here are the highlights from my notes:

  • After showing John William Waterhouse’s famous 1884 painting, Consulting the Oracle (above), of seven priestesses listening to an eighth priestess (playing the Oracle) interpret messages from the gods, Alex asked:
  • “Assumptions, are they bad or good?”
    • We make them because we’re lazy.
    • Sometimes we know we’re making an assumption, sometimes we don’t know.
    • After some discussion and examples of assumptions we realized we are all constantly making assumptions every waking moment and decided assumptions can be good or bad.
  • Bad assumptions (or forgetting the Oracle is fallible):
    • “The spec is correct.” – Be careful.  Remember Ron Jeffries “Three Cs”:
      • The Spec (AKA “Card”) is a reminder to have a conversation about something.
      • The Conversation is a discussion of the details that result in test confirmations.
      • The Confirmation is acceptance criteria that can be turned into acceptance tests.
    • “They know what they’re doing.” – What if everybody on the team is thinking this…group think?
    • “I know what I’m doing.”
    • “The software is working because we haven't seen a (red) failed test.”  (Dennis Stevens says “Every project starts red, and it stays red, until it is green.”)
    • “The model is reality.” – A model is an abstraction.  All decisions based on models are based on assumptions.  You should never be surprised when a model does not turn out to reflect reality.  Author, Nassim Nicholas Taleb, coined the word “platonicity” to describe the human tendency to find patterns in everything.

      Alex gave a nearly literal example of people falling victim to this bad assumption: he told of people on Craigslist (or similar sites) paying money for what they thought were actual cars one can drive, only to discover they had purchased a scaled-down model of a car.
  • Good assumptions (I loved this and thought it was pretty bold to declare some assumptions being good for testers):
    • “The estimates are accurate”. – Take what you did last time.  Use the estimate until it is no longer helpful.
    • “The web service will honor its contract”. If testers didn’t make this assumption, might they be wasting time testing the wrong thing? 
    • There were more good assumptions but I have a gap in my notes.  Maybe Alex will leave a comment with those I missed.
  • Alex talked about J.B. Rainsberger’s “Integrated Tests Are a Scam” – In other words, if we don’t make some assumptions, we would have to code tests for the rest of our lives to make a dent in our coverage.
  • Suggestions to deal with assumptions:
    • Be explicit about your assumptions.
    • Use truth tables for complex scenarios (Alex shared one he used for his own testing).
    • System thinking – Testers should be able to explain the whole system.  This cuts down on bad assumptions.

My uncle is an audiophile.  He buys his equipment only after borrowing it to test in his house.  He prefers vinyl and American-made audio equipment brands I’ve never heard of, uses dedicated amps, and only rips to FLAC.  The sound quality of his system is impeccable.

Ever since I was a kid, I wanted a sound system as good as my uncle’s.  When I was 14, I spent my paper route money on a pair of Boston Acoustics speakers and a Marantz receiver (instead of the flashy JVC models).  The following year I bought a Magnavox CD player because it had the curved slot in the CD arm, which at the time, meant it used a quality laser reader.  Years later I added a Paradigm subwoofer after exhaustive research.

Although my home audio system doesn’t sound nearly as good as my uncle’s, it does sound better than most, at least I think so.  I take pride in maintaining it and enjoy listening to music that much more.

The more I learn about testing, the more I start to compare my testing job to that of others.  I feel pressure to modernize all test approaches and implement cool test techniques I’ve heard about.  I’m embarrassed to admit I use a Kanban board without enforcing a WIP limit.  Some in the industry advise:

“Try to do the right thing. If you cannot – leave!”

But I feel satisfaction making small changes.  I enjoy the challenge of debate.  I refine my ideas and find balance via contention.  A poor process provides fodder for performance goals.  Nirvana is boring.


Inspired by yet another Michael Bolton post.  I’ll try to stop doing that.

Many think testers have the power to declare something as a bug.  This normally goes without saying.  How about the inverse? 

Should testers be given the power to declare something is NOT a bug? 

Well…no, IMO.  That sounds dangerous because what if the tester is wrong?  I think many will agree with me.  Michael Bolton asked the above question in response to a commenter on this post.  It really gave me pause. 

For me, it means maybe testers should not be given the power to run around declaring things as bugs either.  They should instead raise the possibility that something may be a problem.  Then I suppose they could raise the possibility something may not be a problem.

The second thing (here is the first) Scott Barber said that stayed with me is this:

The more removed people are from IT workers, the higher their desire for metrics.  To paraphrase Scott, “the managers on the floor, in the cube farms, agile spaces or otherwise with their teams most of the time, don’t use a lot of metrics because they just feel what’s going on.”

It seems to me, those higher-up people dealing with multiple projects don’t have (as much) time to visit the cube farms, and they know summarized information is the quickest way to learn something.  The problem is, too many of them think:


It hadn’t occurred to me until Scott said it.  That alone does not make metrics bad.  But it helps me understand why I (as a test manager) don’t bother with them, yet spend a lot of time fending off requests for them from out-of-touch people (e.g., directors, other managers).  Note: by “out-of-touch” I mean out of touch with the details of the workers, not out of touch in general.

Scott reminds us the right way to find the right metric for your team is to start with the question:

What is it we’re trying to learn?

I love that.  Maybe a metric is not the best way of learning.  Maybe it is.  If it is, perhaps coupling it with a story will help explain the true picture.

Thanks Scott!

I heard a great interview with performance tester, Scott Barber.  Two things Scott said stayed with me.  Here is the first.

Automated checks that record a time span (e.g., existing automated checks hijacked to become performance tests) may not need to result in Pass/Fail with respect to performance.  Instead, they could just collect their time span result as data points.  These data points can help identify patterns:

  • Maybe the time span increases by 2 seconds after each new build.
  • Maybe the time span increases by 2 seconds after each test run on the same build.
  • Maybe the time span unexpectedly decreases after a build.
  • etc.
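A minimal sketch of this idea, with illustrative names and data: each check appends its elapsed time to a collection instead of asserting against a threshold, and a separate analysis step looks for patterns such as a steady build-over-build increase.

```python
# Collected data points: (build_id, seconds) pairs, e.g. loaded from a log.
timings = [
    ("build-101", 4.1),
    ("build-102", 6.2),
    ("build-103", 8.3),
    ("build-104", 10.4),
]

def steadily_increasing(points, min_step=1.0):
    """Flag a pattern worth investigating: the recorded time span grows
    by at least `min_step` seconds on every consecutive build."""
    spans = [seconds for _, seconds in points]
    return all(b - a >= min_step for a, b in zip(spans, spans[1:]))

if steadily_increasing(timings):
    print("Pattern detected: time span keeps growing -- investigate!")
```

Note there is no Pass/Fail here; the output is a prompt to investigate, which is exactly the point of keeping the raw data points around.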

My System 1 thinking tells me to add a performance threshold that resolves automated checks to a mere Pass/Fail.  Had I done that, I would have missed the full story, as Facebook did. 

Rumor has it, Facebook had a significant production performance bug that resulted from reliance on a performance test that didn’t report performance increases.  It was supposed to Fail if the performance dropped.

At any rate, I can certainly see the advantage of dropping Pass/Fail in some cases and forcing yourself to analyze collected data points instead.

I often hear skeptics question the value of test automation.  Their questioning is healthy for the test industry and it might flush out bad test automation.  I hope it continues.

But shouldn’t these same questions be raised about human testing (AKA Manual testing)?  If these same skeptics judged human testing with the same level of scrutiny, might it improve human testing? 

First, the common criticisms of test automation:

  • Sure, you have a lot of automated checks in your automated regression check suite, but how many actually find bugs?
  • It would take hours to write an automated check for that.  A human could test it in a few seconds.
  • Automated checks can’t adapt to minor changes in the system under test.  Therefore, the automated checks break all the time.
  • We never get the ROI we expect with test automation.  Plus, it’s difficult to measure ROI for test automation.
  • We don’t need test automation.  Our manual testers appear to be doing just fine.

Now let’s turn them around to question manual testing:

  • Sure, you have a lot of manual tests in your manual regression test suite, but how many actually find bugs?
  • It would take hours for a human to test that.  A machine could test it in a few seconds.
  • Manual testers are good at adapting to minor changes in the system under test.  Sometimes, they aren’t even aware of their adaptations.  Therefore, manual testers often miss important problems.
  • We never get the ROI we expected with manual testing.  Plus, it’s difficult to measure ROI for manual testing.
  • We don’t need manual testers.  Our programmers appear to be doing just fine with testing.

It’s true.  Our job rocks.  Huff Post called it the 2nd happiest job in America this year, second only to being a DBA…yawn.  Two years ago, Forbes said testing was #1.

But why?  Neither article goes in depth.  Maybe it’s because all news is good news, for a tester:

  • The System Under Test (SUT) is crashing in QA, it doesn’t work, it’s a steaming pile of…YES!  My testing was valuable!  My life has meaning!  My testing just saved users from this nightmare!
  • The SUT is finally working.  Awesome!  It’s so nice being part of a development team that can deliver quality software.  I can finally stop testing it and move on.  Our users are going to love this.

See?  Either way it’s good news.

Or maybe I just spin it that way to love my job more.  So be it.  If you think your testing job is stressful, you may want to make a few adjustments in how you work.  Read my You’re a Tester, Relax post.

During a recent exchange about the value of automated checks, someone rhetorically asked:

“Is automation about finding lots of bugs or triggering investigation?”

Well…the latter, right?

  • When an automated check passes consistently for months then suddenly fails, it’s an indication the system-under-test (SUT) probably unexpectedly changed.  Investigate!  The SUT change may not be directly related to the check, but who cares?  You can still pat the check on the back and say, “thank you, automated check, for warning me about the SUT change”.
  • When you design/code an automated check, you are learning how to interact with your SUT and investigating it.  If there are bugs uncovered during the automated check design/coding, you report them now and assume the automated checks should happily PASS for the rest of their existence.
  • If someone is organized enough to tell you the SUT is about to change, you should test the change and assess the impact on your automated checks and make necessary updates.  Doing so requires investigating said SUT changes.

In conclusion, one can argue, even the lamest of automated checks can still provide value.  Then again, one can argue most anything.

Perhaps Jimmy John’s should have hired some software testers before slapping their “Order Online” logo all over the place.

Yesterday, while entering a group order online, I had a little trouble typing my Delivery Instructions in the “memo-size” text box.


The only way to add a drink to your sandwich order was to add a second sandwich.  Um, I only want one sandwich.

I selected the earliest delivery time available:


However, after painstakingly collecting the orders of about 17 people, when I submitted my group order, Jimmy John’s showed me this user validation message:


And before I even clicked the OK button, Jimmy John’s cancelled my entire order, including sending this email to all 17 people:


There is some additional context here that is too complex for this post.  Suffice it to say, I was irritated that my order was cancelled with no warning.

I called the Jimmy John’s phone number provided on the top of my screen and the dude said, “we have no idea how to retrieve your order, we just get a printout when it’s submitted”. 

In the end, the good people at my Jimmy John’s franchise accepted a fax with screen captures of my original group order (we tried email, but they couldn’t retrieve it) and they delivered the order flawlessly.

Is there a name for this?  If not, I’m going to call it a “fire drill test”.

  • A fire drill test would typically not be automated because it will probably only be used once.
  • A fire drill test informs product design so it may be worth executing early.
  • A fire drill test might be a good candidate to delegate to a project team programmer.

Fire drill test examples:

  • Our product ingests files from an FTP site daily.  What if the files are not available for three days?  Can our product catch up gracefully?
  • Our product outputs a file to a shared directory.  What if someone removes our product’s write permission to the shared directory?
  • Our product uses a nightly job to process data.  If the nightly job fails due to off-hour server maintenance, how will we know?  How will we recover?
  • Our product displays data from an external web service.   What happens if the web service is down?

Too often, we testers have so much functional testing to do that we overlook the non-functional testing or save it for the end.  If we give these non-functional tests a catchy name like “Fire Drill Test”, maybe it will help us remember them during test brainstorming.
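As a throwaway sketch of the last example above (the names `fetch_quotes` and `render_dashboard` are hypothetical stand-ins for your product code): simulate the external dependency failing and check that the product degrades gracefully.

```python
from unittest import mock

# Hypothetical product code: fetch_quotes() calls an external web service;
# render_dashboard() is expected to degrade gracefully when it is down.
def render_dashboard(fetch_quotes):
    try:
        quotes = fetch_quotes()
    except ConnectionError:
        return "Quotes are temporarily unavailable."
    return f"{len(quotes)} quotes loaded."

# Fire drill: the web service is down.
broken_service = mock.Mock(side_effect=ConnectionError("service down"))
print(render_dashboard(broken_service))  # expect the friendly fallback message
```

Because the scenario is faked in one line, a test like this is cheap enough to write once, learn from, and throw away.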

I attended John Stevenson’s great talk and workshop at Monday night’s Software Testing Club Atlanta.  I’m happy to report the meeting had about 15 in-person attendees and zero virtual attendees.  Maybe someone read my post.

John is a thoughtful and passionate tester.  He managed to hold our attention for 3 hours!  Here are the highlights from my notes:

  • The human brain can store about 3 TB of information; this is only a millionth of the new information released on the internet every day.
  • Overstimulation leads to mental illness.
  • John showed us a picture and asked what we saw.  We saw a tree, flowers, the sun, etc.  Then John told us the picture was randomly generated. The point?  People see patterns even when they don’t exist.  Presumably to make sense out of information overload.
  • Don’t tell your testing stories with numbers.  “A statistician drowned while crossing a river with an average depth of 3 feet”; Isn’t that like, “99 percent of my tests passed”?
  • Don’t be a tester who waits until testing “is done” to communicate the results.  Communicate the test results you collected today.  I love this and plan to blog about it.
  • Testers, stop following the same routines.  Try doing something different.  You might end up discovering new information.
  • Testers, stop hiding what you do.  Get better at transparency and explaining your testing.  Put your tests on a public wiki.
  • Critical thinking takes practice.  It is a skill.
  • “The Pause”. Huh?  Really?  So?  Great critical thinking model explained in brief here.
  • A model for skepticism.  FiLCHeRS.
  • If you challenge someone’s view, take care to respect it.
  • Ways to deal with information overload:
    • Slow down.
    • Don’t over commit.
    • Don’t fear mistakes.  But do learn from them.  This is how children learn.  Play.
    • (Testing specific)  Make your testing commitments short so you can throw them away without losing much.  Don’t write some elaborate test that takes a week to write because it just might turn out to be the wrong test.
    • You spend a third of your life at work.  Figure out how to enjoy work.
  • John led us through a series of group activities including the following:
    • Playing Disruptus to practice creative thinking.  (i.e., playing Scamper.)
    • Playing Story War to practice bug advocacy.
    • Determining if the 5 test phases (Documentation, Planning, Execution, Analysis, Reporting) each use Creative Thinking or Critical thinking.
  • Books John referenced that I would like to read:
    • The Signal and the Noise – Nate Silver
    • Thinking Fast and Slow – Daniel Kahneman
    • You are Not So Smart – David McRaney

At this week’s metric-themed Atlanta Scrum User’s Group meetup, I asked the audience if they knew of any metrics (that could not be gamed) that could trigger rewards for development teams.  The reaction was as if I had just praised Planned Parenthood at a pro-life rally…everyone talking over each other to convince me I was wrong to even ask.

The facilitator later rewarded me with a door prize for the most controversial question.  What?

Maybe my development team and I are on a different planet than the Agile-istas I encountered last night.  Because we are currently doing what I proposed, and it doesn’t appear to be causing any harm.

Currently, if 135 story points are delivered in the prior month AND no showstopper production bugs were discovered, everyone on the team gets a free half-day off to use as they see fit.  We’ve achieved it twice in the past year.  The most spirited part of each retrospective is reviewing the prior month’s metrics to determine if we reached our “stretch goal”.  It’s…fun.  Let me repeat that.  It’s actually fun to reward yourself for extraordinary work.

Last night’s question was part of a quest I’ve been on to find a better reward trigger.  Throughput and quality are what we were aiming for, and I think we’ve gotten close.  I would like to find a better metric than velocity, however, because story point estimation is fuzzy.  If I could easily measure “customer delight”, I would.

At the meeting, I learned about the Class of Service metric.  And I’m mulling over the idea of suggesting a “Dev Forward” % stretch goal for a given time period.

But what is this nerve I keep touching about rewards for good work?

On weekends, when I perform an extraordinary task around the house like getting up on the roof to repair a leak, fixing an electrical issue, constructing built-in furniture to solve a space problem, finishing a particularly large batch of “Thank You” cards, or whatever…I like to reward myself with a beer, buying a new power tool, relaxing in front of the TV, taking a long hot shower, etc.

Rewards rock.  What’s wrong with treating ourselves at work too?

Warning: This has very little to do with testing.

Additional Warning: I’m about to gripe.

I attended the 3rd Software Testing Club Atlanta meetup Wednesday.  Some of the meeting was spent fiddling with a virtual task board, attempting to accommodate the local people who dialed in to the meeting.

IT is currently crazy about low tech dashboards (e.g., sticky notes on a wall).  But we keep trying to virtualize them.  IMO, virtualizing stickies on a wall is silly.  The purpose is to huddle around, in-person, and ditch the complicated software that so often wastes more time than it saves.

IMO, the whole purpose of a local testing club that meets over beer and pizza is to meet over beer and pizza...in person, and engage in the kind of efficient discussion that is best done in person.  Anything else defeats the purpose of a “local” testing club.  If I wanted to dial in and talk about testing over the phone, it wouldn’t have to be with local people.

I’m sad to see in-person meetings increasingly replaced by this.  But IMO, joining virtual people to real-life meetings can be even worse.  Either make everyone virtual or make everyone meet physically.

Yes, I’m a virtual meeting curmudgeon.  I accept that virtual connections have their advantages, and I allow my team to work from home as frequently as three days a week on a regular basis.  But I still firmly believe you can’t beat good old-fashioned, real-life, in-person discussions.

Yesterday, a tester asked me how to get promoted.  I said, “start learning about your craft”.  They said, "but all the testers I know don't learn anything from testing conferences or books".


And this is what makes testing such a cool career choice for some of us!  It's full of apathetic under-achievers.  So if you want to be extraordinary, it's relatively easy.  You have little competition!  Come back from a conference and attempt to implement a mere three ideas and you've probably advanced testing at your organization more than any time in the past.

Why is this?  Maybe because we fell into this career by accident.  Maybe because it's a newish career with few leading experts.  Maybe it’s because we can still make decent money on a software development team by merely trying to act like a user.  I don’t know.  What I do know is, the more I study testing, the more I love my job, and the more promotions I get. 

This crummy little humble testing blog made it onto a list of the world’s top 50 testing blogs for several years.  It’s not because I’m awesome.  It’s because there weren’t that many testing blogs!

Put a little effort into learning more about testing.  Maybe something good will happen.
