Which of the above questions is more important for testers to ask?
Let’s say you are an is-there-a-problem-here tester:
- This calculator app works flawlessly as far as I can tell. We’ve tested everything we can think of that might not work and everything we can think of that might work. There appear to be no bugs. Is there a problem here? No.
- This mileage tracker app crashes under a load of 1000 users. Is there a problem here? Yes.
But might the is-there-a-problem-here question get us into trouble sometimes?
- This calculator app works flawlessly…but we actually needed a contact list app.
- This mileage tracker app crashes under a load of 1000 users but only 1 user will use it.
Or perhaps the is-there-a-problem-here question only fails us when we use too narrow an interpretation:
- Not meeting our needs is a problem. Is there a problem here? Yes. We developed the wrong product, which is a big problem.
- A product that crashes under a load of 1000 users may actually not be a problem if we only need to support 1 user. Is there a problem here? No.
Both are excellent questions. For me, the will-it-meet-our-needs question is easier to apply and I have a slight bias towards it. I’ll use them both for balance.
Note: The “Will it meet our needs?” question came to me from a nice Pete Walen article. The “Is there a problem here?” question came to me via Michael Bolton.
I often hear people describe their automated test approach by naming the tool, framework, harness, technology, test runner, or structure/format. I’ve described mine the same way. It’s safe. It’s simple. It’s established. “We use Cucumber”.
Lately, I’ve seen things differently.
Instead of trying to pigeonhole every automated check into one tightly controlled format for an entire project, why not design the automated checks for each Story based on what fits that Story best?
I think this notion comes from my context-driven test schooling. Here’s an example:
On my current project, we said “let’s write BDD-style automated checks”. We found it awkward to pigeonhole many of our checks into Given, When, Then. After we eventually dropped the BDD-style mandate, I found the less-natural-language style easier to read, more flexible, and quicker to author…for some Stories. Some Stories are good candidates for data-driven checks authored via Excel. Some might require manual testing with a mocked product (computer-assisted exploratory testing, another use of automation). Other Stories might test better using non-deterministic automated diffs.
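To make “data-driven checks” concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `reimbursement` function is a hypothetical stand-in for product code, and the table is inline CSV text rather than a real Excel export. The point is that each row of data is one check, so adding a check means adding a row, not writing code.

```python
import csv
import io

# Hypothetical function under test; it stands in for real product code.
def reimbursement(miles: float, rate_per_mile: float) -> float:
    return round(miles * rate_per_mile, 2)

# In practice this table might be exported from a spreadsheet.
# It is inlined here so the sketch is self-contained.
CASES_CSV = """miles,rate_per_mile,expected
0,0.50,0.00
10,0.50,5.00
100,0.25,25.00
3,0.75,2.25
"""

def run_data_driven_checks(csv_text):
    """Run one check per data row; return a list of (row, actual) failures."""
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        actual = reimbursement(float(row["miles"]), float(row["rate_per_mile"]))
        if actual != float(row["expected"]):
            failures.append((row, actual))
    return failures

failures = run_data_driven_checks(CASES_CSV)
print(f"{len(failures)} failing row(s)")
```

A Story with many input combinations and one simple rule tends to suit this shape. A Story about a multi-step workflow probably does not.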
Sandboxing all your automated checks into FitNesse might make test execution easier. But it might stifle test innovation.
…may not be a good way to start testing.
I heard a programmer use this metaphor to describe the testing habits of a tester he had worked with.
As a tester, taking all test input variables to their extremes may be an effective way to find bugs. However, it may not be an effective way to report bugs. Skilled testers will repeat the same test until they isolate the minimum variable(s) that cause the bug. Or, using this metaphor, they may repeat the same test with all levels on the mixing board pulled down except the one they are interested in observing.
Once the variable is identified, the skilled tester will repeat the test, changing only the isolated variable, and accurately predict a pass or fail result.
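The “pull every slider down except one” idea can be sketched in a few lines of Python. This is a rough illustration, not a prescribed technique: `isolate_failing_variables`, `fake_product_check`, and the input names are all hypothetical, and the approach only catches variables that fail on their own. A bug that needs two variables at their extremes together would slip past it.

```python
from typing import Callable, Dict, List

def isolate_failing_variables(
    baseline: Dict[str, object],
    extreme: Dict[str, object],
    passes: Callable[[Dict[str, object]], bool],
) -> List[str]:
    """Return the variables that cause a failure when pushed to the extreme alone."""
    culprits = []
    for name in extreme:
        trial = dict(baseline)       # every slider pulled down...
        trial[name] = extreme[name]  # ...except the one being observed
        if not passes(trial):
            culprits.append(name)
    return culprits

# Hypothetical product behaviour: it fails only when the name is very long.
def fake_product_check(inputs: Dict[str, object]) -> bool:
    return len(inputs["name"]) <= 50

baseline = {"name": "Al", "quantity": 1, "note": ""}
extreme = {"name": "x" * 500, "quantity": 10**9, "note": "!" * 10_000}

culprits = isolate_failing_variables(baseline, extreme, fake_product_check)
print(culprits)
```

With the culprit isolated, the bug report can name one variable and one threshold instead of a wall of extreme inputs.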
Dear Test Automators,
The next time you discuss automation results, please consider qualifying the context of the word “bug”.
If automation fails, it means one of two things:
- There is a bug in the product-under-test.
- There is a bug in the automation.
The former is waaaaaay more important than the latter. Maybe not to you, but certainly for your audience.
Instead of saying,
“This automated check failed”,
consider saying,
“This automated check failed because of a bug in the product-under-test”.
Instead of saying,
“I’m working on a bug”,
consider saying,
“I’m working on a bug in the automation”.
Your world is arguably more complex than that of testers who don’t use automation. You must test twice as many programs (the automation and the product-under-test). Please consider being precise when you communicate.
So, you’ve got a green thumb. You’ve been growing houseplants your whole life. Now try to grow an orchid. What you’ve learned about houseplants has taught you very little about orchids.
- Put one in soil and you’ll kill it (orchids grow on rocks or bark).
- Orchids need about 20 degrees Fahrenheit difference between day and night.
- Orchids need wind and humidity to thrive.
- Orchids need indirect sunlight. Lots of it. But put them in the sun and they’ll burn.
- Fading flowers do not mean your orchid is dying (orchids bloom in cycles).
So, you’re a skilled tester. You’ve been testing functional applications with user interfaces your whole career. Now try to test a data warehouse. What you’ve learned about functionality testing has taught you very little about data testing.
- “Acting like a user” will not get you far. Efficient data testing does not involve a UI and depends little on other interfaces. There are no buttons to click or text boxes to interrogate during a massive data quality investigation.
- Lack of technical skills will kill you. Interacting with a database requires database language skills (e.g., T-SQL). Testing millions of rows of data requires coding skills to enlist the help of machine-aided exploratory testing.
- Checking the health of your data warehouse prior to deployments probably requires automated checks.
- For functional testing, executing shallow tests first to cover breadth, then deep tests later is normally a good approach. In data testing, the opposite may be true.
- If you are skilled at writing bug reports with detailed repro steps, this skill may hinder your effectiveness at communicating data warehouse bugs, where repro steps may not be important.
- If you are used to getting by as a tester, not reading books about the architecture or technology of your system-under-test, you may fail at data warehouse testing. In order to design valuable tests, a tester will need to study data warehouses until they grok concepts like Inferred Members, Junk Dimensions, Partitioning, Null handling, 3NF, grain, and Rapidly Changing Monster Dimensions.
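As a rough illustration of the automated health checks mentioned above, here is a small Python sketch. Everything in it is an assumption made for the example: the `dim_customer` and `fact_sales` tables, the check names, and the use of an in-memory SQLite database in place of a real warehouse (where the queries would more likely be T-SQL). Each check is a query that counts rows violating an expectation, so a healthy warehouse returns zero for all of them.

```python
import sqlite3

# In-memory stand-in for a warehouse, seeded with three deliberate data problems.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    order_id INTEGER, line_no INTEGER, customer_key INTEGER, amount REAL
);
INSERT INTO dim_customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO fact_sales VALUES
    (100, 1, 1, 9.99),
    (100, 2, 1, 5.00),
    (101, 1, 3, 7.50),   -- orphan: no dimension row for customer_key 3
    (102, 1, 2, NULL),   -- NULL where a value is expected
    (100, 1, 1, 9.99);   -- duplicate at the (order_id, line_no) grain
""")

# Each query counts violations; zero means the check passes.
CHECKS = {
    "orphaned_fact_rows": """
        SELECT COUNT(*) FROM fact_sales f
        LEFT JOIN dim_customer d ON d.customer_key = f.customer_key
        WHERE d.customer_key IS NULL
    """,
    "null_amounts": "SELECT COUNT(*) FROM fact_sales WHERE amount IS NULL",
    "duplicate_grain_rows": """
        SELECT COUNT(*) FROM (
            SELECT order_id, line_no FROM fact_sales
            GROUP BY order_id, line_no HAVING COUNT(*) > 1
        )
    """,
}

def run_health_checks(connection):
    """Return a dict of check name to violation count."""
    return {name: connection.execute(sql).fetchone()[0]
            for name, sql in CHECKS.items()}

results = run_health_checks(conn)
for name, violations in results.items():
    print(name, "FAIL" if violations else "ok", violations)
```

Notice that none of these checks has repro steps in the usual sense. The “bug report” is the query plus the offending rows.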
Testers, let’s respect the differences in the projects we test, and grow our skills accordingly. Please don’t use a one-size-fits-all approach.
I think it’s only people who experience bugs.
Sadly, devs, BAs, other testers, stakeholders, QA managers, directors, etc. seldom appear interested in the fruits of our labor. The big exception is when any of these people experience a bug, downstream of our test efforts.
“Hey, did you test this? Did it pass? It’s not working when I try it.”
Despite the disinterest, we testers spend a lot of effort standing up ways to report test results. Whether it be elaborate pass/fail charts or low-tech information radiators on public whiteboards, we do our best. I’ve put lots of energy into coaching my testers to give better test reports, but I often second-guess this…wondering how beneficial the skill is.
Why isn’t anyone listening? These are some reasons I can think of:
- Testers have done such a poor job of communicating test results in the past that people don’t find the results valuable.
- Testers have done such a poor job of testing that people don’t find the results valuable.
- People are mainly interested in completing their own work. They assume all is well with their product until a bug report shows up.
- Testing is really difficult to summarize. Testers haven't found an effective way of doing this.
- Testing is really difficult to summarize. Potentially interested parties don’t want to take the time to understand the results.
- People think testers are quality cops instead of quality investigators; people will wait for the cops to knock on their door to deliver bad news.
- Everyone else did their own testing and already knows the results.
- Test results aren’t important. They have no apparent bearing on success or failure of a product.
We had a relatively disastrous prod deployment last week. Four bugs, caused by a large refactor, were missed in test. But here’s the weirder part: along with those four bugs, the users started reporting previously existing functionality as new bugs and, in some cases, convincing us to do emergency patches to change that functionality.
It seems bugs beget bugs.
Apparently the shock of these initial four bugs created a priming effect, which resulted in overly-critical user perceptions:
“I’ve never noticed that before…must be something else those clowns broke.”
I’ve heard people are more likely to tidy up if they smell a faint scent of cleaning liquid. The same thing occurs with bugs, I guess.
What’s the lesson here? Releasing four bugs might be more expensive than fixing four bugs. It might mean fixing seven and dealing with extra support calls until the priming effect wears off.