The second thing Scott Barber said that stayed with me (here is the first) is this:
The more removed people are from IT workers, the higher their desire for metrics. To paraphrase Scott, “the managers on the floor, in the cube farms, agile spaces or otherwise with their teams most of the time, don’t use a lot of metrics because they just feel what’s going on.”
It seems to me those higher-up people, dealing with multiple projects, don't have (as much) time to visit the cube farms, and they know summarized information is the quickest way to learn something. The problem is, too many of them think:
SUMMARIZED INFORMATION = ROLLED UP NUMBERS
It hadn’t occurred to me until Scott said it. That, alone, does not make metrics bad. But it helps me to understand why I (as a test manager) don’t bother with them but I spend a lot of time fending off requests for them from out-of-touch people (e.g., directors, other managers). Note: by “out-of-touch” I mean out-of-touch with the details of the workers. Not out-of-touch in general.
Scott reminds us the right way to find the right metric for your team is to start with the question:
What is it we’re trying to learn?
I love that. Maybe a metric is not the best way of learning. Maybe it is. If it is, perhaps coupling it with a story will help explain the true picture.
I heard a great interview with performance tester, Scott Barber. Two things Scott said stayed with me. Here is the first.
Automated checks that record a time span (e.g., existing automated checks hijacked to become performance tests) may not need to result in Pass/Fail with respect to performance. Instead, they could just collect their time span results as data points. These data points can help identify patterns:
- Maybe the time span increases by 2 seconds after each new build.
- Maybe the time span increases by 2 seconds after each test run on the same build.
- Maybe the time span unexpectedly decreases after a build.
My System 1 thinking tells me to add a performance threshold that resolves automated checks to a mere Pass/Fail. Had I done that, I would have missed the full story, as Facebook did.
Rumor has it, Facebook had a significant production performance bug that resulted from relying on a performance test that didn’t report performance increases. The test was only supposed to Fail if performance dropped, so an unexpected speedup (a symptom of the bug) went unnoticed.
At any rate, I can certainly see the advantage of dropping Pass/Fail in some cases and forcing yourself to analyze collected data points instead.
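As a rough illustration of that idea, here is a minimal sketch (all names and the 2-second threshold are hypothetical, borrowed from the patterns above) that treats each check's time span as a data point and flags build-over-build changes in either direction, instead of resolving to a mere Pass/Fail:

```python
def analyze_durations(durations):
    """Given (build_id, seconds) data points for one automated check,
    flag patterns worth investigating instead of a bare Pass/Fail."""
    findings = []
    for (b1, t1), (b2, t2) in zip(durations, durations[1:]):
        delta = t2 - t1
        if delta >= 2.0:
            findings.append(f"{b2}: +{delta:.1f}s vs {b1} -- investigate slowdown")
        elif delta <= -2.0:
            # An unexpected speedup deserves a look too; maybe
            # something stopped executing.
            findings.append(f"{b2}: {delta:.1f}s vs {b1} -- investigate speedup")
    return findings

points = [("build-101", 4.1), ("build-102", 4.2),
          ("build-103", 6.3), ("build-104", 3.9)]
for finding in analyze_durations(points):
    print(finding)
```

Note that a hard Pass/Fail threshold would have stayed silent about build-104's sudden speedup; collecting the data points surfaces both directions.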
I often hear skeptics question the value of test automation. Their questioning is healthy for the test industry and it might flush out bad test automation. I hope it continues.
But shouldn’t these same questions be raised about human testing (AKA Manual testing)? If these same skeptics judged human testing with the same level of scrutiny, might it improve human testing?
First, the common criticisms of test automation:
- Sure, you have a lot of automated checks in your automated regression check suite, but how many actually find bugs?
- It would take hours to write an automated check for that. A human could test it in a few seconds.
- Automated checks can’t adapt to minor changes in the system under test. Therefore, the automated checks break all the time.
- We never get the ROI we expect with test automation. Plus, it’s difficult to measure ROI for test automation.
- We don’t need test automation. Our manual testers appear to be doing just fine.
Now let’s turn them around to question manual testing:
- Sure, you have a lot of manual tests in your manual regression test suite, but how many actually find bugs?
- It would take hours for a human to test that. A machine could test it in a few seconds.
- Manual testers are good at adapting to minor changes in the system under test. Sometimes, they aren’t even aware of their adaptations. Therefore, manual testers often miss important problems.
- We never get the ROI we expected with manual testing. Plus, it’s difficult to measure ROI for manual testing.
- We don’t need manual testers. Our programmers appear to be doing just fine with testing.
But why? Neither article goes in depth. Maybe it’s because all news is good news, for a tester:
- The System Under Test (SUT) is crashing in QA, it doesn’t work, it’s a steaming pile of…YES! My testing was valuable! My life has meaning! My testing just saved users from this nightmare!
- The SUT is finally working. Awesome! It’s so nice being part of a development team that can deliver quality software. I can finally stop testing it and move on. Our users are going to love this.
See? Either way it’s good news.
Or maybe I just spin it that way to love my job more. So be it. If you think your testing job is stressful, you may want to make a few adjustments in how you work. Read my You’re a Tester, Relax post.
During a recent exchange about the value of automated checks, someone rhetorically asked:
“Is automation about finding lots of bugs or triggering investigation?”
Well…the latter, right?
- When an automated check passes consistently for months, then suddenly fails, it’s an indication the system-under-test (SUT) probably changed unexpectedly. Investigate! The SUT change may not be directly related to the check, but who cares? You can still pat the check on the back and say, “thank you, automated check, for warning me about the SUT change”.
- When you design/code an automated check, you are learning how to interact with your SUT and investigating it. If there are bugs uncovered during the automated check design/coding, you report them now and assume the automated checks should happily PASS for the rest of their existence.
- If someone is organized enough to tell you the SUT is about to change, you should test the change, assess the impact on your automated checks, and make the necessary updates. Doing so requires investigating said SUT changes.
In conclusion, one can argue, even the lamest of automated checks can still provide value. Then again, one can argue most anything.
Perhaps Jimmy John’s should have hired some software testers before slapping their “Order Online” logo all over the place.
Yesterday, while entering a group order online, I had a little trouble typing my Delivery Instructions in the “memo-size” text box.
The only way to add a drink to your sandwich order was to add a second sandwich. Um, I only want one sandwich.
I selected the earliest delivery time available:
However, after painstakingly collecting the orders of about 17 people, when I submitted my group order, Jimmy John’s showed me this user validation message:
And before I even clicked the OK button, Jimmy John’s cancelled my entire order, including sending this email to all 17 people:
There is some additional context here that is too complex for this post. Suffice it to say, I was irritated that my order was cancelled with no warning.
I called the Jimmy John’s phone number provided on the top of my screen and the dude said, “we have no idea how to retrieve your order, we just get a printout when it’s submitted”.
In the end, the good people at my Jimmy John’s franchise accepted a fax with screen captures of my original group order (we tried email, but they couldn’t retrieve it) and they delivered the order flawlessly.
Is there a name for this? If not, I’m going to call it a “fire drill test”.
- A fire drill test would typically not be automated because it will probably only be used once.
- A fire drill test informs product design so it may be worth executing early.
- A fire drill test might be a good candidate to delegate to a project team programmer.
Fire drill test examples:
- Our product ingests files from an ftp site daily. What if the files are not available for three days? Can our product catch up gracefully?
- Our product outputs a file to a shared directory. What if someone removes write permission to the shared directory for our product?
- Our product uses a nightly job to process data. If the nightly job fails due to off-hour server maintenance, how will we know? How will we recover?
- Our product displays data from an external web service. What happens if the web service is down?
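To make the last example concrete, here is a minimal sketch of a fire drill test, under the assumption of a hypothetical product function `render_dashboard` that calls an external quote service (`fetch_quotes_down` stands in for the real call while the service is down):

```python
def fetch_quotes_down():
    """Stand-in for the external web service while it is down."""
    raise ConnectionError("quote service unavailable")

def render_dashboard(fetch):
    """Hypothetical product code under test: it should degrade
    gracefully when the external service is unreachable."""
    try:
        quotes = fetch()
    except ConnectionError:
        # Graceful fallback instead of a stack trace for the user.
        return "Quotes are temporarily unavailable."
    return f"{len(quotes)} quotes loaded"

# The fire drill: force the outage and confirm the user still
# gets a sane page rather than a crash.
result = render_dashboard(fetch_quotes_down)
assert "unavailable" in result
print("fire drill passed:", result)
```

Since a fire drill test like this will probably only run once, it may never earn a place in the automated suite; its value is surfacing the design question early.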
Too often, we testers have so much functional testing to do that we overlook the non-functional testing or save it for the end. If we give these non-functional tests a catchy name like “Fire Drill Test”, maybe it will help us remember them during test brainstorming.