When testing an application, slight differences between the test environment or usage pattern and the real system often lead us to discover "bugs" that would never happen under normal conditions. These bugs tend to be surprising: we wonder how the problem could have escaped our notice for so long, or how it could have been introduced. Here are two examples of these bugs, followed by an explanation of how each "artifact" was created, identified, and resolved:
- I was working on a system where users had a folder structure stored on the server. We were testing server performance by simulating a large number of users creating lots of folders over a long period of time. Things got slower and slower, and it looked like there was a serious performance problem.
- Recently, I was working on porting an application from an older version of WebLogic to a newer version (9.2). We had a load-testing rig that simulated the effects of many users calling the system over time. Everything was going smoothly with the port until we started ramping up our testing with the simulated clients. Each client used SSL to connect and a certificate to authenticate, and the server was supposed to keep track of who was authenticated for a given call so that their actions could be associated with them. Our test rig relied on multiple simulated users connecting from a single physical machine (a fairly standard practice for load testing). When we tried it with the updated version, calls coming from the same machine suddenly seemed to have somewhat arbitrary credentials associated with them, as if the server code was not thread-safe and the authentication-related code was totally broken.
Now, the thrilling conclusion:
- We first had to separate the effect of the size of the data being created from other factors, such as the length of time since the test started (since data sizes tend to grow as time goes on). When we looked at the actual data being created, we noticed that we hadn't set any limit on how many subfolders could be created in a given folder, so some folders wound up with thousands of child folders, something that was deemed very unlikely to happen in practice (and in fact it never has). We made a note of the fact that a performance problem could arise if a user chose to create a huge number of child folders, and changed our test rig to create deeper folder nesting rather than wider folder nesting, keeping the number of folders the same while avoiding an unlikely usage pattern.
- The original theory was that we had somehow failed to port our code to the new WebLogic version correctly, but that only led us down a lot of dead ends. After putting in lots of extra logging on the server and writing a very simple, repeatable test to demonstrate the problem, we decided to run only one client per physical machine to see if the problem still appeared. The problem disappeared in the multiple-machine test, and it became clear that the issue was related to running multiple clients on the same machine. At this point we were tired of dealing with the issue and accepted this workaround, since in practice we never had a situation where multiple clients would connect from the same machine and authenticate as different users. We still don't know if WebLogic somehow associates credentials with a particular IP address and, if so, whether there is some way to turn this off. To really verify this theory we would need to set up a machine with multiple IP addresses assigned to the same NIC, and somehow get different clients to use different IP addresses.
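
We never ran that last experiment, but getting different clients to use different IP addresses is usually a matter of binding each client socket to a specific local address before connecting. Here is a minimal Python sketch of the idea, not what our rig actually did. The helper names `connect_from` and `observed_source_ip` are made up for illustration, and the sketch assumes the extra addresses have already been configured on the machine.

```python
import socket


def connect_from(local_ip, server_addr, timeout=5.0):
    """Open a TCP connection whose source address is local_ip.

    local_ip must already be assigned to an interface on this machine.
    Port 0 lets the OS pick an ephemeral source port.
    """
    return socket.create_connection(
        server_addr, timeout=timeout, source_address=(local_ip, 0)
    )


def observed_source_ip(local_ip):
    """Connect to a throwaway local listener and return the IP it sees.

    This stands in for the real server; it only shows which source
    address the far end would observe for a client bound to local_ip.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        with connect_from(local_ip, listener.getsockname()):
            conn, peer = listener.accept()
            conn.close()
            return peer[0]


if __name__ == "__main__":
    # Loopback is used here only so the sketch runs anywhere; in a real
    # experiment each simulated client would get its own aliased address.
    print(observed_source_ip("127.0.0.1"))
```

In a real test, each simulated client would call `connect_from` with its own address and then wrap the resulting socket for SSL with its client certificate. If the credential mix-up went away, that would support the per-IP theory.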
What's the moral of the story?
- When you uncover a bug during testing that surprises you because you would have expected to see it in production already, go back and verify that your test is actually doing something that the production system does.
- In the case where the bug is something that hinders your ability to test, but would have no effect on the actual system (as in bug #2 above), it can be a very tough call to determine how much energy to put into fixing it.
- While testing for a broad range of conditions and situations can be beneficial for a system, especially in cases where you might anticipate a future problem (as in bug #1), you can also wind up plugging a lot of holes that won't ever leak.
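
For bug #1, checking whether the test resembled production could have been as simple as measuring the shape of the generated data. The Python sketch below is a hypothetical stand-in for our rig, not the real thing: `build_folder_tree`, `widest_folder`, and `depth` are made-up names. It creates the same number of folders either wide or deep, depending on a cap on children per folder, and then reports the widest folder and the nesting depth.

```python
from collections import Counter, deque


def build_folder_tree(total_folders, max_children):
    """Return {folder_id: parent_id} for total_folders folders under root 0.

    Folders are filled breadth-first, so no parent gets more than
    max_children children; a smaller cap yields a deeper tree.
    """
    if max_children < 1:
        raise ValueError("max_children must be at least 1")
    parents = {}
    open_parents = deque([0])  # folders that can still take children
    child_count = Counter()
    for folder_id in range(1, total_folders + 1):
        parent = open_parents[0]
        parents[folder_id] = parent
        child_count[parent] += 1
        if child_count[parent] == max_children:
            open_parents.popleft()  # this parent is now full
        open_parents.append(folder_id)
    return parents


def widest_folder(parents):
    """Largest number of direct children under any single folder."""
    return max(Counter(parents.values()).values(), default=0)


def depth(parents):
    """Number of folder levels below the root."""
    depths = {0: 0}
    for folder_id in sorted(parents):  # a parent always has a smaller id
        depths[folder_id] = depths[parents[folder_id]] + 1
    return max(depths.values())


if __name__ == "__main__":
    for cap in (5000, 10):
        tree = build_folder_tree(5000, cap)
        print(cap, len(tree), widest_folder(tree), depth(tree))
```

With 5000 folders, a cap of 5000 puts every folder directly under the root, while a cap of 10 spreads the same folders over four levels. Comparing the widest-folder number against what real users actually do is the kind of sanity check that would have flagged our original test data as unrealistic.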
I think my favorite artifact story is the one retold by Steve McConnell about a team trying to get better performance out of their OS using some profiler data:
Bentley also reports the case of a team that discovered that half an operating system's time was spent in a small loop. They rewrote the loop in microcode and made the loop 10 times faster, but it didn't change the system's performance - they had rewritten the system's idle loop.
A method eating up 50% of the execution time sure looks like a nasty bug, but it was only an artifact of the system design. Keep this lesson in mind next time you see something so shocking.
