Current approach to GUI test automation

From an IT security point of view, the current approach to GUI test automation is careless or even dangerous. And here is why…

A general principle in IT security is to forbid everything and only allow what is really needed. This reduces your attack surface and with it the number of problems you can encounter. For most situations (e.g. when configuring a firewall), this means to apply a whitelist: forbid everything and allow only individual, listed exceptions. And make sure to review and document them.

Compare this to the current state of the art of test automation of software GUIs. With tools like Selenium — the quasi-standard in web test automation — it is the other way around. These tools allow every change in the software under test (SUT) unless you manually create an explicit check. With regard to changes, this is a blacklisting approach. If you are familiar with software test automation, you know that this is for good reasons. It is because of both the brittleness of every such check and the maintenance effort it brings about. But apart from why it is that way, does it make sense? After all, false negatives (missing checks) will decay trust in your test automation.

To be defensive would mean to check everything and only allow individual and documented exceptions. Every other change to the software should be highlighted and reviewed. This is comparable to the “track changes” mode in Word or version control as used in source code. And it is the only way to not miss the dancing pony on the screen, that you didn’t create a check for. At the end of the day, this is what automated tests are for: to find regressions.

There are two important considerations when choosing between the two approaches:

  1. How do you reach that middle ground in the most effective way?
  2. What “side” is less risky to approach than if the perfect spot is missed?

IT security guidelines recommend to err on the side of caution. So in case, both approaches create an equal amount of effort, you should choose whitelisting. But, of course, you usually don’t have equal amounts of effort.

A real-life example

Imagine you have a software that features a table. In your GUI test, you should put a check for every column of every row. With seven columns and rows, this would mean 49 checks — just for the table. And if any of the displayed data ever changes, you have to copy & paste the changes manually to adjust the checks.



Starting with a whitelisting approach, the complete table is checked per default. You then only need to exclude volatile data or components (typically build-number or current date and time). And if the data ever changes, maintaining the test is way easier, because you usually (depending on the tool) have efficient ways to update the checks. Guess which of the two approaches is less effort…

Text-based vs pixel-based whitelist tests

There are already tools out there that let you create whitelist tests. Some are purely visual/pixel-based, such as PDiffApplitools and the like. This approach comes with its benefits and drawbacks. It is universally applicable — no matter if you check a web site or a PDF document. But on the other hand, if the same change appears multiple times, it is hard to treat it with one go. Whitelisting of changes (i.e. excluding parts of the image) can be a problem.

Approval Test and TextTest are text-based tools that are much more robust. But PDFs, web sites, or software GUIs have to be converted to plain text or images for comparison. Ignoring changes is usually done via regular expressions.