We all understand the benefits of automating the testing of a software application. The key benefits include, but are not limited to:
- Saves time.
- Frees up time for other deeper testing.
- Removes the monotonous and boring task of repetitive testing.
- Avoids human errors.
But for automated tests to be useful and successful, they should:
- Be highly reliable. A test failure should indicate a genuine failure point or bug in the operation of the system under test.
- Have short execution times.
- Provide fast feedback to code changes.
- Provide clear and precise test results.
We know the benefits, but how do we achieve them? How do we quantify the benefits to others? I will describe how we at GoDaddy:
- Measure the reliability of our automated tests.
- Measure the execution times of our tests.
- Create useful test result reports.
In this blog, I will explain what automation metrics we measure here at GoDaddy, and how. We currently use Ruby, RSpec, Selenium WebDriver and a few other gems to automate our test cases. We use the ELK stack (Elasticsearch, Logstash and Kibana) for collecting, storing and displaying data related to the automation. Although other products within the company also use this framework, here I will use Website Builder as the example application under test.
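To give a concrete picture, here is a minimal sketch of how a per-test result document could be shipped to Elasticsearch after each run. The field names, index name and host below are assumptions for illustration, not our actual schema:

```ruby
require "json"
require "net/http"
require "time"

# Hypothetical sketch: the shape of a per-test document sent to Elasticsearch.
# Field names are illustrative and do not reflect an actual production schema.
def build_result_doc(test_name:, status:, duration_s:, build:, exception: nil)
  {
    test_name: test_name,          # e.g. "Form test"
    status: status,                # "passed" / "failed"
    duration_s: duration_s,        # execution time in seconds
    build: build,                  # build identifier of the app under test
    exception: exception,          # exception class name, if the test failed
    "@timestamp": Time.now.utc.iso8601
  }
end

# Posting the document to an Elasticsearch index (host and index are assumed):
def post_result(doc, host: "http://elk.example.com:9200")
  uri = URI("#{host}/test-results/_doc")
  Net::HTTP.post(uri, doc.to_json, "Content-Type" => "application/json")
end
```

Kibana can then aggregate these documents into dashboards like the ones shown below.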
One of the easiest ways of measuring the reliability of tests is to count the number of times a test fails over multiple builds of the application. Below is a screenshot from our dashboard showing the top 15 failing tests in the last 24 hours, across 12 different builds:
UI automation is fragile by nature, especially if you have hundreds of long end-to-end tests. In our current framework, with each build of the application under test, we immediately re-run any tests that failed during the initial automation run. Re-running only the failed tests right away mitigates this fragility, reduces false failures and hence increases the reliability of the results.
To explain this using the above screenshot: the Form test (WSB-8455) failed 12 times even after being re-run following the initial failure. This data prompts us to investigate further to confirm whether the failure is a bug or the test is flaky.
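In RSpec, re-running just the failed examples can be implemented with the built-in example status persistence file. Below is a minimal `spec_helper.rb` sketch; the file path is illustrative and the details may differ from our actual setup:

```ruby
# spec_helper.rb sketch — enable RSpec's built-in failure tracking so that a
# follow-up `rspec --only-failures` run re-executes only the failed examples.
RSpec.configure do |config|
  # File where RSpec records the pass/fail status of each example.
  config.example_status_persistence_file_path = "tmp/examples.txt"
end
```

With this in place, the initial run is a plain `rspec`, and the immediate retry is `rspec --only-failures`. A test that fails in both runs is the signal worth investigating.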
Another important automation metric is the execution time of each test. After each run we also send the execution time in seconds to ELK for every test. Here is a screenshot showing the mean run times of our tests in ascending order:
We use the mean time because it identifies the slowest tests more accurately: averaging across runs smooths out outliers caused by a slow environment or other environmental issues.
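As an illustration of why averaging helps, the sketch below (with made-up numbers) computes mean run times across several builds; a single outlier run does not dominate the ranking:

```ruby
# Illustrative only: per-test durations in seconds across four builds.
durations = {
  "Form test"    => [48.2, 51.0, 47.5, 120.4],  # one slow outlier run
  "Publish test" => [95.1, 97.8, 96.0, 94.3]    # consistently slow
}

# Mean run time per test, then sorted slowest-first.
mean_times = durations.transform_values { |runs| runs.sum / runs.size.to_f }
slowest = mean_times.sort_by { |_, mean| -mean }

slowest.each { |name, mean| puts format("%-12s %6.1f s", name, mean) }
```

Despite the Form test's 120-second outlier, its mean (about 67 s) stays well below the consistently slow Publish test (about 96 s), which is the test actually worth optimizing.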
We then profile individual methods to identify which take the longest to execute. Refactoring these slower methods, where possible, improves our overall execution times. Here is a screenshot showing the execution times of several methods in seconds:
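A method-timing helper along these lines could be used to collect such per-method numbers; `timed`, `METHOD_TIMINGS` and the label are hypothetical names used here for illustration:

```ruby
require "benchmark"

# Hypothetical store for per-method timings; in practice each entry could be
# shipped to ELK alongside the test results.
METHOD_TIMINGS = {}

# Wrap any page-object method call to record how long it takes, in seconds.
def timed(label)
  result = nil
  elapsed = Benchmark.realtime { result = yield }
  METHOD_TIMINGS[label] = elapsed
  result
end

# Usage sketch — the sleep stands in for a real page action:
timed("add_form_to_page") { sleep 0.05 }
```

Sorting `METHOD_TIMINGS` by value then surfaces the slowest methods, which is essentially what the dashboard above visualizes.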
Reporting test results
We also capture the exception every time a test fails, allowing us to see the top exceptions causing failures:
After working with these exceptions for some time, one can quickly identify the reason for a failure. For example, a timeout signifies either that the system failed to respond or that an element could not be found. In addition, we have implemented a mechanism to generate a report using the allure-rspec gem. Here is a screenshot of the report that gets generated:
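A sketch of how exception capture and screenshot attachment could look in a `spec_helper.rb`, assuming a `driver` helper that returns the current Selenium WebDriver session; the hook uses the allure-rspec attachment API, and details may differ from our actual setup:

```ruby
# spec_helper.rb sketch — on failure, record the exception class and attach a
# screenshot of the browser to the Allure report.
RSpec.configure do |config|
  config.after(:each) do |example|
    next unless example.exception

    # Exception class names (e.g. "Net::ReadTimeout") are what get aggregated
    # into the "top exceptions" dashboard.
    exception_class = example.exception.class.name
    Allure.add_attachment(
      name: "screenshot (#{exception_class})",
      source: driver.screenshot_as(:png),
      type: Allure::ContentType::PNG,
      test_case: true
    )
  end
end
```

Because the screenshot is taken inside the failure hook, it captures the page exactly as it looked when the exception was raised.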
In the above screenshot, you can see the complete stack trace of the error/exception, as well as a screenshot taken at the moment it happened. This greatly helps in troubleshooting the cause of the failure right away, rather than needing to re-run the test manually or debug it locally.
- By measuring how many times a test fails even after an immediate re-run, we are able to measure the reliability of each test.
- Capturing the mean execution time of each test, as well as individual method execution times, helps us improve the speed of our automation suite.
- Using the allure-rspec gem, we are able to create a clear report that shows a screenshot inline with the exception, resulting in faster analysis of each run.
This data-driven decision-making mindset has removed the ambiguity around understanding the reliability of our automation and analyzing its test results. The dashboard gives complete visibility and easy access to anyone interested in the test results. Using the data from our tests, we have accomplished the following:
- Identified and fixed the flaky tests in our regression suite.
- Measured the mean execution times of our jobs and taken appropriate action, including but not limited to upgrading Selenium WebDriver, ChromeDriver and related tooling, to reduce execution times.
- Identified the slowest methods and refactored them to reduce the execution times of our tests.