3 useful Automation Metrics – What and How?

We all understand the benefits of automating the testing of a software application. The key benefits include, but are not limited to:

  • Saves time.
  • Frees up time for other deeper testing.
  • Takes away the monotonous and boring task of repetitive testing.
  • Avoids human errors.

Automation metrics

But for automated tests to be useful/successful they should:

  • Be highly reliable. A test failure should indicate a failure point in the system or a bug in the operation of the system under test.
  • Have short execution times.
  • Provide fast feedback to code changes.
  • Provide clear and precise test results.

We know the benefits, but how do we achieve them? How do we quantify the benefits to others? I will describe how we here at GoDaddy:

  • Measure the reliability of our automated tests.
  • Measure the execution times of our tests.
  • Create useful test result reports.

In this blog, I will explain which automation metrics we measure here at GoDaddy and how we measure them. We currently use Ruby, RSpec, Selenium WebDriver and a few other gems to automate our test cases. We use ELK (Elasticsearch, Logstash and Kibana) for collecting, storing and displaying data related to the automation. Although other products within the company use this framework, here I will take Website Builder as the example application under test.


One of the easiest ways of measuring the reliability of tests is to count the number of times a test fails over multiple builds of the application. Below is a screenshot from our dashboard showing the top 15 failing tests in the last 24 hours against 12 different builds:

UI automation is fragile by nature, especially if you have hundreds of long end-to-end tests. In our current framework, with each build of the application under test, we re-run only the failed tests as part of the initial run of automation. By re-running only the failed tests right away, we make the suite less fragile, reduce false failures and hence increase the reliability of the results.
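The re-run step described above could be sketched roughly as follows. This is a minimal illustration, not the framework's actual code: the `run_with_rerun` method and the idea of passing tests as callables that return true on success are assumptions made for the example, and a real suite would delegate this to its test runner.

```ruby
# Runs every test once, then immediately re-runs only the failures.
# `tests` maps a test name to a callable that returns true on success.
# This interface is an assumption for the sketch, not RSpec's API.
def run_with_rerun(tests)
  # First pass: collect the names of the tests that failed.
  failed_first = tests.reject { |_name, test| test.call }.keys

  # Second pass: re-run only the failures and classify every test.
  tests.keys.to_h do |name|
    status =
      if !failed_first.include?(name) then :passed
      elsif tests[name].call then :passed_on_rerun
      else :failed
      end
    [name, status]
  end
end

attempts = 0
tests = {
  "stable test" => -> { true },
  "flaky test"  => -> { (attempts += 1) > 1 }, # fails once, then passes
  "broken test" => -> { false },
}

run_with_rerun(tests).each { |name, status| puts "#{name}: #{status}" }
```

Only tests that end up as `:failed` count against reliability; a `:passed_on_rerun` result is still worth recording, since it hints at flakiness.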

In the above screenshot, we can see that the Form test, WSB-8455, failed 12 times even after being re-run after the initial failure. This data tells us to investigate further to confirm whether the failure is a bug or the test is flaky.
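A dashboard panel like this boils down to counting, per test, the builds in which the test failed both initially and on the re-run. Here is a small sketch of that aggregation in plain Ruby. The `TestResult` fields and the ID `WSB-1001` are hypothetical, and in our setup the counting is done by ELK rather than by code like this.

```ruby
# Each record is one test outcome for one build. The field names are
# hypothetical; they are not the schema of the real dashboard.
TestResult = Struct.new(:test_id, :build, :failed_initial, :failed_rerun)

# Count, per test, the builds in which the test failed on the initial run
# AND failed again on the immediate re-run, then return the worst offenders.
def top_failing_tests(results, limit: 15)
  counts = Hash.new(0)
  results.each do |r|
    counts[r.test_id] += 1 if r.failed_initial && r.failed_rerun
  end
  counts.sort_by { |test_id, count| [-count, test_id] }.first(limit)
end

results = [
  TestResult.new("WSB-8455", "build-1", true, true),
  TestResult.new("WSB-8455", "build-2", true, true),
  TestResult.new("WSB-1001", "build-1", true, false), # passed on re-run
  TestResult.new("WSB-1001", "build-2", false, false),
]

top_failing_tests(results).each do |test_id, count|
  puts "#{test_id}: failed after re-run in #{count} build(s)"
end
```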

Execution times

Another important automation metric to measure is the execution time of the tests. We send the execution time in seconds to ELK for each test after each run. Here is a screenshot showing mean run times for our tests in ascending order:

We use the mean time over many runs, rather than the time of any single run, because it more accurately identifies the slowest tests. Averaging dampens the effect of outliers caused by a slow environment or other environmental issues.
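As a rough illustration of the calculation behind such a chart, the sketch below averages the recorded durations of each test and sorts the result in ascending order. The test names and numbers are made up, and `mean_durations` is a hypothetical helper; in practice the aggregation happens in the dashboard.

```ruby
# Takes a hash of test name => list of run durations in seconds and
# returns [name, mean] pairs sorted from fastest to slowest.
def mean_durations(samples)
  samples
    .reject { |_test, durations| durations.empty? } # skip tests with no runs
    .map { |test, durations| [test, (durations.sum.to_f / durations.size).round(2)] }
    .sort_by { |_test, mean| mean }
end

samples = {
  "form_test"    => [42.1, 40.3, 95.0], # one slow run, e.g. a slow environment
  "publish_test" => [12.0, 11.5, 12.4],
}

mean_durations(samples).each { |test, mean| puts "#{test}: #{mean}s" }
```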

We then profile individual methods to identify which ones take the longest to execute. Refactoring these slower methods, where possible, can improve our overall execution times. Here is a screenshot showing execution times of several methods in seconds:

Slowest Methods to execute
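One simple way to collect method-level timings like these is to wrap calls with `Benchmark.realtime` from Ruby's standard library. The `MethodTimer` class and the labels below are hypothetical and only show the idea; the `sleep` calls stand in for real page interactions.

```ruby
require "benchmark"

# Collects wall-clock timings per label. The class name and labels are
# hypothetical; a real suite would wrap its own helper methods.
class MethodTimer
  def initialize
    @timings = Hash.new { |hash, key| hash[key] = [] }
  end

  # Runs the block, records how long it took, and returns the block's value.
  def measure(label)
    result = nil
    @timings[label] << Benchmark.realtime { result = yield }
    result
  end

  # Labels with their mean duration in seconds, slowest first.
  def slowest(limit = 5)
    @timings
      .map { |label, times| [label, times.sum / times.size] }
      .sort_by { |_label, mean| -mean }
      .first(limit)
  end
end

timer = MethodTimer.new
timer.measure("login") { sleep 0.02 }
timer.measure("open_editor") { sleep 0.05 }
timer.slowest.each { |label, mean| puts format("%-12s %.3fs", label, mean) }
```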

Reporting test results

We also capture the exception every time a test fails, which allows us to see the top exceptions causing failures:

Top exceptions in tests
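Ranking exceptions is a simple tally over the failure records. The sketch below assumes each failure is stored as a hash with an `:exception_class` key, which is an assumption for this example and not our actual document format. The class names are only example strings.

```ruby
# Returns [exception class name, count] pairs, most frequent first.
# The :exception_class key is a hypothetical field name.
def top_exceptions(failures, limit = 10)
  failures
    .map { |failure| failure[:exception_class] }
    .tally
    .sort_by { |name, count| [-count, name] }
    .first(limit)
end

failures = [
  { test: "form_test",    exception_class: "Selenium::WebDriver::Error::TimeoutError" },
  { test: "publish_test", exception_class: "Selenium::WebDriver::Error::TimeoutError" },
  { test: "theme_test",   exception_class: "Selenium::WebDriver::Error::NoSuchElementError" },
]

top_exceptions(failures).each { |name, count| puts "#{count}  #{name}" }
```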

After working with exceptions for some time, one can quickly identify the reason for a failure. As an example, a timeout signifies that either the system failed to respond or an element could not be found. In addition, we have implemented a mechanism to generate a report using the allure-rspec gem. Here is a screenshot of the report that gets generated:

Report with stack trace and screenshot

In the above screenshot, you can see the complete stack trace of the error/exception that happened, as well as a screenshot taken at the time the exception happened. This greatly helps with troubleshooting the cause of the failure right away, rather than needing to run the test manually again or debug the test locally.


  • By measuring how many times a test fails even after an immediate re-run, we are able to measure the reliability of each test.
  • Capturing the mean execution time of each test and measuring individual method execution times helps us improve the speed of our automation suite.
  • Using the allure-rspec gem, we are able to create a clear report that shows a screenshot in line with the exception, resulting in faster analysis of the run.

This data-driven decision-making mindset has removed the ambiguity around the reliability of our tests and the analysis of automation test results. The dashboard gives complete visibility and easy access to anyone interested in looking at the test results. Using the data from our tests, we have accomplished the following:

  • Figured out and fixed all the flaky tests in our regression suite.
  • Measured the mean execution times of our jobs and took appropriate actions, including but not limited to upgrading Selenium WebDriver, ChromeDriver, etc., to reduce the execution times.
  • Identified the slowest methods and refactored them to reduce the execution times of our tests.

Mobile App Testing

Successful companies will look at the current mobile market as something different and rethink their product strategy from a mobile-first perspective. This begins by looking at the target market, the intention of the app and the risk profile. If anything, we are seeing mobile applications that have more features or functionality than their web equivalents. We need to test at least as much as we do for the web equivalent, and usually more.

For example, an airline application requires check-in abilities on the web. But the mobile application also needs ticket presentation, baggage tracking and many more functions. So the testing for such an app would be different from the testing for an app like eBay’s.

How is Web Testing different from Mobile App Testing?

The complexity of testing has increased, making the test matrix exponentially larger than it was before, specifically in mobile. For example, right now on the market, there are some 12,000 unique Android devices alone. The fragmented mobile market has created huge challenges for app developers. They need to weigh test coverage against their time-to-market goals and the quality tolerances of the consumer base.

You cannot test it all, because the number of different vectors that you need to cover with mobile is much greater than in traditional testing if you want to keep up with the market.

High quality no longer means merely the absence of bugs. Mobile apps are now measured against a user-centric, 360-degree view of app quality, which from our perspective includes usability, performance, content and functional quality, as well as security. If an app fails to meet the expectations of users, it can be detrimental to the entire brand.

Are there any industry-specific examples that demonstrate a different focus on testing?

Many companies are trying to get new features out to the market. One of their marketing techniques is to provide free applications to consumers. They may accept inconsequential bugs in order to launch a new feature. If a bug is not going to impact a significant number of people, they may or may not fix it.

However, certain industries, like banking, really cannot have any functional bugs that make consumers think they can’t be trusted. Security flaws, or even minor functionality flaws, could lead to poor perceptions and damage that trust. Remember, though, that the banking industry is also being driven toward mobile applications.

This is supported by an AlixPartners study that suggested nearly half of smartphone users who switched banks said that mobile banking was an important factor in their decision. That’s up from only 7% in 2010. Financial institutions feel they have to add new features and capabilities to differentiate themselves from other providers, but at the same time, they have to maintain an extremely high level of quality and security in their applications.

Taking some of these factors into consideration, here are some of the things that you should consider when creating a test strategy for a mobile app:

Functional Testing

  • Validation of Functionality
  • Smoke/Regression Testing
  • Orientation changes
  • Network Access testing
  • Negative/Boundary condition Testing

Non-Functional Testing

  • Different Network Strength / Outage / Recovery testing
  • Different network types e.g. Wi-Fi, 3G, 4G etc.
  • Different hardware, peripheral testing.
  • Stopping activities and not killing the app.

Interoperability or Multi tasking

  • Voice/SMS interrupts
  • Notifications
  • Battery/Cable removal
  • Sending app in background/ Multi-tasking

Memory Profiling

  • Memory usage under different tests
  • Memory leaks after long usages or other situations like multi tasking.

Performance testing

  • Device CPU usage testing.
  • Network data usage
  • Page render times or Activity render times

Installation testing

  • New App install
  • Uninstall and reinstall
  • Upgrade

Usability testing

  • User experience
  • Expert reviews
  • Competitive analysis

Security testing

  • OWASP vulnerabilities
  • Dynamic testing
  • Static code analysis
  • Data encryption
  • Penetration testing

International-device support testing

  • Different locales
  • Different currencies, time zones
  • Images and text

Some of the areas may be more relevant than others based on the type of application under test. Obviously, being able to test everything on all devices can be a daunting task. So that’s where you would want to automate a lot of different areas.


Automation is key to any testing strategy. But the questions are:

  • How do you create efficient testing automation which can run on multiple devices?
  • How do you create efficient automation which uses a minimal amount of changes between your versions?
  • How do you create automation which can be truly unattended for multiple devices?

The answers to these questions are very important when it comes to mobile. Practically speaking, they determine the level of efficiency that can be derived from your testing efforts and the coverage reach you hope to achieve.

In a future post, I will talk about the different options available for automation!

Building an Automation framework

If you search online, there is a lot of information and many opinions available on building an automation framework. This blog is an attempt to express my view, drawing on the information available out there and on some additional things that I have figured out and worked on.

Here are some of the challenges associated with testing a product that is supposed to work on different platforms like web, natively on desktop and on mobile devices. I have also written a separate blog on “What to test in a cross-platform application?” As a benchmark, I will consider Skype as the application under test.

  • OS combinations: An application needs to work on different flavors of each OS type that it supports. Imagine an application that is supposed to work on Web, Windows and Mac, and needs to support at least 3 versions of each OS type (based on current OS usage as of Dec 2013). Permutation and combination of each of these 3 results in 72, which means you need to run each test 72 times. Obviously you can rule out some combinations based on usage, application design, etc. Now add mobile devices and OSes to this mix and the matrix grows exponentially.
  • Installer/Setup Differences: Many times, software testers find themselves spending a lot of time just setting up for the tests. Different platforms require the use of native package formats such as RPM and MSI. Multi-platform installers such as InstallAnywhere, JExpress, InstallBuilder, or IzPack address this need.
  • Feature Parity – Developers are often restricted to using the lowest common denominator subset of features, which are available on all platforms. This may hinder the application’s performance or prohibit developers from using platforms’ most advanced features.
  • Hardware Differences – If your application uses any of the system resources like audio, video, etc., the different hardware combinations add to the complexity of testing.
  • UI Differences – Different platforms often have different user interface conventions, which cross-platform applications do not always accommodate. For example, applications developed for Mac OS X and GNOME are supposed to place the most important button on the right-hand side of a window or dialog, whereas Microsoft Windows and KDE have the opposite convention. Though many of these differences are subtle, a cross-platform application that does not conform appropriately to these conventions may feel clunky or alien to the user. When working quickly, such opposing conventions may even result in data loss, such as in a dialog box confirming whether the user wants to save or discard changes to a file.
  • Security concerns: Cross-platform execution environments may suffer cross-platform security flaws thus creating a fertile environment for cross-platform malware.
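To get a feel for how quickly the OS combinations challenge grows, you can enumerate the matrix in a few lines of Ruby. The platform versions and browsers below are placeholders chosen for illustration, so the count will not match the figure quoted above, and the pruning rule is equally hypothetical.

```ruby
# Hypothetical platform list; the versions are placeholders, not a
# recommendation of what to support.
PLATFORMS = {
  "Windows" => ["7", "8", "8.1"],
  "Mac"     => ["10.7", "10.8", "10.9"],
}.freeze
BROWSERS = ["Chrome", "Firefox"].freeze

# Builds every OS/version/browser combination a test would have to run on.
def test_matrix(platforms, browsers)
  platforms.flat_map do |os, versions|
    versions.product(browsers).map do |version, browser|
      { os: os, version: version, browser: browser }
    end
  end
end

matrix = test_matrix(PLATFORMS, BROWSERS)
puts "full matrix: #{matrix.size} runs per test"

# Rule out combinations based on usage (this rule is purely illustrative).
reduced = matrix.reject { |c| c[:os] == "Mac" && c[:browser] == "Firefox" }
puts "after pruning: #{reduced.size} runs per test"
```

Every extra dimension (devices, locales, network types) multiplies the size of the matrix, which is why pruning by real usage data matters.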

To overcome some of these challenges, we need automated tests to come to the rescue. Here is the testing pyramid that you should aim for when creating automated tests.

Testing pyramid

The pyramid above allows you to find bugs more easily and quickly, as you detect issues early on with more code coverage through unit tests and component tests running in CI. This pyramid also allows you to reduce the cost of fixing each bug. Any bug found higher up in the pyramid is difficult to debug and fix, and consumes a lot of time.

Keeping the above pyramid in mind, here are the things that you would need to consider when building an automation framework for an application like Skype:

  • Application Development technology: This is important for being able to create and write unit tests for the application under test. Also this would determine how many people would be able to contribute to the framework and tests.
  • Frequency of changes in functionality or UI: This would obviously be dependent on the stage of your application. If your application is in its initial stages of development then the functionality and UI may be changing a lot to meet evolving customer needs.
  • Future changes: Is the application going to be in maintenance mode, rewritten or modified significantly in the near future?
  • Powerful vs easy-to-write test cases: This would also be determined by the technical capabilities of the team members and the velocity with which newer tests need to be added.

Automation framework vs Automation tool

A lot of people confuse an automation framework with an automation tool. Several companies offer different tools that can be part of a framework used to achieve the desired level of automated tests. It would be fair to say:

  • TestComplete and Eggplant are tools that can be used in a framework to achieve the desired results.
  • Selenium, WebDriver, Appium and Robotium are tools/libraries that are used within a framework to achieve certain automation from the above pyramid.
  • TestNG, Cucumber, JUnit and RSpec are test runners that are used within a framework to organize and execute tests.

Considering all the factors mentioned above, define the technology stack (Ruby, .NET, Win32, Java, Web/HTML, Web toolkits [GWT], etc.) that can be used for testing your application. You can also decide to use different technologies for different platforms and still use them within the same framework through wrapper classes, as long as they can send requests and receive responses in some common format like JSON. I will share specific details about this mechanism in a later blog.
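To make the wrapper idea a little more concrete, here is a minimal sketch in which every platform driver accepts a JSON command and returns a JSON response. The `PlatformDriver` class, the command fields (`action`, `args`) and the response fields are all assumptions invented for this example; they are not a description of any existing tool.

```ruby
require "json"

# Hypothetical wrapper: every platform driver accepts a JSON command and
# returns a JSON response, so the framework core never depends on the tool.
class PlatformDriver
  def initialize(name)
    @name = name
  end

  # Parses the command, performs it, and always answers with JSON.
  def handle(json_command)
    command = JSON.parse(json_command)
    result = perform(command.fetch("action"), command.fetch("args", {}))
    JSON.generate("platform" => @name, "status" => "ok", "result" => result)
  rescue KeyError, JSON::ParserError => e
    JSON.generate("platform" => @name, "status" => "error", "message" => e.message)
  end

  private

  # A real wrapper would call Selenium, Appium, or a native tool here.
  def perform(action, args)
    { "action" => action, "args" => args }
  end
end

web = PlatformDriver.new("web")
puts web.handle('{"action": "click", "args": {"id": "save"}}')
puts web.handle('{"args": {}}') # missing "action" produces an error response
```

Because the contract is only JSON in and JSON out, a driver written in another language could sit behind the same interface.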

Building the framework

While building an automation framework, consider putting in place a multi-layered approach so that each layer can be built, maintained and grown independently of the others. Here are the different components/layers to build out before defining a framework:

  • First build out the ‘common’ utility functions/methods comprising:
    • Setup/TearDown – Start out with the most used platform for your application. The setup should include starting required services, including database connections, etc. The teardown should take care of cleaning up all the test-specific things. You always want to start clean for each test, so your setup may also contain certain teardown methods.
    • Reporting mechanism: The test framework that you use will largely cover this, but establishing utilities on top of it allows you to clearly define and debug your code while it is executing. I will soon write up a separate blog on reporting.
  • Next build things for the basic technology layer:
    • Objects that would be used in all the tests e.g. web driver.
    • Define the test data using a data-driven and/or keyword-driven model, and/or read it from a properties file.
    • Build the capability to call different types of API clients e.g. HTTP client for REST API testing.
  • Third, build out the abstraction layer for the ‘Components’ that use the two previous layers to construct the tests.
    • The components in combination with the keywords and other technology functions/methods give you a flexible way to implement both functional and data driven tests.
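The three layers above might look something like this in miniature. Everything here is a hypothetical sketch: the module and class names, the properties format ("key = value" per line, "#" for comments) and the log messages are assumptions chosen to keep the example self-contained, and a real framework would start services and drive a browser where this one only appends to a log.

```ruby
# Layer 1: common utilities (setup/teardown). All names are hypothetical.
module Common
  def self.with_clean_state(log)
    log << "teardown (pre-clean)" # start clean even if a previous test crashed
    log << "setup"
    yield
  ensure
    log << "teardown"
  end
end

# Layer 2: basic technology layer, e.g. reading test data from a
# properties-style string. Lines without "=" and comment lines are skipped.
module TestData
  def self.parse_properties(text)
    text.each_line.map(&:strip)
        .reject { |line| line.start_with?("#") || !line.include?("=") }
        .map { |line| line.split("=", 2).map(&:strip) }
        .to_h
  end
end

# Layer 3: components that combine the two layers into test steps.
class LoginComponent
  def initialize(data, log)
    @data = data
    @log = log
  end

  def sign_in
    @log << "sign in as #{@data.fetch('username')}"
  end
end

log = []
data = TestData.parse_properties("# sample data\nusername = demo_user\n")
Common.with_clean_state(log) { LoginComponent.new(data, log).sign_in }
puts log
```

Because the component only talks to the layers beneath it, the tool behind a step like `sign_in` can be swapped without touching the tests that call it.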

The key here is to be able to extend/push the limits of one or two tools (or more) to accomplish the end goal. By separating out the layers and making them flexible, you get a unified core framework that other layers can use, and those layers can then be customized to the type of system you’re working with. Build out the framework in such a way that it is tool-independent, so that the current tools can be replaced with new ones if they are better and solve more problems in the future. Think ahead from the product perspective as well: if the product is going to add features/functionality that may be crucial for the business, then we need to be prepared for that in terms of the framework and automation coverage. In my next blog I will share more detailed information on creating a framework for a cross-platform product like Skype.