Parallelizing test automation? Read this first.

Paul Grizzaffi Principal Automation Architect, Magenic
 

You've done your due diligence. You've reviewed your automation. You've responsibly removed or refactored appropriate components. But despite these efforts, your automation takes longer to execute than you can tolerate; your team just can't wait that long for feedback.

Fortunately, there’s hope: You'll just parallelize your automation runs, right? Not so fast.

To borrow from a popular Internet meme, one does not simply walk into test automation and parallelize. As with most things, especially software things, there are many considerations to address beforehand. Here are a few of them.

Concurrent processes and thread safety

Threading is a language and runtime feature that allows multiple streams of execution within a single process. These streams are called "threads," and they typically share intra-process resources such as memory. This sharing is a powerful capability, but it can also be problematic if your code is not thread-safe.

To develop code that's thread-safe, you must be aware of the potential interactions among threads and write your code to protect against unintended interactions. Part of the solution is to choose data structures that are documented as thread-safe, as using data structures that are not thread-safe in a threaded environment can lead to intermittent issues in your code.
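As a minimal sketch of what "protecting against unintended interactions" can look like, here is a shared pass counter guarded by a lock. The names `ResultTally` and `run_scripts` are made up for illustration; they don't come from any particular automation framework.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ResultTally:
    """Counts passed test scripts; safe to update from many threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.passed = 0

    def record_pass(self):
        # "+= 1" is a read-modify-write, so two threads can interleave
        # and lose an update unless the operation is guarded by a lock.
        with self._lock:
            self.passed += 1


def run_scripts(script_count, workers=8):
    """Pretend each script passes and record it from a worker thread."""
    tally = ResultTally()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(script_count):
            pool.submit(tally.record_pass)
    # Leaving the with-block waits for all submitted work to finish.
    return tally.passed
```

Without the lock, the count could occasionally come up short, which is exactly the kind of intermittent issue that is hard to track down.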

Another way to implement parallelization is to execute concurrent processes on one or more computers. When you use this implementation, you needn't worry about unintended interactions within the individual processes, because you're not sharing in-process resources across processes.

As with threaded processes, however, you do need to protect against unintended interactions among externally shared resources such as files, databases, and access to ancillary programs.
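One simple way to guard an external resource across processes is a lock file that only one process can create at a time. The sketch below assumes a local file system where exclusive creation is atomic; `lock_file` and its parameters are hypothetical names, not part of any library.

```python
import contextlib
import os
import time


@contextlib.contextmanager
def lock_file(path, timeout=10.0, poll=0.05):
    """Hold an exclusive lock by creating `path`; remove it on exit."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists,
            # so only one process can succeed at a time.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire lock {path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(path)
```

A test script would wrap its access to the shared file or database fixture in `with lock_file("/tmp/fixture.lock"):`. Note that a crashed process leaves the lock file behind, so a real implementation needs some cleanup strategy.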

Though programming for concurrent processes can be easier than handling threading, concurrent processes use more system resources such as memory and processor cycles than threads do. And the sharing of resources and synchronization of work can be less resource-intensive when you use threading.

The core message here is this: Be aware of these concurrency considerations, and account for them. A deep dive on concurrency is out of scope for this article, but the Internet has lots of information about concurrency and threading. I recommend both the general explainer on thread safety on Wikipedia and the deeper dive from the Massachusetts Institute of Technology.

You'll need unique credentials

Now that you've handled concurrency, you’re home free, right? Not yet. When you run your automations in parallel, you have multiple accesses or connections to your system under test. But systems don't always allow multiple logins with the same set of credentials.

If you run in parallel using the same credentials for each parallel test script, the first-executed test script usually connects correctly, but the remaining test scripts may fail to log in because the credentials are already in use. Worse still, a later script's login might kick out the earlier session. That's worse because the script that fails is not the one that caused the problem, so the cause of the failure is harder to debug.

The worst case I've seen was a system that allowed the same user credentials to log in multiple times, but all the logins shared an "idle timeout." That meant when the first test script was complete, the idle timer started; when the idle timer expired, all subsequent transactions on that set of credentials received the wonderful error message "Session terminated due to idle timeout."

Know what you're up against

When deciding to parallelize automation, make sure you understand the requirements and limitations of your system under test. You’ll often need multiple sets of user credentials so that your test scripts can act independently and consistently.
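If you do have several sets of credentials, one way to keep scripts from colliding is to have each script check out a set and return it when done. This is only a sketch: `CredentialPool` is a hypothetical helper, and the credentials are whatever objects your own login code expects.

```python
import queue
from contextlib import contextmanager


class CredentialPool:
    """Hands each running script its own credentials, one set at a time."""

    def __init__(self, credentials):
        self._available = queue.Queue()  # thread-safe by design
        for cred in credentials:
            self._available.put(cred)

    @contextmanager
    def checkout(self, timeout=None):
        # Blocks until a set is free; raises queue.Empty on timeout.
        cred = self._available.get(timeout=timeout)
        try:
            yield cred
        finally:
            # Return the credentials even if the script fails.
            self._available.put(cred)
```

A script would then run its steps inside `with pool.checkout() as cred:`. The pool size becomes a natural cap on how many scripts are logged in at once.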

For some teams, handling this is trivial; they can simply create system credentials at will. For other teams—especially those whose product login credentials are tied to another system or Active Directory credentials—the process for obtaining additional credentials can be more arduous.

I've worked in organizations where IT or security required that each set of credentials be associated with just one employee. In this scenario, your teams might benefit from my earlier blog post about automation's extended audience.

To be fair to IT and security organizations, having different sets of user credentials for test execution typically means that the IDs and passwords must be well known, at least within the teams using those credentials. And depending on how you create and manage the credentials, this approach might be considered a security risk.

Deal with data dependencies

At this point you may have sorted out the shared credentials problem, but you still can't simply execute parallel scripts with impunity. Why? Because your automation has data dependencies.

In many transaction-based or state-based applications, once you operate on a specific data item, it is then in a different state. If another test script (or another execution of the same test script) operates on that data item, the operation will eventually fail because the data is not in the expected state. Or it might just do the wrong thing, which is considerably worse.

To have parallelization that works consistently, state-based data must be in the correct state for each test script that acts upon it. This usually means creating or loading the appropriate data when the script executes; the curation of this data is what is typically called "test data management." That's usually far more involved than it sounds, but the activity must take place if you expect your scripts to deliver consistent results.
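To make the state problem concrete, here is a small sketch using an invented "order" record that can be shipped only once. The functions `new_order` and `ship` are stand-ins for whatever your application and test data tooling actually provide.

```python
import uuid


def new_order(run_id=None):
    """Create a fresh order in the NEW state with a unique ID."""
    run_id = run_id or uuid.uuid4().hex
    return {"order_id": f"order-{run_id}", "status": "NEW"}


def ship(order):
    """State-changing operation: only valid on an order that is NEW."""
    if order["status"] != "NEW":
        raise ValueError(f"{order['order_id']} is already {order['status']}")
    order["status"] = "SHIPPED"
    return order


def test_ship_order():
    # Each execution builds its own data, so parallel runs never
    # find the order in a state left behind by another script.
    order = new_order()
    assert ship(order)["status"] == "SHIPPED"
```

If two scripts shared a single order instead, whichever ran second would fail in `ship`, and which one that is would vary from run to run.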

Create the correct execution environments

After having conquered all of the previous challenges, you'll need a place to run all of those test scripts. You could have multiple physical machines available to handle the parallelization. I’ve used a test lab like this when taking advantage of virtual machines (VMs) was deemed too expensive.

Nowadays, VMs can be a more economical option for automation parallelization. Whether those VMs are on premises or are cloud-based largely depends on the cost and your company's ability to embrace the cloud.

You could also work with cloud partners, companies that host cloud-based execution environments for testing and automation, including Selenium grids and mobile device farms. Depending on your situation, you may find that partnering with one of these providers is more economical than managing these environments yourself.

This economy can come from your organization not having to expend effort to build and maintain the automation infrastructure. Your cloud partner handles the build-out and maintenance; your organization pays it for access to its infrastructure.

Different companies have different models; you can calculate the cost for creating your own infrastructure and compare it to the different models offered by the cloud partners, choosing whichever approach is best for your organization.

Control resources via artificial constraints

Parallelization is a lot of work, but that's okay. You don't need the perfect parallelization implementation; your approach must only be good enough to provide value, meaning consistent execution with a higher throughput than you'd get by running all scripts sequentially.

I once worked in a company where our automation platform included an intricate resource management server. It handled the reservation of the resources, the prevention of deadlock when trying to reserve the required resources, and the queuing of test scripts that were waiting for resources to become available.

This was a complex piece of code from both a development and a maintenance standpoint, but it allowed scripts to be executed with little regard for the availability of testing resources. If the resources were available, the script executed; if the resources were not available, it waited. The resource manager reduced the planning overhead required when testing against expensive and scarce, but highly sharable, resources.
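I can't reproduce that server here, but a toy version of the idea fits in a few lines. The sketch below is not how that system worked; it assumes one common deadlock-avoidance technique, which is to always acquire resources in a fixed (sorted) order. `ResourceManager` and the resource names are hypothetical.

```python
import threading
from contextlib import contextmanager


class ResourceManager:
    """Reserves named resources; scripts wait until all are available."""

    def __init__(self, names):
        self._locks = {name: threading.Lock() for name in names}

    @contextmanager
    def reserve(self, *names):
        # Acquiring in one global order means two scripts can never
        # each hold a resource the other is waiting for.
        ordered = sorted(set(names))
        acquired = []
        try:
            for name in ordered:
                self._locks[name].acquire()  # waits if in use
                acquired.append(name)
            yield ordered
        finally:
            for name in reversed(acquired):
                self._locks[name].release()
```

A script that needs a database and a device would wrap its body in `with manager.reserve("device", "db"):`, and the order in which it lists them would not matter.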

You can manage resources in another way

Most teams don't have a robust resource manager such as this and, frankly, most teams don't need it. You can accomplish a similar goal administratively.

I worked with a team where we evaluated which automated scripts could run in parallel and which had to run sequentially. We organized those that had to run sequentially into a single, sequential suite to avoid resource or data contention.

The other scripts—those that could run in parallel—were spread across execution machines so that they could run as soon as the previous script completed. In this way, we achieved the same throughput as we would have had with a resource management system, but with far less programmatic work.
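The split described above can be sketched with a worker pool: one worker runs the sequential suite in order while the others pick up parallel-safe scripts as they become free. This assumes each script is a plain callable; `run_suite` is an illustrative name, and we did the real partitioning by hand rather than in code.

```python
from concurrent.futures import ThreadPoolExecutor


def run_suite(parallel_scripts, sequential_scripts, workers=4):
    """Run contention-prone scripts in order, the rest in parallel."""

    def run_sequential():
        # One worker executes these back to back, so they never
        # compete with each other for resources or data.
        return [script() for script in sequential_scripts]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        sequential_future = pool.submit(run_sequential)
        # Each free worker takes the next script as soon as its
        # previous one completes.
        parallel_futures = [pool.submit(s) for s in parallel_scripts]
        parallel_results = [f.result() for f in parallel_futures]
        sequential_results = sequential_future.result()
    return parallel_results, sequential_results
```

The classification of scripts is still a human decision here, which is the point: the administrative work replaces the programmatic work.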

It's worth the effort

Just remember to cover the important bases before you start doing any programming to parallelize your test automation. Be aware of the potential interactions among threads, and write your code to protect against unintended interactions.

Take care of any issues with multiple log-in credentials, deal with data dependencies, and make sure you have some means of managing test resources before you begin.

Yes, parallelization is a lot of work, but if you approach it responsibly and can extract value from it, the work is worth the gain in throughput.

Don't miss Paul Grizzaffi's presentation, "Well, That’s Random: Automated Fuzzy Browser Clicking," which looks at the benefits of using random clicking in testing, at the STAREAST conference in Orlando. The conference runs April 28-May 3. TechBeacon readers can save $200 on registration fees by using promo code SECM.
