How to Do Failover or Session Replication Testing


Failover Testing: Many applications today aim to serve customers 24x7, going offline only during planned maintenance windows. Banking applications are a typical example. Clustered servers are commonly deployed to improve the performance, availability and scalability of such applications.
Failover testing checks the availability of the application when one of the servers in the cluster goes down. In other words, it checks how the servers that remain online handle the load when a few servers in the cluster go offline.

Failover testing is also referred to as session replication testing. Session replication is a mechanism that copies the data stored in a session to other instances, which must be part of the same cluster. When session replication is enabled in a clustered environment, the entire session data is copied to the replica instance. So when one instance goes offline, users can continue with the same session without interruption, which keeps the business running smoothly.
Failover or session replication testing can be done on both application and database servers. It is usually run at peak load.
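To make the replication idea concrete, here is a minimal in-memory sketch. It is not how any particular application server implements replication; the `Instance` and `Cluster` classes, the session ID and the cart data are all hypothetical names made up for illustration.

```python
import copy


class Instance:
    """One server instance holding session data in memory (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.sessions = {}


class Cluster:
    """Toy cluster that copies every session write to all online instances."""

    def __init__(self, instances):
        self.instances = list(instances)

    def write(self, session_id, key, value):
        # Full replication: each online instance gets its own copy of the value.
        for inst in self.instances:
            if inst.online:
                inst.sessions.setdefault(session_id, {})[key] = copy.deepcopy(value)

    def read(self, session_id, key):
        # Serve the request from the first instance that is still online.
        for inst in self.instances:
            if inst.online:
                return inst.name, inst.sessions.get(session_id, {}).get(key)
        raise RuntimeError("no instance online")


s1, s2 = Instance("server1"), Instance("server2")
cluster = Cluster([s1, s2])
cluster.write("sess-42", "cart", ["book"])

s1.online = False  # simulate Server 1 going offline
served_by, cart = cluster.read("sess-42", "cart")
print(served_by, cart)
```

Because the session was copied to both instances before the failure, the request is served by the surviving instance with the cart intact, which is the behaviour a failover test tries to confirm on the real system.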

Approach:

Failover Table
The table above shows 2 server instances. To run the test, divide its duration into time slots (phases).
Phase 1 is the ramp-up period, in which all users are brought into the system and the load takes some time to reach a steady state. In this phase the load is balanced between the 2 server instances.
Phase 2 is where the first failover begins. In our case we take Server 1 offline. The CPU utilization of Server 2 will now increase, since it takes on the additional load from Server 1.
In Phase 3 we bring Server 1 back online, so the load should again be shared between the server instances. Server 1 CPU utilization will start to grow, Server 2 CPU utilization will start to drop, and the two will reach a balancing point. We need to measure the recovery time of Server 1.
Phase 4 is where the second failover begins. In our case we take Server 2 offline. The CPU utilization of Server 1 will now increase, since it takes on the additional load from Server 2.
Phase 5 is the same as Phase 3, except that Server 2 is the one brought back online and its recovery time is measured.
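The recovery time in Phases 3 and 5 can be estimated from the CPU samples collected during the test. The sketch below assumes one possible definition: the time from the restart until the two servers' CPU utilization stays within a few percentage points of each other for several consecutive samples. The function name, the `tolerance` and `hold` parameters, and the sample values are illustrative assumptions, not measured data.

```python
def recovery_time(samples, restart_at, tolerance=5.0, hold=3):
    """Seconds from restart_at until both CPUs stay within `tolerance`
    percentage points of each other for `hold` consecutive samples.

    samples: iterable of (time_seconds, cpu_server1, cpu_server2).
    Returns None if the balancing point is never reached.
    """
    streak = 0
    streak_start = None
    for t, cpu1, cpu2 in samples:
        if t < restart_at:
            continue  # ignore samples taken before the server was restarted
        if abs(cpu1 - cpu2) <= tolerance:
            if streak == 0:
                streak_start = t  # first balanced sample of this streak
            streak += 1
            if streak >= hold:
                return streak_start - restart_at
        else:
            streak = 0  # balance was lost, start counting again
    return None


# Made-up samples: Server 1 is restarted at t=300 s while Server 2 carries the load.
samples = [
    (300, 0.0, 82.0),
    (310, 12.0, 74.0),
    (320, 28.0, 60.0),
    (330, 41.0, 47.0),
    (340, 43.0, 45.0),
    (350, 44.0, 44.0),
    (360, 44.0, 45.0),
]
print(recovery_time(samples, restart_at=300))  # prints 40
```

Requiring several consecutive balanced samples avoids reporting a recovery when the two curves merely cross once; the right tolerance and hold count depend on the sampling interval and how noisy the CPU readings are.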
Failover CPU Utilization - Sample
Test Report:

Below are the metrics to be captured in a Failover test report.
  • Server recovery time in Phases 3 and 5.
  • CPU and memory utilization of both servers in each phase.
  • Response time of all transactions in each phase.
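As a rough sketch of how the response-time metric could be summarized per phase, the snippet below groups raw measurements by phase and transaction. The record layout `(phase, transaction, seconds)`, the `summarize` function and the sample numbers are assumptions for illustration; a load testing tool will export its own format.

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """Group response times by (phase, transaction) and report mean, max and count."""
    grouped = defaultdict(list)
    for phase, transaction, seconds in records:
        grouped[(phase, transaction)].append(seconds)
    return {
        key: {"mean": round(mean(vals), 3), "max": max(vals), "count": len(vals)}
        for key, vals in sorted(grouped.items())
    }


# Made-up response times in seconds for a single "login" transaction.
records = [
    (1, "login", 0.40), (1, "login", 0.44),
    (2, "login", 0.71), (2, "login", 0.69),
    (3, "login", 0.45), (3, "login", 0.43),
]

for (phase, transaction), stats in summarize(records).items():
    print(f"phase {phase} {transaction}: {stats}")
```

Comparing the failover phases (2 and 4) against the balanced phases (1, 3 and 5) shows how much response times degrade while a single server carries the full load.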
