Failover Testing
Nowadays many applications aim to provide services to customers 24x7, apart from
planned maintenance windows. Any banking application is an example. Clustered
servers are also commonly deployed to improve the performance, availability and
scalability of such applications.
In failover testing we test the availability of the application when one of the
servers in the cluster goes down. In other words, we check how the remaining
online servers handle the load when some of the servers in the cluster go offline.
Failover testing is also known as session replication testing. Session replication
is a mechanism that copies the data stored in a user session across different
server instances, which must all belong to the same cluster. When session
replication is enabled in a cluster environment, the entire session data is copied
to a replica instance. So when one instance goes offline, the user can carry on
with the same session without interruption, because the session has been
replicated. This ensures that business is carried out smoothly.
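To make the idea concrete, here is a minimal in-memory sketch of session replication. The `Instance` and `Cluster` classes, the instance names and the "first online instance serves the request" routing are all hypothetical simplifications, not how any particular application server implements it. Every session write is applied to all online instances, so when one instance is taken offline another can still serve the same session.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One server instance in the (hypothetical) cluster."""
    name: str
    online: bool = True
    sessions: dict = field(default_factory=dict)


class Cluster:
    """Toy cluster in which every session write is copied to all online instances."""

    def __init__(self, names):
        self.instances = [Instance(n) for n in names]

    def write(self, session_id, key, value):
        # Session replication: apply the write on every online instance,
        # so any of them can serve the session later.
        for inst in self.instances:
            if inst.online:
                inst.sessions.setdefault(session_id, {})[key] = value

    def read(self, session_id, key):
        # Naive routing: the first online instance serves the request.
        for inst in self.instances:
            if inst.online:
                return inst.name, inst.sessions[session_id][key]
        raise RuntimeError("no instance in the cluster is online")

    def take_offline(self, name):
        # Simulate a server going down during the failover test.
        for inst in self.instances:
            if inst.name == name:
                inst.online = False


cluster = Cluster(["server1", "server2"])
cluster.write("session-42", "cart", ["book"])
print(cluster.read("session-42", "cart"))  # served by server1
cluster.take_offline("server1")
print(cluster.read("session-42", "cart"))  # same session, now served by server2
```

A real cluster also has to resynchronise sessions onto an instance when it comes back online; this sketch leaves that out.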
Failover (session replication) testing can be done on both application and
database servers. It is usually done under peak load.
Approach:
[Figure: Failover Table]
The table above shows two server instances. To perform the test, we divide its
duration into time slots, or phases.
Phase 1 is the ramp-up period, in which all the users are brought into the system
and take some time to reach a steady state. In this phase the load is balanced
between the two server instances.
Phase 2 is where the first failover begins. In our case we take Server 1 offline.
The CPU utilization of Server 2 then increases, since it takes on the additional
load from Server 1.
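As a rough back-of-the-envelope check of what to expect in this phase, the sketch below estimates per-server CPU under an even load split. The linear "CPU percent per request/s" cost model, the `expected_cpu` name and the figures used are illustrative assumptions, not measurements; real servers have fixed overheads and behave non-linearly near saturation.

```python
def expected_cpu(total_rps, online_servers, cpu_pct_per_rps):
    """Estimate CPU % per online server when the load is split evenly.

    Illustrative assumptions: the load balancer spreads requests evenly,
    CPU cost grows linearly with request rate, and utilisation caps at 100%.
    """
    if not online_servers:
        raise ValueError("at least one server must be online")
    share = total_rps / len(online_servers)
    return {s: min(share * cpu_pct_per_rps, 100.0) for s in online_servers}


# Phase 1: both servers share 200 requests/s.
print(expected_cpu(200, ["server1", "server2"], 0.35))
# Phase 2: server1 is offline, so server2 carries all 200 requests/s.
print(expected_cpu(200, ["server2"], 0.35))
```

Under this model each server sits at roughly 35% in Phase 1 and the surviving server roughly doubles to about 70% in Phase 2. If the estimate for the survivor approaches 100%, peak load is likely to saturate it during failover.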
In Phase 3 we bring Server 1 back online, so the load should again be shared
between the two server instances. Server 1's CPU utilization starts to rise and
Server 2's starts to fall until they reach a balancing point. We need to measure
the recovery time of Server 1, that is, how long the load takes to rebalance after
Server 1 comes back online.
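One way to turn "reach a balancing point" into a number is sketched below. The rule is our own assumption, not part of the method described here: recovery is taken to end at the first run of `window` consecutive samples in which the two servers' CPU utilization differ by at most `tolerance` percentage points. The `recovery_time` name, the sample format and both default thresholds are hypothetical.

```python
def recovery_time(samples, restart_at, tolerance=5.0, window=3):
    """Seconds from restart_at until the two servers' CPU stays balanced.

    samples: list of (timestamp_s, cpu_server1, cpu_server2), sorted by time.
    Recovery ends at the start of the first run of `window` consecutive samples
    (at or after restart_at) whose CPU difference is <= tolerance points.
    Returns None if the servers never rebalance within the samples given.
    """
    run_start = None
    run_len = 0
    for ts, cpu1, cpu2 in samples:
        if ts < restart_at:
            continue  # ignore samples from before the server came back
        if abs(cpu1 - cpu2) <= tolerance:
            if run_len == 0:
                run_start = ts
            run_len += 1
            if run_len >= window:
                return run_start - restart_at
        else:
            run_len = 0  # imbalance again: restart the run
    return None
```

Requiring several consecutive balanced samples keeps a single lucky crossover of the two CPU curves from being reported as recovery.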
Phase 4 is where the second failover begins. In our case we take Server 2 offline.
The CPU utilization of Server 1 then increases, since it takes on the additional
load from Server 2.
Phase 5 is the same as Phase 3, except that Server 2 is brought back online and it
is Server 2's recovery time that we measure.
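The five phases can be written down as a schedule so that monitoring data can later be grouped by phase. The 30-minute slot lengths and the `Phase`/`phase_at` names below are purely illustrative assumptions; in practice each slot should be long enough for the system to settle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    number: int
    start_min: int       # inclusive
    end_min: int         # exclusive
    online: frozenset    # server instances that are up during this slot
    description: str


# Illustrative durations only.
SCHEDULE = [
    Phase(1, 0, 30, frozenset({"server1", "server2"}), "ramp-up to steady state"),
    Phase(2, 30, 60, frozenset({"server2"}), "first failover: server1 offline"),
    Phase(3, 60, 90, frozenset({"server1", "server2"}), "server1 back online, rebalance"),
    Phase(4, 90, 120, frozenset({"server1"}), "second failover: server2 offline"),
    Phase(5, 120, 150, frozenset({"server1", "server2"}), "server2 back online, rebalance"),
]


def phase_at(minute, schedule=SCHEDULE):
    """Return the phase covering the given minute of the test."""
    for phase in schedule:
        if phase.start_min <= minute < phase.end_min:
            return phase
    raise ValueError(f"minute {minute} is outside the test window")


print(phase_at(45).description)  # first failover: server1 offline
```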
[Figure: Failover CPU Utilization - Sample]
Test Report:
The following metrics should be captured in a failover test report:
- Server recovery time in Phases 3 and 5.
- CPU and memory utilization of both servers in each phase.
- Response time of all the transactions in each phase.
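A minimal sketch of how the per-phase CPU, memory and response-time figures could be aggregated from raw monitoring samples. The tuple layouts, the `summarise` name and the choice of average plus nearest-rank 90th percentile are our assumptions; use whatever statistics your reporting standard calls for.

```python
import math
from collections import defaultdict
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile, e.g. pct=90 for the 90th percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarise(resource_samples, response_samples):
    """Build per-phase report rows.

    resource_samples: iterable of (phase, server, cpu_pct, mem_pct)
    response_samples: iterable of (phase, transaction, response_ms)
    """
    resources = defaultdict(list)
    for phase, server, cpu, mem in resource_samples:
        resources[(phase, server)].append((cpu, mem))

    responses = defaultdict(list)
    for phase, txn, ms in response_samples:
        responses[(phase, txn)].append(ms)

    # Average CPU and memory per (phase, server).
    resource_rows = {
        key: {"avg_cpu": mean(c for c, _ in vals), "avg_mem": mean(m for _, m in vals)}
        for key, vals in resources.items()
    }
    # Average and 90th-percentile response time per (phase, transaction).
    response_rows = {
        key: {"avg_ms": mean(vals), "p90_ms": percentile(vals, 90)}
        for key, vals in responses.items()
    }
    return resource_rows, response_rows
```

Comparing Phase 2 and Phase 4 rows against Phase 1 shows how much the surviving server's utilization and the transaction response times degrade during each failover.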