Sunday, September 11, 2011

Linux: Configuring cluster VIP (Virtual IP) with keepalived: Part 3

The first part of this article focuses on the configuration of keepalived, and the second part covers two test scenarios: node failure and recovery. In this part I would like to discuss an option that was recently added to keepalived, namely monitoring of the network interface status (meaning that a failover should happen if the VIP interface goes down), and show which configuration changes are required to enable it.
Let me first recall the environment configuration - Figure 1.

Figure 1 Network deployment of the test environment for
the keepalived's VIP failover

Case C - VIP network interface status monitoring - interface down

What we would like to have is a failover to the BACKUP keepalived node in case something happens to the network interface to which the VIP address is assigned. This feature is supported by keepalived 1.2.2 but requires an enhancement of the configuration file:

[root@centOS-hostA ~]# cat /etc/keepalived/keepalived.conf
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    track_interface {
        eth1
    }
    virtual_ipaddress {
        192.168.122.50/24 brd 192.168.122.255 dev eth1 label eth1:0
    }
}

and a similar change for the BACKUP node (HostB):

[root@centOS-hostB ~]# cat /etc/keepalived/keepalived.conf
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 101
    track_interface {
        eth1
    }
    virtual_ipaddress {
        192.168.122.50/24 brd 192.168.122.255 dev eth1 label eth1:0
    }
}


Here eth1 is the interface to which the VIP address is assigned, and eth0 is the interconnect.
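Before breaking anything it is worth double-checking on which node the VIP is currently configured. A quick check (just a sketch; I assume the iproute2 tools are installed, ifconfig shows the eth1:0 alias as well):

[root@centOS-hostA ~]# ip addr show dev eth1
# on the MASTER the output should list the VIP 192.168.122.50 labelled eth1:0,
# while on the BACKUP node it should not appear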
Test methodology (the corresponding commands are sketched right below this list):
  1. We constantly ping the VIP address from the client station
  2. We simulate a network interface error (we shut the interface down)
  3. We check the ping output to see whether the failover was smooth
  4. We check the ARP table to see which MAC address is used
  5. We sniff the interface to find out what caused the client machine's ARP table update
  6. We check HostA's log files
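For reference, these are (roughly) the commands behind the steps above - a minimal sketch; any other sniffer (e.g. Wireshark) can be used instead of tcpdump:

krychu@krystianek:~$ ping 192.168.122.50                            # step 1, on the client station
[root@centOS-hostA ~]# ifconfig eth1 down                           # step 2, on HostA
krychu@krystianek:~$ arp -na | grep 192.168.122.50                  # step 4, on the client station
krychu@krystianek:~$ sudo tcpdump -e -i virbr0 arp                  # step 5, on the client station
[root@centOS-hostA ~]# tail -f /var/log/messages | grep Keepalived  # step 6, on HostA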
Let's first take a snapshot of the client station before performing any of these steps - the ARP table entry maps the VIP address to HostA's MAC:


krychu@krystianek:~$ arp -na
...

? (192.168.122.50) at 52:54:00:03:ba:f2 [ether] on virbr0
...
krychu@krystianek:~$

Now the results. After performing the first two steps (the interface failure was simulated with ifconfig eth1 down), keepalived on HostA changed its state to FAULT and, as expected, the VIP address was failed over. However, this time one can notice that some ICMP echo replies are lost:

krychu@krystianek:~$ ping 192.168.122.50
PING 192.168.122.50 (192.168.122.50) 56(84) bytes of data.
...

64 bytes from 192.168.122.50: icmp_req=5 ttl=64 time=0.411 ms
64 bytes from 192.168.122.50: icmp_req=6 ttl=64 time=0.376 ms
64 bytes from 192.168.122.50: icmp_req=7 ttl=64 time=0.387 ms
64 bytes from 192.168.122.50: icmp_req=8 ttl=64 time=0.411 ms
64 bytes from 192.168.122.50: icmp_req=13 ttl=64 time=0.532 ms
64 bytes from 192.168.122.50: icmp_req=14 ttl=64 time=0.324 ms
64 bytes from 192.168.122.50: icmp_req=15 ttl=64 time=0.444 ms
...
^C
--- 192.168.122.50 ping statistics ---
22 packets transmitted, 18 received, 18% packet loss, time 20997ms
rtt min/avg/max/mdev = 0.295/0.408/0.683/0.088 ms
krychu@krystianek:~$ arp -na
...
? (192.168.122.50) at 52:54:00:56:c3:f4 [ether] on virbr0
...

The information about the transition can be found in the logs:
Sep 11 19:09:45 centOS-hostA Keepalived: Starting VRRP child process, pid=3411
Sep 11 19:09:45 centOS-hostA Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'.
Sep 11 19:09:45 centOS-hostA Keepalived_vrrp: Configuration is using : 63057 Bytes
Sep 11 19:09:45 centOS-hostA Keepalived_vrrp: Using LinkWatch kernel netlink reflector...
Sep 11 19:09:45 centOS-hostA Keepalived_vrrp: VRRP sockpool: [ifindex(2), proto(112), fd(10,11)]
Sep 11 19:09:46 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 11 19:09:47 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Sep 11 19:09:47 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 11 19:09:47 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 19:09:52 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 20:55:37 centOS-hostA Keepalived_vrrp: Kernel is reporting: interface eth1 DOWN
Sep 11 20:55:37 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Entering FAULT STATE
Sep 11 20:55:37 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) removing protocol VIPs.
Sep 11 20:55:37 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Now in FAULT state
 

Case D - VIP network interface status monitoring - interface up


Now let's bring the network interface on HostA back up (by executing ifconfig eth1 up).

As one can see, there is nothing noticeable in the ICMP echo replies, and the client station's ARP table entry remained unchanged.


krychu@krystianek:~$ ping 192.168.122.50
PING 192.168.122.50 (192.168.122.50) 56(84) bytes of data.
64 bytes from 192.168.122.50: icmp_req=1 ttl=64 time=3.57 ms
64 bytes from 192.168.122.50: icmp_req=2 ttl=64 time=0.446 ms
64 bytes from 192.168.122.50: icmp_req=3 ttl=64 time=0.387 ms
64 bytes from 192.168.122.50: icmp_req=4 ttl=64 time=0.374 ms
64 bytes from 192.168.122.50: icmp_req=5 ttl=64 time=0.395 ms
64 bytes from 192.168.122.50: icmp_req=6 ttl=64 time=0.368 ms
64 bytes from 192.168.122.50: icmp_req=7 ttl=64 time=0.571 ms
64 bytes from 192.168.122.50: icmp_req=8 ttl=64 time=0.288 ms
64 bytes from 192.168.122.50: icmp_req=9 ttl=64 time=0.380 ms
64 bytes from 192.168.122.50: icmp_req=10 ttl=64 time=0.371 ms
64 bytes from 192.168.122.50: icmp_req=11 ttl=64 time=0.353 ms
64 bytes from 192.168.122.50: icmp_req=12 ttl=64 time=0.359 ms
64 bytes from 192.168.122.50: icmp_req=13 ttl=64 time=0.387 ms
64 bytes from 192.168.122.50: icmp_req=14 ttl=64 time=0.367 ms
64 bytes from 192.168.122.50: icmp_req=15 ttl=64 time=0.366 ms
^C
--- 192.168.122.50 ping statistics ---
15 packets transmitted, 15 received, 0% packet loss, time 14000ms
rtt min/avg/max/mdev = 0.288/0.599/3.579/0.798 ms
krychu@krystianek:~$ arp -na
...
? (192.168.122.50) at 52:54:00:56:c3:f4 [ether] on virbr0
...
krychu@krystianek:~$
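A quick check on HostA confirms what the client sees (again assuming the iproute2 tools are available) - even though eth1 is up again, the VIP has not been re-added:

[root@centOS-hostA ~]# ip addr show dev eth1 | grep 192.168.122.50
[root@centOS-hostA ~]#
# no output - keepalived on HostA did not claim the VIP back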

So what has happened? The answer can be found in the logs:

Sep 11 19:09:46 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 11 19:09:47 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Sep 11 19:09:47 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 11 19:09:47 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 19:09:52 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 20:55:37 centOS-hostA Keepalived_vrrp: Kernel is reporting: interface eth1 DOWN
Sep 11 20:55:37 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Entering FAULT STATE
Sep 11 20:55:37 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) removing protocol VIPs.
Sep 11 20:55:37 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Now in FAULT state
Sep 11 21:00:50 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE


HostA's keepalived has changed its state from FAULT to BACKUP and HostB remained the MASTER. It looks strange and not consistent with the node failure/recovery transitions, but let's try to shut down HostB now and see if HostA transitions into MASTER.
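To reproduce this it is enough to either shut HostB down completely or just stop keepalived on it - from the VRRP point of view the advertisements stop either way - and watch HostA's log (a sketch):

[root@centOS-hostB ~]# /etc/init.d/keepalived stop
[root@centOS-hostA ~]# tail -f /var/log/messages | grep Keepalived_vrrp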

Sep 11 19:09:46 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 11 19:09:47 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Sep 11 19:09:47 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 11 19:09:47 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 19:09:52 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 20:55:37 centOS-hostA Keepalived_vrrp: Kernel is reporting: interface eth1 DOWN
Sep 11 20:55:37 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Entering FAULT STATE
Sep 11 20:55:37 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) removing protocol VIPs.
Sep 11 20:55:37 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Now in FAULT state
Sep 11 21:00:50 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Sep 11 21:06:15 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 11 21:06:16 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Sep 11 21:06:16 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 11 21:06:16 centOS-hostA Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50


As expected - after the shutdown of HostB, HostA took over the role of MASTER and sent the gratuitous ARP to update the client station's ARP table. So HA is provided, but it still looks a bit inconsistent compared to the other scenarios.

 Summary

Although I did not check all scenarios related to network problems, it seems that keepalived provides good support for monitoring the network interface. In the doc/samples/(...)track_interface file one can also find examples of how to monitor multiple network interfaces.
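Just to illustrate the idea (a minimal sketch, not a copy of the samples file): tracking more interfaces only requires listing them inside the track_interface block, and the instance enters the FAULT state as soon as any of them goes down:

track_interface {
    eth1
    eth2
}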

Linux: Configuring cluster VIP (Virtual IP) with keepalived: Part 2

The first part of this article focuses on the instructions for setting up keepalived with a VIP address in a two-node configuration. I have enhanced the picture from the previous article with the ARP configuration, since ARP will be used throughout this part to describe the mechanisms of the VIP failover.

Figure 1 Deployment diagram of the keepalived for a two node configuration

In this part two test scenarios are described:
  1. The master goes down (Host A)
  2. The master starts up again (Host A)
Other cases involving the backup node (Host B) are not described in detail, since in these cases no VIP failover takes place.

Case A The master goes down (Host A)

Let's first check how the environment looks from the client station's perspective. Both nodes (HostA and HostB) are up and running with keepalived as described in part 1 of this article. The client station is able to ping the VIP address:

krychu@krystianek:~$ ping 192.168.122.50
PING 192.168.122.50 (192.168.122.50) 56(84) bytes of data.
64 bytes from 192.168.122.50: icmp_req=1 ttl=64 time=0.394 ms
64 bytes from 192.168.122.50: icmp_req=2 ttl=64 time=0.353 ms
64 bytes from 192.168.122.50: icmp_req=3 ttl=64 time=0.270 ms
64 bytes from 192.168.122.50: icmp_req=4 ttl=64 time=0.814 ms
^C
--- 192.168.122.50 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2998ms
rtt min/avg/max/mdev = 0.270/0.457/0.814/0.212 ms

By looking into the arp table:

krychu@krystianek:~$ arp -an
...
? (192.168.122.50) at 52:54:00:03:ba:f2 [ether] on virbr0
? (192.168.122.179) at on virbr0
...

we see that Host A is the one available under the VIP address (192.168.122.50), which is what we expected, since it is the master.
Now let's simulate a crash of the master node as follows:
  1. We constantly ping the VIP address from the client station
  2. We simulate a crash of the keepalived process on the master node (Host A), e.g. with kill -9 (see the sketch right after this list)
  3. We check the ping output to see whether the failover was smooth
  4. We check the ARP table to see which MAC address is used
  5. We sniff the interface to find out what caused the client machine's ARP table update
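The crash from step 2 can be simulated by killing all keepalived processes on HostA at once (a sketch; keepalived forks child processes, so pidof usually returns more than one PID):

[root@centOS-hostA ~]# kill -9 $(pidof keepalived)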
OK, so what are the results? First of all, the node crash does not cause any ICMP packets to be dropped - there is only a small increase in the response time, to about 20.1 ms (please take a look below).

krychu@krystianek:~$ ping 192.168.122.50
PING 192.168.122.50 (192.168.122.50) 56(84) bytes of data.
...
64 bytes from 192.168.122.50: icmp_req=37 ttl=64 time=0.336 ms
64 bytes from 192.168.122.50: icmp_req=38 ttl=64 time=0.398 ms
64 bytes from 192.168.122.50: icmp_req=39 ttl=64 time=0.273 ms
64 bytes from 192.168.122.50: icmp_req=40 ttl=64 time=0.298 ms
64 bytes from 192.168.122.50: icmp_req=41 ttl=64 time=20.1 ms
64 bytes from 192.168.122.50: icmp_req=42 ttl=64 time=0.320 ms
64 bytes from 192.168.122.50: icmp_req=43 ttl=64 time=0.379 ms
64 bytes from 192.168.122.50: icmp_req=44 ttl=64 time=0.314 ms
...
^C
--- 192.168.122.50 ping statistics ---
67 packets transmitted, 67 received, 0% packet loss, time 65997ms
rtt min/avg/max/mdev = 0.198/0.742/20.167/2.472 ms

The proof that the failover has happened can be found in the ARP table, which now maps the VIP address to HostB's MAC (HostB was previously the backup keepalived server).

krychu@krystianek:~$ arp -an
...
? (192.168.122.50) at 52:54:00:56:c3:f4 [ether] on virbr0
...

In Wireshark we see that the backup node (HostB) has sent a gratuitous ARP message to the broadcast MAC, announcing that it is now the owner of the VIP address. Afterwards the client station updated its ARP cache.
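The same can be observed with tcpdump on the client station (a sketch; the -e flag prints the link-layer addresses, which is what we care about here):

krychu@krystianek:~$ sudo tcpdump -e -i virbr0 arp and host 192.168.122.50
# after the failover a gratuitous ARP for 192.168.122.50, sourced from HostB's
# MAC (52:54:00:56:c3:f4), should show up here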
If we check the keepalived logs on HostB, we can see that after the crash of HostA has been detected, keepalived transitions from the BACKUP to the MASTER state:

Sep 11 09:50:49 centOS-hostB Keepalived_vrrp: Using LinkWatch kernel netlink reflector...
Sep 11 09:50:49 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Sep 11 09:50:49 centOS-hostB Keepalived_vrrp: VRRP sockpool: [ifindex(2), proto(112), fd(10,11)]
Sep 11 10:55:04 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 11 10:55:05 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Sep 11 10:55:05 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 11 10:55:05 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 10:55:10 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 10:55:45 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Received higher prio advert
Sep 11 10:55:45 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Sep 11 10:55:45 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) removing protocol VIPs.
Sep 11 10:58:35 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 11 10:58:36 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Sep 11 10:58:36 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 11 10:58:36 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 10:58:41 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 11:32:40 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Received higher prio advert
Sep 11 11:32:40 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Sep 11 11:32:40 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) removing protocol VIPs.
Sep 11 11:39:54 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 11 11:39:55 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Sep 11 11:39:55 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 11 11:39:55 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 11:40:00 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50


Case B The master is back (Host A)

Now let's focus on the situation when Host A is brought back after the crash. Just as a reminder: after the crash HostB took over the MASTER role and owns the VIP address, and the client station has HostB's MAC associated with the VIP in its ARP table.
The methodology is similar to that of case A:
  1. We constantly ping the VIP address from the client station
  2. We start keepalived on HostA
  3. We check the ping output to see whether the failover was smooth
  4. We check the ARP table to see which MAC address is used
  5. We sniff the interface to find out what caused the client machine's ARP table update
After performing steps 1 and 2 we again see an increase in the ICMP response time, but a much smaller one than in the crash case.

krychu@krystianek:~$ ping 192.168.122.50
PING 192.168.122.50 (192.168.122.50) 56(84) bytes of data.
...
64 bytes from 192.168.122.50: icmp_req=23 ttl=64 time=0.191 ms
64 bytes from 192.168.122.50: icmp_req=24 ttl=64 time=0.346 ms
64 bytes from 192.168.122.50: icmp_req=25 ttl=64 time=0.331 ms
64 bytes from 192.168.122.50: icmp_req=26 ttl=64 time=0.291 ms
64 bytes from 192.168.122.50: icmp_req=27 ttl=64 time=2.53 ms
64 bytes from 192.168.122.50: icmp_req=28 ttl=64 time=0.315 ms
64 bytes from 192.168.122.50: icmp_req=29 ttl=64 time=0.338 ms
...

However, this is still not proof that a failover has actually happened. Let's take a look at the client station's ARP table:

krychu@krystianek:~$ arp -an
...
? (192.168.122.50) at 52:54:00:03:ba:f2 [ether] on virbr0
...

We see that the mapping, which in case A ended up pointing to HostB's MAC, now points to HostA's, which indicates that the VIP address is owned by HostA again. In Wireshark we again see a gratuitous ARP message sent by HostA to the broadcast MAC; based on it, the client station has updated its ARP cache.

In the logs on HostB one can find the information that, after the startup of HostA, keepalived on HostB transitions to the BACKUP state since it has received an advertisement with a higher priority (review the configuration files from part 1):

Sep 11 10:55:45 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Sep 11 10:55:45 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) removing protocol VIPs.
Sep 11 10:58:35 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 11 10:58:36 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Sep 11 10:58:36 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 11 10:58:36 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 10:58:41 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 11:32:40 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Received higher prio advert
Sep 11 11:32:40 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Sep 11 11:32:40 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) removing protocol VIPs.
Sep 11 11:39:54 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 11 11:39:55 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Sep 11 11:39:55 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 11 11:39:55 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 11:40:00 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.122.50
Sep 11 11:57:32 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Received higher prio advert
Sep 11 11:57:32 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Sep 11 11:57:32 centOS-hostB Keepalived_vrrp: VRRP_Instance(VI_1) removing protocol VIPs.

Summary

I hope that this exercise gave you a good understanding of the mechanisms controlling the VIP address failover in keepalived, and also some practical knowledge about how to set up such a test environment. For me this activity has been a great experience and fun. I also think it is a good basis for further research - into the failure detection mechanisms (happening on the interconnect interface) and into providing an application (e.g. HAProxy) on top of keepalived.

Just to summarize: the VIP address failover is announced by a special ARP message (a gratuitous ARP) which has to be accepted and processed by the client station. By processing I mean that the client station's ARP table mapping has to be updated accordingly.

From the software perspective, the keepalived daemon allows you to configure VIP failover groups and to assign the MASTER and BACKUP roles to hosts within such a group. Failure detection is also available; it works fast (for a plain keepalived configuration the observed delay was not higher than about 20 ms) and controls the VIP address ownership.

Linux: Configuring cluster VIP (Virtual IP) with keepalived: Part 1

Some time ago I had a discussion about open source load balancing solutions (e.g. HAProxy), especially with a focus on HA and VIP address failover (by VIP address failover we meant that the public IP is moved to the second node in case the first one is shut down). It occurred to me that I had never done that in practice, so I decided to build up such a solution - just to verify the second point, namely the VIP address failover. For that I used my Ubuntu box together with two virtualized CentOS environments (which I already had available in KVM) - the configuration is shown in the picture below:

Figure 1 Network deployment of the test environment for
the keepalived's VIP failover
OK, having this picture in mind, let's get to work.

Test environment setup

The points presented below show how to configure it:
  1. Define an additional interconnect network. For doing that I used the virt-manager GUI: Edit->Connection Details; a window should appear (just like the one presented in Figure 2). In the Virtual Networks tab you should add a new network (the '+' button at the bottom of the window)
    Figure 2 Interconnect virtual network
  2. Install two machines (Host A and Host B) that will host keepalived. I already had one CentOS KVM image, which I cloned to obtain Host B (using virt-manager this is very simple and can be done from the main GUI)
  3. Install keepalived. CentOS does not have keepalived in its repos, so I had to download the latest sources from the web: http://www.keepalived.org/software/keepalived-1.2.2.tar.gz. After unpacking I did not have to download any additional dependencies and just followed the INSTALL instructions - the usual steps: configure, make, make install ;) In order to build keepalived only once (for both hosts) you might also install it before cloning the image (after cloning, the image already contains keepalived).
  4. Configure sysctl. In the manual I read that one should adjust the sysctl configuration in order to allow applications to bind to non-local addresses - add this line to /etc/sysctl.conf:
     net.ipv4.ip_nonlocal_bind = 1
    and execute:
     sysctl -p
  5. Reconfigure the firewall. keepalived uses a multicast address (224.0.0.18) for exchanging information about the status of the nodes belonging to a specific group. What needs to be done is to allow this multicast traffic over the interconnect (eth0) interface. In my case, for testing purposes, I simply disabled the firewall (in Gnome: System->Administration->Security Level and Firewall); a less invasive alternative is sketched after this list
  6. Configure the master host (Host A). One needs to adapt or create the /etc/keepalived/keepalived.conf file as follows:
    [root@localhost ~]# cat /etc/keepalived/keepalived.conf
    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 51
        priority 102
        virtual_ipaddress {
            192.168.122.50/24 brd 192.168.122.255 dev eth1 label eth1:0
        }
    }

    where eth0 is the interconnect interface and eth1 is the interface for communication with the external world (VIP)
  7. Configure the backup host (Host B). The /etc/keepalived/keepalived.conf configuration file is mostly the same as for Host A - what one has to remember is to set Host B to the BACKUP state with a lower priority
    [root@centos1-priv ~]# cat /etc/keepalived/keepalived.conf
    vrrp_instance VI_1 {
        state BACKUP
        interface eth0
        virtual_router_id 51
        priority 101
        virtual_ipaddress {
            192.168.122.50/24 brd 192.168.122.255 dev eth1 label eth1:0
        }
    }

    where eth0 is the interconnect interface and eth1 is the interface for communication with the external world (VIP)

  8. Start keepalived on both hosts. The best way to start keepalived is to use the init/startup script provided with the source package:
    # /etc/init.d/keepalived start
    If the script has not been installed, you can copy it from the keepalived source package
  9. Proceed to the test (see Part 2)
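Coming back to point 5 - instead of disabling the firewall completely, one could allow just the VRRP traffic on the interconnect interface. A minimal iptables sketch (IP protocol 112 is VRRP; adapt it to your own firewall setup and remember to apply it on both hosts):

[root@centOS-hostA ~]# iptables -I INPUT -i eth0 -p 112 -d 224.0.0.18 -j ACCEPT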