Wednesday, 12 October 2011

Linux: cronjob redundancy and failover with rcron & keepalived

This article describes how to add high-availability (HA) features, namely redundancy and failover, to periodic jobs executed via cron. The job is configured on multiple nodes, but only one node executes it. If that node fails, another node takes over and executes the job.
Interestingly, this functionality can be provided by combining keepalived and rcron (see the deployment figure below). How does it work? rcron is a wrapper invoked from cron that first reads the state of the node from a file and runs the job only if the node is the master. The state of the node is in turn controlled by the keepalived daemon: only the master node has the active state for the rcron job. If the master node fails, the active state is transferred to the backup node with the highest priority.

Figure 1. Failover and redundancy using rcron and keepalived
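
The decision rcron makes can be pictured with a few lines of shell. This is only a sketch of the idea, not rcron's actual code: the function names node_state and run_if_active are made up for illustration, and the fallback to a default state mirrors the default_state setting of rcron.conf.

```shell
#!/bin/sh
# Illustrative only: mimics the "run only when active" decision, not rcron itself.
# STATE_FILE and DEFAULT_STATE stand in for the values rcron reads from its config.
STATE_FILE="${STATE_FILE:-/var/run/rcron/state}"
DEFAULT_STATE="${DEFAULT_STATE:-passive}"

# Print the node state: the first word of the state file, or the default
# when the file is missing, unreadable or empty.
node_state() {
    state=""
    if [ -r "$STATE_FILE" ]; then
        read -r state rest < "$STATE_FILE" || true
    fi
    echo "${state:-$DEFAULT_STATE}"
}

# Run the given command only on the active node; do nothing otherwise.
run_if_active() {
    if [ "$(node_state)" = "active" ]; then
        "$@"
    fi
}
```

With these functions loaded, `run_if_active echo hello` prints hello on the active node and nothing on a passive one.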

That was the theory; let's check it in practice.


Getting the rcron binary

rcron is not available in the standard CentOS repository and has to be built from source. The best way is to install subversion and check out the latest source tree from Google Code. To compile, rcron requires two additional packages, byacc and flex, both of which are available from the standard repository. The following commands install the dependencies and build rcron from source:

  #  yum install subversion byacc flex
  #  svn co http://rcron.googlecode.com/svn/trunk rcron
  #  cd rcron/
  #  ./configure
  #  make
  #  make install

Please repeat the operation on all nodes on which you would like to have the redundant cronjob configured.


Configure keepalived for redundancy and failover


Now that the rcron binary is available, let's turn to keepalived. For the initial keepalived configuration, please see the previous articles on this blog. For the redundant cron job, an additional VRRP instance has to be configured whose notify hooks write the node state to the rcron state file (/var/run/rcron/state).

Host A's keepalived configuration file (/etc/keepalived/keepalived.conf) should be:

vrrp_instance CRON_1 {
    state MASTER
    interface eth0
    virtual_router_id 31
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    notify_backup "/bin/echo passive > /var/run/rcron/state"
    notify_master "/bin/echo active  > /var/run/rcron/state"
    notify_fault  "/bin/echo passive > /var/run/rcron/state"
}

Host B's configuration file (/etc/keepalived/keepalived.conf) should be:


vrrp_instance CRON_1 {
    state BACKUP
    interface eth0
    virtual_router_id 31
    priority 99
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    notify_backup "/bin/echo passive > /var/run/rcron/state"
    notify_master "/bin/echo active  > /var/run/rcron/state"
    notify_fault  "/bin/echo passive > /var/run/rcron/state"
}
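
The three inline echo commands above assume that the directory /var/run/rcron already exists. As a possible alternative, a single script could be registered with keepalived's generic notify option, which calls the script with the type, the instance name and the new state as arguments. The sketch below is an assumption on my side, not part of the original setup: the script name /usr/local/bin/rcron-notify.sh and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical /usr/local/bin/rcron-notify.sh (name and path are assumptions).
# keepalived calls a generic "notify" script as: <script> TYPE NAME STATE
STATE_DIR="${STATE_DIR:-/var/run/rcron}"

# Map a VRRP state to the word rcron expects in its state file.
rcron_state_for() {
    case "$1" in
        MASTER) echo active ;;
        *)      echo passive ;;   # BACKUP, FAULT and anything unexpected
    esac
}

# Create the directory if needed, then write the state via a temporary
# file and a rename, so rcron never reads a half-written file.
write_rcron_state() {
    mkdir -p "$STATE_DIR" || return 1
    tmp="$STATE_DIR/state.$$"
    rcron_state_for "$1" > "$tmp" && mv "$tmp" "$STATE_DIR/state"
}

# When run by keepalived, the third argument carries the new state.
if [ "$#" -ge 3 ]; then
    write_rcron_state "$3"
fi
```

In the vrrp_instance block, a line such as notify "/usr/local/bin/rcron-notify.sh" would then take the place of the three notify_* lines.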

After the keepalived daemon has started, the state is as follows:
  • HostA is active:
    [root@centOS-hostA ~]# cat /var/run/rcron/state
    active
  • HostB is passive:
    [root@centOS-hostB ~]# cat /var/run/rcron/state
    passive


Configuring rcron


The rcron configuration is based on the examples.
The main configuration file for HostA:

[root@centOS-hostA log]# cat /etc/rcron/rcron.conf 
# An arbitrary name
cluster_name        = rcron_redundantjobs
# A file containing either the word "active" or the word "passive"
state_file          = /var/run/rcron/state
# The default state in case state_file can't be read
default_state       = active
syslog_facility     = LOG_CRON
syslog_level        = LOG_INFO
# We can tune jobs niceness/priorities (see nice(1)).
nice_level          = 19

The main configuration file for HostB:


[root@centOS-hostB]# cat /etc/rcron/rcron.conf
# An arbitrary name
cluster_name        = rcron_redundantjobs
# A file containing either the word "active" or the word "passive"
state_file          = /var/run/rcron/state
# The default state in case state_file can't be read
default_state       = passive
syslog_facility     = LOG_CRON
syslog_level        = LOG_INFO
# We can tune jobs niceness/priorities (see nice(1)).
nice_level          = 19


The cron job configuration is the same on both nodes: the job is wrapped in rcron as shown below. The job itself is only an example and the syntax is the same as in cron. It is executed every minute and appends the current date to an output file, which lets us check whether the job has been executed or not:

[root@centOS-hostA log]# crontab -l
* * * * * /usr/local/bin/rcron --conf /etc/rcron/rcron.conf echo `date` >> /tmp/output
[root@centOS-hostA log]# 

And that is it.
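
To see which node is actually executing the job, for instance after a failover, one can check on each node whether /tmp/output has been modified recently. The helper below is a rough sketch with made-up function names. It relies on GNU stat (as shipped with CentOS) and assumes that rcron itself writes nothing to the output file on a passive node.

```shell
#!/bin/sh
# Sketch: succeed if FILE was modified within the last MAX_AGE seconds.
# Uses GNU stat (-c %Y prints the modification time as epoch seconds).
modified_within() {
    file=$1
    max_age=$2
    [ -e "$file" ] || return 1
    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 1
    [ $((now - mtime)) -le "$max_age" ]
}

# Report whether the every-minute example job seems to run on this node.
# 90 seconds leaves some slack on top of the one-minute schedule.
report_job_activity() {
    if modified_within "$1" 90; then
        echo "job has run here recently"
    else
        echo "job has not run here recently"
    fi
}
```

Calling `report_job_activity /tmp/output` on both hosts should then report recent activity only on the node whose state file says active.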


Limitations
  1. Keepalived does not have any hook/notification that would allow updating the state file when the daemon exits. This means that an additional piece of software is needed to monitor keepalived and update the rcron state file if keepalived shuts down. The perfect candidate for that purpose would be the upstart init daemon, but unfortunately it is not part of CentOS 5.7 (as soon as I upgrade to CentOS 6 I will try it out).
  2. rcron is usually not part of any standard repository and has to be compiled from source. There is also no support available for it, which makes it difficult to use for commercial purposes.
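
Until a proper supervisor is in place, the first limitation could be worked around with a small watchdog that is started from plain cron (not through rcron) every minute. The following is a minimal sketch under a few assumptions: keepalived writes its PID to /var/run/keepalived.pid, and the function names are invented for this example. Because it runs once a minute, there is still a window in which both nodes may consider themselves active.

```shell
#!/bin/sh
# Hypothetical watchdog: force the node to passive when keepalived is gone.
STATE_FILE="${STATE_FILE:-/var/run/rcron/state}"
PID_FILE="${PID_FILE:-/var/run/keepalived.pid}"   # assumed default PID file

# Succeeds if the PID file exists and the process it names is alive.
daemon_running() {
    pid=""
    [ -r "$PID_FILE" ] || return 1
    read -r pid < "$PID_FILE" || true
    [ -n "$pid" ] && kill -0 "$pid" 2> /dev/null
}

# Leave the state untouched while the daemon is alive, otherwise demote.
demote_if_daemon_down() {
    if ! daemon_running; then
        echo passive > "$STATE_FILE"
    fi
}
```

A script that sources these functions and calls demote_if_daemon_down could be placed in root's crontab next to the rcron job.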


Conclusions

rcron together with keepalived can provide redundancy and failover for periodic jobs, but the limitations described above have to be taken into account.
