rsnapshot: cheap and powerful

Losing data hurts. Backups are important. A good RAID system is a good first step to keeping your bits safe; however, all it takes is one misplaced “/” in an rm command to send all your data to the great bit bucket in the sky. Enter rsnapshot.

rsnapshot is great at making incremental backups, either locally or remotely over ssh, and each snapshot looks like a full backup. This powerful utility uses rsync and hard links to keep as many snapshots of your data as you like without using more space than the changed files require.
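The space savings come from hard links: when a file hasn't changed between snapshots, the new snapshot's directory entry is linked to the existing copy instead of storing the data again. A minimal sketch of the idea, with made-up paths and `cp -al` standing in for what rsnapshot does internally:

```shell
# Create a fake "first snapshot" (paths here are just for the demo)
mkdir -p /tmp/snapdemo/daily.0
echo "important data" > /tmp/snapdemo/daily.0/file.txt

# "Rotate" the snapshot by creating hard links instead of copying data
cp -al /tmp/snapdemo/daily.0 /tmp/snapdemo/daily.1

# Both directory entries point at the same inode, so the file's
# data blocks are stored only once on disk
stat -c %i /tmp/snapdemo/daily.0/file.txt
stat -c %i /tmp/snapdemo/daily.1/file.txt
```

Deleting or changing a file in one snapshot leaves the other snapshot's copy untouched, which is why every snapshot still looks like a full backup.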

One of the many nice features of rsnapshot is that it uses ssh or rsync, which makes it very portable and configurable. Your data-source machine and backup machine can be running vastly different versions of Linux/Unix/BSD, or even Windows. In my case I found a spare motherboard, case and hard drives, set up a JBOD through LVM, and had a nicely sized backup machine that would be plenty powerful for rsnapshot.

The basic steps for configuring rsnapshot
1) Setup rsync on data source
2) Configure rsnapshot on backup machine to connect to data source
3) Schedule rsnapshot
4) Monitor the backup process

Step 1) Setup rsync on data source

rsnapshot uses rsync, either over ssh or via rsyncd, to pull its data. I decided to use rsyncd so it was easier to set up backup-only users.

Configuring rsyncd is pretty trivial. After you apt-get install it, just modify /etc/rsyncd.conf:

[backup]

comment = Backup for Sarlaac
path = /drives/array0/
lock file = /var/lock/rsyncd
read only = yes
list = no
auth users = user_for_backup
secrets file = /path/to/secrets_file
strict modes = yes
ignore errors = no
ignore nonreadable = yes
transfer logging = no
# log format = %t: host %h (%a) %o %f (%l bytes). Total %b bytes.
timeout = 600
refuse options = checksum dry-run
dont compress = *.gz *.tgz *.zip *.z *.rpm *.deb *.iso *.bz2 *.tbz
uid =
gid =
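The `auth users` and `secrets file` directives above imply a secrets file on the data source. Its format is one `user:password` pair per line, and with `strict modes = yes` rsyncd will refuse to use a file that other users can read. A sketch, using a placeholder path and password:

```shell
# Create the secrets file (path and password here are placeholders;
# the real path must match the 'secrets file' line in rsyncd.conf)
printf 'user_for_backup:ChangeThisPassword\n' > /tmp/rsyncd.secrets

# 'strict modes = yes' requires that the file not be readable by others
chmod 600 /tmp/rsyncd.secrets
```

The password here is independent of the system account's password; it is only used for rsync daemon authentication.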

Step 2) Configure rsnapshot on backup machine to connect to data source

The configuration for rsnapshot itself is generally /etc/rsnapshot.conf

There are a lot of useful options here, but I will just show the ones that I changed from the defaults:

# NOTE: fields in rsnapshot.conf must be separated by tabs, not spaces

# Where the snapshots will be stored
snapshot_root	/drives/foundation/

# Use GNU du so commands like 'rsnapshot du' work
cmd_du	/usr/bin/du

# This section determines how many of each labelled backup to keep.
# rsnapshot does not know what 'daily' or 'weekly' means;
# it is up to you to decide how to schedule these.
interval	daily	7
interval	weekly	4
interval	monthly	6

# Log to a file
logfile	/var/log/rsnapshot.log

# The data source (the module and user from rsyncd.conf above)
# and the folder to put it in within your snapshot_root
backup	rsync://user_for_backup@sarlaac/backup/	Sarlaac/

If you have configured everything correctly, you can now run ‘rsnapshot daily’ to create your first snapshot.
If things didn’t go so well: run ‘rsnapshot configtest’ to check the config syntax, and check your rsyncd configuration by running rsync independently. Also note that rsnapshot.conf requires tabs between fields; spaces will cause parse errors.

Step 3) Schedule rsnapshot

rsnapshot is oblivious to what you might mean by ‘daily’, ‘weekly’ or ‘monthly’ backups; it just treats them as a series of snapshot points based on when you kicked them off. The documentation also instructs you to leave enough time between calls, so that, for example, ‘rsnapshot daily’ isn’t still running when you kick off ‘rsnapshot weekly’. To this end I created a small script that automates which snapshot gets kicked off. This is useful if you are only doing a single snapshot per day:

#!/bin/bash

if [ `date +%d` = 01 ]; then
    echo -e ----Monthly-------
    date
    echo -e ----Monthly------- '\n'
    rsnapshot monthly
fi

if [ `date +%u` = 1 ]; then
    echo -e ----Weekly-------
    date
    echo -e ----Weekly------- '\n'
    rsnapshot weekly
fi

echo -e ----Daily-------
date
echo -e ----Daily------- '\n'
rsnapshot daily

This ensures that the monthly and weekly snapshots only get rotated at the correct intervals, though it does not protect against running the script multiple times in a single day. Put this script into cron and you have a daily backup running:

05 4 * * * ~/bin/rsnap.sh >> ~/backup.log 2>&1
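The script above doesn’t stop the snapshots from rotating twice if it happens to run twice in one day. A simple guard, sketched here with a hypothetical marker file, could be added near the top of the script:

```shell
# Hypothetical marker-file location; pick somewhere persistent in practice
STAMP=/tmp/rsnap.last-run
TODAY=$(date +%Y-%m-%d)

if [ -f "$STAMP" ] && [ "$(cat "$STAMP")" = "$TODAY" ]; then
    echo "Backup already ran today ($TODAY), skipping."
else
    echo "$TODAY" > "$STAMP"
    echo "Running backup."   # the rsnapshot calls would go here
fi
```

On the second run in a day the marker file matches today's date, so the rsnapshot calls are skipped.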

If you have a machine dedicated to your backups, you might not want to keep it on all of the time. In my case I have a BIOS setting that starts the machine at a time matching the cron entry for the script, and I added a shutdown statement to the end of the script to power the machine off when the backup is finished.

#!/bin/bash
....
/sbin/poweroff

My data eventually outgrew my modest JBOD setup, so I added a 4TB (2x2TB) RAID0 array to my media center computer. Now I had two problems:
1) I don’t want my backup data to be accidentally accessible to users of my media center machine.
2) I don’t want my media center machine to shut down at the end of a backup while I am in the middle of using it.

I added a little bit more magic to my backup script:

#!/bin/bash
# Properly handles the order and timing of rsnapshot on a system
# where there is only one daily cron entry
exec 2>&1

START_UPTIME_FLOAT=`cat /proc/uptime | awk '{ print $1 }'`
START_UPTIME=${START_UPTIME_FLOAT/\.*}
echo "UPTIME AT START: $START_UPTIME seconds"

if [[ -z `mount | grep foundation` ]]; then
    echo "INFO: Backup drive not currently mounted"

    if [[ ! -a /dev/md/0 ]]; then
        echo "INFO: md0 not currently assembled; assembling"
        /sbin/mdadm -A /dev/md/0 /dev/sda2 /dev/sdb1
    fi
    echo "INFO: Mounting backup drive"
    mount /dev/md/0 /drives/foundation
fi

if [[ -z `mount | grep foundation` ]]; then
    echo "ERROR: Backup drive still not mounted, exiting!"
    exit 1
fi

if [ `date +%d` = 01 ]; then
    echo -e ----Monthly-------
    date
    echo -e ----Monthly------- '\n'
    rsnapshot monthly
fi

if [ `date +%u` = 1 ]; then
    echo -e ----Weekly-------
    date
    echo -e ----Weekly------- '\n'
    rsnapshot weekly
fi

echo -e ----Daily-------
date
echo -e ----Daily------- '\n'
rsnapshot daily

umount -l /drives/foundation

if [[ $START_UPTIME -le 2400 ]]; then
    date
    echo "Uptime was less than 2400 seconds at start. Shutting down."
    /sbin/poweroff
fi

This version checks the uptime of the machine before it shuts down, to guess whether it was brought up by the BIOS for a backup or powered on manually to watch a video. It also mounts and unmounts the backup drive, which prevents accidental access when the machine is being used as a media center. On the flip side, you could use the local backup to serve your media and reduce network latency; since I have gigabit ethernet running between my main data server and my media center/backup machine, I don’t bother using the backup for anything but backups.
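The uptime check relies on a small bit of shell parsing: the first field of /proc/uptime is seconds with a fractional part, and the `${VAR/\.*}` expansion strips everything from the decimal point onward, leaving an integer that `-le` can compare. A sketch with a faked value:

```shell
# /proc/uptime's first field looks like "12345.67"; fake it for the demo
UPTIME_FLOAT="12345.67"

# Strip the decimal portion so bash can do an integer comparison
UPTIME=${UPTIME_FLOAT/\.*}
echo "$UPTIME"

if [[ $UPTIME -le 2400 ]]; then
    echo "would power off"
else
    echo "stays up"
fi
```

Bash arithmetic comparisons fail on floating-point strings, which is why the fractional part has to go before the `-le` test.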

Step 4) Monitoring the backup

Set it and forget it, right? That is what you want from a solution; however, all things fail, and it is important to know when that happens. There are many process-monitoring solutions out there, but this was about the only process whose outcome I cared about, so I wrote a small script to inform me if the backup has failed or hasn’t run in 24 hours.

First I added the following to the backup script before it shuts down the machine:

scp /var/log/rsnapshot.log source_machine:/var/log/rsnapshot/rsnapshot.log

Then on the source machine, the following cron entry:

0 12 * * * ~/bin/check_rsync_logs /var/log/rsnapshot/rsnapshot.log

The script only produces output when something is wrong, and cron mails any output to me. Here is the script:

#!/usr/bin/perl
use strict;
use warnings;

if (@ARGV < 1) { die "Too few files specified" }
chomp(my $date = `date +%d/%b/%Y`);

foreach my $filename (@ARGV) {

    # Check the file exists
    if ( -e $filename ) {

        # Check it was modified today
        if ( -M $filename < 1 ) {
            # Check if it has entries for today
            open my $logfile, '<', $filename or die "Can't open $filename: $!";
            my @matches;
            foreach my $line (<$logfile>) {
                push @matches, $line if $line =~ /\Q$date\E/;
            }
            close $logfile;
            # Check if there were any matches
            if (@matches > 0) {
                foreach my $line (@matches) {
                    if ($line =~ /error/i) {
                        print "$filename had Error: $line";
                    }
                }
            }
            else { print "No entries in the file ($filename) for today ($date)!\n"; }
        }
        else { print "$filename was last edited ", (-M $filename), " days ago, not today!\n" }
    }
    else { print "$filename doesn't exist\n" }
}

Now if the power is out when the backup should have run, I’ll get an email informing me as much.

Closing Thoughts

Backups are important, both for your sanity and your data. Keeping your backups on the same machine as the original copies is not as safe as keeping them on a separate machine, and off-site backups are better still. Many people are turning to cloud-based services to host their data and their backups; you should make a local backup of that data whenever possible. If you are using free services, there is usually no guarantee your data will always be available to you: a site could be seized by the government or a collections agency before you get a chance to copy your data off. And if you have large amounts of data, such as high-definition video, the only cost-effective backup solution is going to be local. Use free tools like rsnapshot, Linux RAID and ssh to build the right solution for you, at the right cost.
