Raspberry Pi Backup Server

Go to | #1 | #2 | #3 | #4 | #5 | #6 | #7

I have been working on getting more organized in work and life. When things are messy, I can get stressed out and do less things during the day. I think this will be the first post of several about my 2024 organization strategy. Mainly to serve as a knowledge dump for myself, but maybe someone out there will find it useful, painfully boring, or somewhere in between!

Anyway, this is the story of making a backup server for all my personal computer files. It's good to be secure with your online presence, but also good to have security for your offline presence as well. Computer hardware nowadays is pretty robust, but things happen. Maybe you drop your computer. Maybe you spill your coffee all over it. Or god forbid, one day the battery explodes. I have a few old backup drives, but this post is all about my more permanent solution.

Storage Media

I purchased an external 1TB SSD from Amazon, to store regular backups into. This should be more than enough storage for a long time, since my "base directory" was a smidge under 14GB. This folder contains all my photos, code projects, downloaded receipts and such, and all my old assignments from 3rd grade typing homework all the way up until grad school! So many of these files had no reason to be stored on my PC, but I didn't want to permanently delete everything - which is where SSK comes in. (SSK is the brand name of the drive, so let's just call it that.)

I had actually set up a backup server with my Raspberry Pi in the past using a hard drive disk as storage, but this was a long time ago so I had to re-learn everything. Hopefully this story serves as a personal guide for me if I ever need to do this again in the future.

Before touching the Raspberry Pi, I connected SSK to my PC and just did a simple drag-n-drop to copy all my files as their current state, so I at least started somewhere. After it finished a surprisingly quick backup, I then plugged it into the Pi.

Connecting the Hardware

Of course, the Pi did not recognize that SSK was connected, so I had to programmatically mount the disk while logged into the Pi. (df shows which disks are mounted.) Some of these commands need permission elevation (e.g. sudo.) Write these data fields down, since they are all used later:

Bash
lsblk # block device location & partition
blkid # device UUID
fdisk -l # filesystem type
id # user and group id

And follow these steps to manually mount the drive:

Bash
mkdir -v /media/SSK-1TB # create a mount point for your disk
mount /dev/sda1 /media/SSK-1TB # manually mount the disk
  # replace `/dev/sda1` with your device partition
  # replace `SSK-1TB` with your custom device name

If the disk mounted successfully, you should be able to see your backed up files on that new directory, and then unmount the disk:

Bash
ls /media/SSK-1TB # list the files contained in the drive
umount /media/SSK-1TB # manually unmount the disk

Unfortunately, you would need to manually run the mount and umount commands every time the hardware was connected/disconnected, or the Pi was power cycled. To automatically mount the drive on startup, one needs to modify the fstab file.

Bash
1	`sudo nano /etc/fstab`

Add a new line with the following columns separated each by a space:

Text Only
1	`UUID=uuid mountPoint fileSystemType options dump pass`

So for me, it looked like this:

Text Only
1	`UUID=1234-1234 /media/SSK-1TB exfat defaults 0 0`

To apply changes, I did a simple reboot of my machine:

Bash
sudo reboot 0

For me, SSK was factory formatted as the exfat file system format, which brings around other problems in Ubuntu. I did not have permissions to create, modify, or delete files on SSK while it was mounted to my Pi. After some frustrating research, I figured out the issue thanks to a Ubuntu Community post. Luckily, the fix was very easy. I needed to mount the disk specifying a user and group ID in the options column - which can be obtained using the id command. So in the end, my new line in fstab looked like this, where 1000 was my user and group ID:

Text Only
1	`UUID=1234-1234 /media/SSK-1TB exfat defaults,uid=1000,gid=1000 0 0`

After implementing this change and doing another reboot, my disk was mounted successfully with read and write access! Now, for the fun part.

Actually Working

I used rsync for creating and updating my backups. After getting an understanding of how it works, the code is incredibly easy. Here is a minimum example of what I am doing:

Bash
# sync
# For creating an exact mirror of my home directory
rsync -av --delete "$src" "$host:/media/SSK-1TB/mirror/"

Bash
# backup
# For creating a daily snapshot of my home directory
rsync -av "$src" "$host:/media/SSK-1TB/backups/$(date +%Y-%m-%d)/"

WARNING! The first time you run these scripts, it has to initially copy over every file in the source directory. Depending on the total file size, this could take a long time! (It took over an hour to run the sync script.) Also, make sure your storage device has enough storage available for the backup!

For these scripts, you can run manually (using the -n flag for a dry run) or automate them as a CRON job. Of course, we are here to automate as much as possible, so open the crontab with this command:

Bash
1	`crontab -e`

Then, add a new line with your CRON schedule (I like using crontab.guru for this) and the command. Probably not required, but I always like to set my working directory to the location of the script before running it, to minimize any kind of logic bugs. I also like to capture the script output in a log file, to get some insight into what happened. The cd ... && ./script >> output.log pattern is a neat trick I picked up for this kind of thing. I wanted my mirroring script to run twice a day, at 8 AM and 8 PM. I want to invoke the backup snapshot script manually only, but it could be automated with a similar line in the crontab.

Text Only
1	`0 8,20 * * * cd /home/nic/Documents/Nicolas && ./sync >> sync.log`

Note: The backup script will automatically run at 8 AM and 8 PM if my PC is turned on. So, there may be days where it runs once or not at all, but can be invoked manually of course.

Sorta...

It's a few days later now when I'm writing this section. Something strange happened that I can't really explain. At the end of an rsync command, it prints out a summary including the number of bytes transmitted. This number should be large on the first run, but smaller after that (because less files need to be updated on the receiver.) So far, my script always transmitted 14 GB. It seemed to copy over every single file, every time, which completely defeats the purpose of rsync. So I decided to explore this in a testing environment, separate from my home directory, to figure out why this was happening. I tried different command flags, like -u (skip files with a newer modification date on the receiver). I also ran this command to forcibly update the modification dates for my local test files.

Bash
find . -exec touch {} \;

What was bizarre is that I could not reproduce the issue at all. I decided to just go back to my non-testing environment and run my sync script manually. I noticed that it transmitted about 2 GB. Which is odd, since it's been transmitting the entire 14 GB folder every other time. I ran it again, this time it only transmitted about 700 MB. Weird... is it actually doing its job now?? Finally, a third time, it transmitted about 700 kB which is around the overhead I'd expect for a directory of this size with no file modifications.

So, there was definitely something goofy going on. My leading theory is that I may have had some network communication issues between my wired PC (sender) and wireless Pi (receiver) which prohibited some of the larger files from getting backed up. Really, I have no idea for sure.

`tar` Archive

I also changed my mind about using rsync for both the mirror and backups. I learned about tar archiving which works much better for backups - since it packs everything into 1 compressible file. Plus, tar archives preserve folder structure, file ownership, and creation/modification timestamps.

tar has 2 different built-in compression algorithms, gz (lower compression, but faster and less overhead) and xz (higher compression, but slower and higher overhead.) I decided to go with gz since I want to be able to zip and unzip backups quickly for transmission, and don't plan to keep too many backups at once on my server. Here is my updated backup command:

Bash
# backup
# For creating a snapshot of my home directory
tar -cvzf - "$src" | ssh "$host" "cat > /media/SSK-1TB/backup/backup-$(date +%Y-%m-%d).tar.gz"

This creates a tar compressed with the gz algorithm directly on my Pi server named backup-YYYY-MM-DD.tar.gz.

I also found a way to compute a dry run:

Bash
# backup dry-run
tar -cvzf /dev/null "$src"
# find total recursive file size of unpacked source
du -hs "$src"

WARNING! Similar to running the sync command, this takes a very long time to run and transmit data. For my now 19 GB file (containing photos from my phone and camera), it took about an hour!