Rsync – The free and easy LTFS software to archive files to LTO-5 – Complete with checksums and source deletes

Introduction

StorageDNA recently launched DNA Evolution at NAB 2011. It is an exciting moment for us as we now fully support LTO-5 tape. We spent the last year understanding how LTO-5 fits into modern file-based workflows and are proud to announce a solution that enables LTO-5 based workflows in the field and in production, in post-production, and in long-term preservation.

A great new technology that DNA Evolution is built on is LTFS, or Linear Tape File System. More details on LTFS can be found here.

While DNA Evolution builds advanced LTFS-powered media management workflows, LTFS in its simplest use case makes it extremely easy to copy files to and from LTO-5 tape. This is great news, because up until now you needed backup or archiving software to do so. There are a number of users (like me) who simply need to copy files to and from tape to protect their digital media content. I, for example, am an amateur photographer, and LTFS gives me a simple way to protect my little baby girl’s photographs.

So this tutorial will show you the correct way to copy files to and from LTFS tape.

Why can’t I just copy files to LTFS using Finder or Explorer?

Users often complain about LTFS when they try to use Finder/Explorer to copy files. There are a number of reasons Finder and Explorer are not well suited for copying files to and from LTFS:

1. Today’s file management tools take the liberty of doing a lot more than just what we want. They automatically create previews, index files and so on. This is done on the assumption that the underlying medium is disk and not tape. When the underlying medium is tape, creating index files, previews etc. can severely cripple the tape with constant seeks.

2. You really cannot trust Finder and Explorer to make a long term copy of your data. Neither Finder nor Explorer offers features such as checksumming to ensure your data is truly on tape before you delete media from your flash and USB storage.

3. Finder and Explorer are not tuned for high performance writes or reads to tape, so they take much longer to copy your data.

There are a number of other reasons. However, the frustration of tape seeks and my lack of trust in Finder/Explorer to copy my data to LTFS made me look at other solutions.

Rsync!

Rsync, like LTFS, is an open source tool, and it has been around for ages. It is well tested, and thousands of users trust it on a daily basis. Rsync is basically an advanced file copy tool that does a great job of copying files over the network or locally. It is, however, a disk-to-disk copy tool, so until LTFS made tape look like disk, rsync could not copy files to and from tape. With LTFS, we can now use this great tool for tape as well. Unlike Finder and Explorer, rsync does only what it is asked to do, so it doesn’t cause the tape to seek wildly! Additionally, it has features such as checksums, source deletion and progress reporting: all of the things you need to ensure your content is correctly copied to tape.

What you will need

Hardware: Firstly, you will need the correct hardware setup. The exact hardware requirements are outside the scope of this article, but you will need a combination of a Mac OS X Leopard/Snow Leopard server, a SAS/FC card to connect to your tape hardware, and a tape drive. From what I know, HP, IBM and Quantum support LTFS on their latest LTO-5 drives.

Software: Even though LTFS is open source, every vendor creates and supports its own builds for its particular tape drives. Links to each vendor’s LTFS site: HP (click here), IBM (click here) and Quantum (click here). Once you have connected a tape drive, you will need to download the LTFS binaries and follow the instructions to format and mount the tape (similar to when you format a hard drive). Once the tape is mounted, you are ready to start copying files. The second piece of software you will need is rsync. The good news is that rsync comes built-in on every Mac. So simply open Terminal (Applications->Terminal). In Terminal, type the following:

-        rsync --help

Note: There are two dashes before help.

You will see a rather detailed usage guide. But don’t worry about all the options, as this guide will tell you exactly what you need to know.

Note: This guide assumes that you are comfortable with “Terminal” and command line usage. While there are GUI utilities that can use rsync, they are not nearly as quick to use as command line.

Step-by-Step Guide

In this step-by-step guide, we will assume that you want to archive some content to LTFS/LTO-5 tape. Upon archival, you want to ensure that the content that you archived has “truly” been archived to tape. Once you have ensured this, you will then want to delete the source media files to free up space. Finally, you want to keep a record of everything you archived and also be able to search files.

For the purposes of this example, I am going to be archiving a number of images. However, the steps are exactly the same no matter what the file or content type.

Step 1: Mount the LTFS Tape

After you have formatted your tape via LTFS (refer to your vendor’s LTFS manual for format instructions), you will mount it. You will need to create a directory to mount the tape on. Please ensure that the directory name matches the tape serial. In this example, the tape has been formatted with serial NIKTAP1 (Nikon Tape 1) and it has been mounted at /Volumes/NIKTAP1. This is done so that the tape serial appears in every path of the search catalog we build in Step 8. Here are the steps:

-        mkdir /Volumes/NIKTAP1 (create directory with tape serial)

-        ltfs /Volumes/NIKTAP1 (mount the available tape in the first tape drive to the volume path)

Now you should have a mounted LTFS tape with the correct tape serial in the path. Now we are ready to start archiving files.

Step 2: Ensure we have enough space

The first thing we need to do is to make sure we have enough space on tape to actually copy the files. Run the following steps:

-        df -h

  • This will list all the drives that are mounted and how much space is left on each device. Identify the line for the tape you just mounted. Generally it will say the total tape size is 1.4 TB. It will also show used and available space. I recommend leaving 20 GB free on each tape rather than filling the tape to its end. Yes, it’s wasteful, but tape is cheap and this ensures that the LTFS filesystem doesn’t encounter any difficult out of space conditions.

-        du -hcs /Volumes/USBMedia

  • Then determine how much space your source files are going to need. Let’s assume the media you want to copy is on /Volumes/USBMedia/. This will tell you exactly how much space will be needed.

-        If the space left on tape is greater than what your source will take, you can run the archive.
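The comparison above can also be scripted. The following is a minimal sketch, assuming a POSIX shell with the standard df, du and awk tools; the function name fits_on_tape and its third argument (the reserve, in 1 KB blocks) are hypothetical and only serve to illustrate the 20 GB safety margin.

```shell
# Hypothetical helper: succeed only if the tape has room for the source
# plus a safety reserve (20 GB by default, as recommended in this step).
fits_on_tape() {
    src=$1
    tape=$2
    reserve_kb=${3:-20971520}   # 20 GB expressed in 1 KB blocks

    # du -sk prints the total size of the source in 1 KB blocks.
    need_kb=$(du -sk "$src" | awk '{print $1}')
    # df -Pk prints one POSIX-format line per filesystem;
    # the fourth column is the available space in 1 KB blocks.
    free_kb=$(df -Pk "$tape" | awk 'NR==2 {print $4}')

    echo "source needs ${need_kb} KB, tape has ${free_kb} KB available"
    [ $((need_kb + reserve_kb)) -le "$free_kb" ]
}

# Intended use (paths are the ones from this tutorial):
#   fits_on_tape /Volumes/USBMedia /Volumes/NIKTAP1 && echo "OK to archive"
```

The function returns a non-zero status when the source plus the reserve would not fit, so it can gate the later steps in a script.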

Step 3: Create a root directory on tape

This step is organizational. But I keep a subdirectory for every archive I perform. In this case we will create a sub-directory on tape using the following:

-       mkdir /Volumes/NIKTAP1/2011-05-10-Reel0

Step 4: Rsync (Dry Run)

Rsync allows you to run in a dry run mode. This mode doesn’t copy any files but allows you to see what is about to happen. This is a great way to determine if everything will go according to plan when you actually run the archive. To run in dry run mode, follow these steps:

-        rsync -avrh --progress --stats -n /Volumes/USBMedia/ /Volumes/NIKTAP1/2011-05-10-Reel0/

  • The above command tells rsync to copy all files and directories under USBMedia to a sub-directory on tape called 2011-05-10-Reel0.
  • The various options stand for the following:
  • ‘a’ means archive, so it preserves permissions etc.
  • ‘v’ means verbose, so it gives a detailed output of what is happening
  • ‘r’ means recursive, so it tells rsync to recursively copy sub-directories (this is already implied by ‘a’, so repeating it is harmless)
  • ‘h’ means human readable, so it tells rsync to print everything in human readable format
  • --progress means show file by file transfer progress
  • --stats means show detailed stats at the end of the transfer
  • -n means perform the above in a dry run mode. So don’t transfer anything, just show me what is about to happen.

-        Once the above is executed, you can quickly inspect if what you intend to copy is truly what is going to get archived.
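As a rough sketch, the same dry run can be wrapped in a small shell function that spells each flag out in its long form, which some readers may find easier to audit. The function name dry_run_archive and the RSYNC override variable are illustrative assumptions and not part of rsync; the long options are the documented equivalents of -a, -v, -h and -n (-r is omitted because -a already implies it).

```shell
# Hypothetical helper: dry run of the archive using long option names.
# RSYNC can be set to another program (e.g. echo) to preview the command line.
dry_run_archive() {
    src=$1
    dst=$2
    # ${src%/}/ guarantees exactly one trailing slash, so rsync copies the
    # contents of the source directory rather than the directory itself.
    "${RSYNC:-rsync}" --archive --verbose --human-readable \
        --progress --stats --dry-run "${src%/}/" "${dst%/}/"
}

# Intended use (paths are the ones from this tutorial):
#   dry_run_archive /Volumes/USBMedia /Volumes/NIKTAP1/2011-05-10-Reel0
```

Normalizing the trailing slash is a design choice for this sketch: with rsync, a source path without the slash would create a USBMedia directory inside the destination instead of copying its contents.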

Step 5: Running the actual transfer

Now we can go ahead and run the actual transfer. To perform the archive, you will run the same command as the one above with one difference: you will drop the ‘n’ option from the command.

-        rsync -avrh --progress --stats /Volumes/USBMedia/ /Volumes/NIKTAP1/2011-05-10-Reel0/

  • This will take some time to run. You should hear the tape spinning up, seeking etc. and then the files should start transferring. Rsync reports per-file progress as files are being archived, and prints a summary at the end of the run.

Step 6: Checksum content

The next step is to checksum every file on source against every file on the tape. This confirms that every file is truly on tape. This is beneficial even if you are not going to delete content, but absolutely essential if you are going to delete the source files in the next step. We run the checksumming in “dry run” mode because we do not expect any data to transfer in this step. To run checksums, perform the following:

-        rsync -avrh --progress --stats -n --checksum /Volumes/USBMedia/ /Volumes/NIKTAP1/2011-05-10-Reel0/

  • This command is pretty much the same as the dry run command with the checksum option added. If rsync lists no files to transfer, every file on tape matches its source. Any file it does list is missing from the tape or differs from the source and should be archived again.
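For scripted use, one possible approach is to ask rsync for itemized output and count the files it would still have to send. This is a sketch under assumptions: the helper name count_unverified is hypothetical, and it relies on rsync's --itemize-changes output, where a line beginning with ">f" (or "<f") describes a file whose data would be transferred.

```shell
# Hypothetical helper: count files that rsync would still have to send.
# Reads rsync --itemize-changes output on stdin and prints the number of
# lines that start with ">f" or "<f" (file data transfers).
count_unverified() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps that
    # from being treated as an error by the caller.
    grep -c '^[<>]f' || true
}

# Intended use (paths are the ones from this tutorial):
#   rsync -a -n --checksum --itemize-changes \
#       /Volumes/USBMedia/ /Volumes/NIKTAP1/2011-05-10-Reel0/ | count_unverified
```

A result of 0 suggests that every source file has a matching copy on tape; anything else is a reason not to delete the source yet.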

Step 7: Remove source files

In this step we are going to remove all source files that have been archived. We run this step only after ensuring that all files have been checksummed, so we can go ahead and clear out the source files. To perform the source deletion, perform the following:

-        rsync -avrh --progress --stats --checksum --remove-source-files /Volumes/USBMedia/ /Volumes/NIKTAP1/2011-05-10-Reel0/

  • In this step we run an extra checksum along with the deletion. Please note, rsync will not remove the source directories – just the files. You will need to manually remove the source directories.
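The leftover source directories can be cleaned up with find. The sketch below assumes a find that supports -mindepth, -empty and -delete (both the BSD find on Mac and GNU find on Linux do); the function name remove_empty_dirs is hypothetical.

```shell
# Hypothetical helper: delete empty directories left under the source root.
# The root itself is kept (-mindepth 1), and any directory that still
# contains a file is left alone, so unarchived files are never touched.
remove_empty_dirs() {
    # -delete processes contents before their parent directory, so nested
    # empty directories are removed from the bottom up.
    find "$1" -mindepth 1 -type d -empty -delete
}

# Intended use (path is the one from this tutorial):
#   remove_empty_dirs /Volumes/USBMedia
```

Because only empty directories are removed, any directory that survives this command still holds a file that rsync did not delete, which is worth a look.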

Step 8: Creating a searchable catalog via Spotlight

In this final step we create a text file that contains a list of all files that we archived. Once we create this file, Spotlight auto indexes the text and then enables searches across any file name and tape name. Remember, since we created the mount point with the tape serial, this is also indexed. Follow these steps:

-       I like to maintain a single catalog file for every tape serial. I create a simple text file for every tape.

-       find /Volumes/NIKTAP1 > NIKTAP1-Catalog.txt (all file entries under the tape are added to this catalog file. With this command, the file is overwritten with the new data after every archive.)

-       The great part about creating this index catalog is that Spotlight will automatically index the contents of this file. So I can simply perform a Spotlight search for the file name or tape serial that I am looking for and Spotlight will find the catalog file. Spotlight also allows more complex search criteria.
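On systems without Spotlight (Linux, for instance), the same catalog files can be searched with grep. The following is a minimal sketch; the function names make_catalog and search_catalogs are hypothetical, and the catalog naming simply follows the NIKTAP1-Catalog.txt convention used in this step.

```shell
# Hypothetical helper: write <tape serial>-Catalog.txt into a chosen directory.
# The tape serial is taken from the last component of the mount point.
make_catalog() {
    mount_point=${1%/}
    out_dir=${2:-.}
    catalog="$out_dir/$(basename "$mount_point")-Catalog.txt"
    find "$mount_point" > "$catalog"   # ">" overwrites any older catalog
    echo "$catalog"
}

# Hypothetical helper: case-insensitive search across all catalogs in a directory.
# -H prefixes each hit with the catalog file name, which identifies the tape.
search_catalogs() {
    grep -iH -- "$1" "${2:-.}"/*-Catalog.txt
}

# Intended use (paths are the ones from this tutorial):
#   make_catalog /Volumes/NIKTAP1 ~/Documents
#   search_catalogs birthday ~/Documents
```

Each search hit shows both the catalog file name and the full path on tape, so the tape serial is visible twice.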

Step 9: Highly recommended last step

While this step is not mandatory, I would highly recommend it. LTFS is a file-system on tape. We are archiving to tape to guarantee long term protection of our data. When content is written via LTFS, it performs its own caching. One clear way to ensure your media is fully flushed is to unmount and remount the tape. I do this as a final check to ensure my content is on tape, especially if I am going to delete the source media.
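The unmount and remount can be sketched as a small function. It assumes the tape is remounted with the same plain ltfs command used in Step 1; your vendor’s build may need extra options. The function name remount_tape and the RUN variable are hypothetical.

```shell
# Hypothetical helper: unmount the tape, then mount it again at the same path.
# Set RUN=echo to print the two commands instead of executing them.
remount_tape() {
    mount_point=$1
    run=${RUN:-}
    # The remount only happens if the unmount succeeded.
    $run umount "$mount_point" && $run ltfs "$mount_point"
}

# Intended use (path is the one from this tutorial):
#   remount_tape /Volumes/NIKTAP1
```

If you plan to delete the source media, it may be safer to do this remount and repeat the checksum dry run from Step 6 before running Step 7, since the check needs the source files to still exist.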

Summary of commands

-        mkdir /Volumes/NIKTAP1 (create directory with tape serial)

-        ltfs /Volumes/NIKTAP1 (mount the available tape in the first tape drive to the volume path)

-        df -h (determine space on tape)

-        du -hcs /Volumes/USBMedia (determine size of source)

-        mkdir /Volumes/NIKTAP1/2011-05-10-Reel0 (make a sub directory on tape)

-        rsync -avrh --progress --stats -n /Volumes/USBMedia/ /Volumes/NIKTAP1/2011-05-10-Reel0/ (dry run)

-        rsync -avrh --progress --stats /Volumes/USBMedia/ /Volumes/NIKTAP1/2011-05-10-Reel0/ (actual archive)

-        rsync -avrh --progress --stats -n --checksum /Volumes/USBMedia/ /Volumes/NIKTAP1/2011-05-10-Reel0/ (checksum verification)

-        rsync -avrh --progress --stats --checksum --remove-source-files /Volumes/USBMedia/ /Volumes/NIKTAP1/2011-05-10-Reel0/ (source deletion)

-        find /Volumes/NIKTAP1 > NIKTAP1-Catalog.txt (creates searchable catalog)

-        umount and remount tape for verification

FAQs

  • What about Windows? Currently, open source LTFS is available for Mac and Linux, and these steps will work equally well on either platform. There is no open source LTFS for Windows yet; the only vendor with an LTFS driver for Windows is IBM. However, the above steps can be performed on Windows as well. If readers would like an equivalent tutorial, I can write one.
  • What about a GUI version? I think the command line via Terminal is a lot more efficient. If the end user community really wants a GUI, I can plead with our tech team to whip one up. Let me know if something like that would be useful.
  • What kind of content can I archive? The great thing about this workflow is that it works for any file type. You can archive source footage (Red, P2, DPX, Arri) or docs and Excel sheets as well.
  • Do I need any additional backup software? Technically no. For simple archiving to tape, the above software is all you need. Software packages like DNA Evolution are designed for more complex and automated workflows. Click here to learn more about DNA Evolution solutions.