An introduction to taking proper backups with
emphasis on the data commonly used by artists.
Choose the right backup type and the tools
needed to preserve your valuable data.
A backup, in computer lingo, refers to making a copy of important data
for the purpose of data recovery.
The word "data" refers to anything stored on a computer system: images, programs, documents,
Should the important data get damaged or lost, a properly made backup will restore it all.
Taking backups of important data can prevent loss of valuable work and the time needed to recreate it.
In this article we'll take a look at common backup types and strategies,
data compression, and common backup media types. A real life backup scenario
will illustrate my own backup procedures. The article will end with general
Common backup types
The best backup methods rely on simple and time proven concepts.
The simpler the procedure, the more likely it is to work correctly.
New or unnecessary technologies are best avoided till proven reliable and necessary.
A backup does not need to rely on dedicated software. Making a copy of a file is a basic form of backup.
A full-backup consists of making a copy of all important data.
When you copy a folder with important files, from say a hard drive to a CD,
you actually make a full-backup of those files. Due to simplicity, this approach
is the most reliable of all backup types. Its main advantage is ease
of backup creation and restoration. The main disadvantage is that each
backup will use as much space as the important data. If the data is
large, the backup process can be very resource intensive in terms of
time, backup space requirements, and the processing power needed to carry it out. Imagine the time needed
to full-backup a digital library consisting of thousands of movies. Such an operation
can take days.
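A full backup by plain copying can be sketched in a few lines of Python. This is only an illustration, not a tool from this article: the function name `full_backup` and the dated folder naming are my own assumptions.

```python
import shutil
from datetime import date
from pathlib import Path


def full_backup(source: Path, backup_root: Path) -> Path:
    """Copy the whole source folder into a new dated folder under backup_root."""
    # Hypothetical naming scheme: <folder name>-<today's date>, e.g. art-2024-05-01
    target = backup_root / f"{source.name}-{date.today().isoformat()}"
    # copytree copies every file and subfolder; it raises an error if the
    # target folder already exists, so an earlier backup is never overwritten.
    shutil.copytree(source, target)
    return target
```

Restoring is the same operation in the other direction: copy the dated folder back.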
An incremental-backup works differently in that it backs up only the
files modified or newly added since the last backup. When using this method, a full
backup is created first and then incremental backups are run on a regular
basis. For large amounts of data this method is often the only practical
way to back up. It requires less space than taking regular full backups
and is less resource intensive to run. On the other hand, unlike
full backups, incremental backups need dedicated backup software to keep
track of which files to back up.
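To make the idea concrete, here is a minimal sketch of the bookkeeping an incremental backup needs. It assumes the time of the previous backup is known and passed in as `last_backup_time` (a hypothetical parameter), and it decides what to copy from file modification times alone. Real backup software does more, for example tracking deleted files.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, increment_dir: Path,
                       last_backup_time: float) -> list[Path]:
    """Copy only files modified after last_backup_time into increment_dir."""
    copied = []
    for src_file in sorted(source.rglob("*")):
        # Skip folders and files that have not changed since the last backup.
        if not src_file.is_file() or src_file.stat().st_mtime <= last_backup_time:
            continue
        relative = src_file.relative_to(source)
        dst_file = increment_dir / relative
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also keeps the timestamps
        copied.append(relative)
    return copied
```

Each run would write to a fresh `increment_dir`, so restoring means copying the full backup first and then each increment in order.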
Compressing the backup data is a popular option. Such practice lowers the amount
of space needed on the backup media. Although compression adds an additional
layer of complexity, it can be a good (if relied on wisely) and sometimes
Essential backup strategies
Regardless of the backup type and data, the following backup strategies
should always be followed:
- backup should be taken on a regular basis
- backup should be automatic and need as little human supervision as possible
- backup should be stored in a safe remote location
- backup should rely on well established hardware and software technologies
Backup should be taken on a regular basis. The more frequently the data
changes, the more often it should be backed up. For example, some of
my most frequently updated files (website files, source code, notes,
etc.) are backed up daily. Files that are less frequently updated are
Backup should be automatic. Except for the initial configuration of the
backup program and the occasional supervision, the whole backup process
should be automatic and completely transparent. That is, the backup
should run by itself without drawing any attention unless necessary.
Backup should be stored in a safe remote location. Should the location
of the important data get damaged, destroyed, or exposed to theft - a
remotely stored backup becomes invaluable. How remote? Disasters like fire,
flood, tornado, earthquake, etc., can cause widespread damage. Ideally a backup
should be stored far enough away, in a minimal-risk location.
Backup should rely on well established hardware and software
technologies. Such technologies are typically in widespread use and are thus
cheaper and easier to troubleshoot or get help with in the event of failure. As the
established technologies become gradually replaced by new and better
ones, so should the backup media and hardware, and, if used, the software to
re/store the data. There is no guarantee that the common backup media
of today, like CD or DVD, will be usable in ten years. The same is true
for software. Thus, a good data preservation strategy should include
continual migration of the backup data to mature and well established
technologies of the time.
A bit about data compression
Compression makes data smaller and thus is a popular backup option.
Its main advantage is lower backup cost due to lower space use. The downside
is the time needed to compress the data and later to uncompress it
Many compression formats exist. Each format uses some sort of compression method
called an algorithm. There are two types of data compression algorithms: "lossy" and "lossless".
Lossless compression reduces the data size without modifying its content.
Lossy compression modifies the data content to make it even smaller than
Some compression formats, like MP3 or JPG, are highly specialized. They use lossy
algorithms and produce very small file sizes but can only compress a particular
type of data. Other formats, like ZIP or BZIP2, are of general purpose. They rely
on lossless compression algorithms and can work on any data.
However, they will never outdo special purpose formats like MP3 or JPG.
Unfortunately, due to the nature of lossy compression, JPG, MP3 or any
other lossy format degrades the original data to some extent. In other words, saving an
image or music in a lossy file format will make it different from the original.
Usually the difference, called compression artifacts, is so small that most of us
don't see or hear it.
For the above reasons, lossy compression should never be used when saving
important master data. Only lossless compression is suitable for that. PNG and
TIFF are examples of image file formats that support lossless compression. Such
formats are ideal for storing hi-resolution master images.
Finally, compression takes time and normally uses all available processing
power. Generally, the better the compression the slower it is. Some compression
algorithms are extremely good at compressing but also extremely slow. For backup
purposes, one should evaluate common compression formats and settle on the
most suitable one.
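One way to evaluate formats is to time a few compressors on a sample of your own data. The sketch below uses three lossless compressors from Python's standard library (zlib, bz2 and lzma, which implement the methods commonly used by ZIP, BZIP2 and 7ZIP/XZ archives). The function name `compare_compressors` is hypothetical, and the results depend entirely on the data and machine.

```python
import bz2
import lzma
import time
import zlib


def compare_compressors(data: bytes) -> dict[str, tuple[int, float]]:
    """Return {name: (compressed size in bytes, seconds taken)}."""
    # Each compressor is run at its maximum standard compression setting.
    compressors = {
        "zlib": lambda d: zlib.compress(d, 9),
        "bz2": lambda d: bz2.compress(d, 9),
        "lzma": lambda d: lzma.compress(d, preset=9),
    }
    results = {}
    for name, compress in compressors.items():
        start = time.perf_counter()
        size = len(compress(data))
        results[name] = (size, time.perf_counter() - start)
    return results
```

Running it on a representative file, for example `compare_compressors(open("notes.txt", "rb").read())`, gives a rough feel for the trade-off between size and speed.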
Consider your needs
Some additional issues need to be considered when designing the most
suitable backup strategy for your own use:
- the type of backup files
- if compression is desired, what compression to use and how
- backup storage media
As noted earlier, the best backups are simply copies of important
data. Such an approach works especially well for artists
who rely on compressed image formats like PNG or TIFF.
Note the difference between "built-in" image compression, done every time you
save an image in a format that supports it, and compressing the backup
data - applied to all backup data regardless of what it is.
What backup compression to use, and whether to use it at all, depends
on the type of backup data. Generally, text files (TXT, HTML,
XML, etc.) can be compressed the most of all file types. Images that have
already been compressed with their own algorithms (PNG, JPG, TIFF, etc.) can't
be compressed much further, if at all. Images
which don't have their own compression (BMP, TGA, etc.) can often be
compressed quite a bit, though this depends on the actual image data.
Thus if most of your important art data consists of images that are
already compressed, there is no need to compress the
backup. Text files, on the other hand, can be compressed a lot, saving
a significant amount of space.
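The effect is easy to reproduce. In the sketch below, generated text stands in for a text file, and the output of a first compression pass stands in for an already compressed image. Both stand-ins, and the helper names, are assumptions made for illustration only.

```python
import random
import zlib


def compressed_percent(data: bytes) -> float:
    """Compressed size as a percentage of the original size."""
    return 100 * len(zlib.compress(data, 9)) / len(data)


def sample_text(words: int = 20000, seed: int = 0) -> bytes:
    """Build some word-based text to play the role of a text file."""
    rng = random.Random(seed)
    vocabulary = ["backup", "image", "file", "copy",
                  "restore", "media", "data", "archive"]
    return " ".join(rng.choice(vocabulary) for _ in range(words)).encode()


text = sample_text()
already_compressed = zlib.compress(text, 9)  # stand-in for a PNG or JPG

print(f"text: {compressed_percent(text):.0f}% of original")
print(f"already compressed: {compressed_percent(already_compressed):.0f}% of original")
```

The text shrinks to a fraction of its size, while the second pass over compressed data gains close to nothing.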
The quality of the backup hardware, media and software is equally important.
There are a few other things to consider when compressing backup data:
what compression program to use and how to compress the files.
ZIP is the most commonly used compression format today - it's fast
and compresses well. It's been around for a long time and is universally
available. But there are other, lesser-known, good alternatives.
For example, 7ZIP, RAR, and BZIP2 compress significantly better than ZIP and
are only slightly slower.
Finally, how to compress backups. Basically one can either create a
compressed archive of many files, or compress each file individually. The
main disadvantage of creating a compressed archive is the possibility
of losing all files in the archive if the archive gets corrupted
and cannot be recovered. On the other hand, if files are compressed
individually, one loses only the one file that gets corrupted and is
unrecoverable. Additionally, since a compressed file uses less space
than an uncompressed one, it's less likely to get corrupted.
Thus it's safer to compress files individually.
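Compressing files one by one is simple to script. The following is a rough sketch using Python's standard bz2 module. The function name and the folder layout (one `.bz2` file per source file, mirroring the original folders) are assumptions for illustration.

```python
import bz2
import shutil
from pathlib import Path


def compress_individually(source: Path, backup: Path) -> list[Path]:
    """Write one .bz2 file per source file, mirroring the folder layout."""
    written = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        dst_file = backup / relative.parent / (relative.name + ".bz2")
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        # Stream through bz2 so large files are never fully loaded into memory.
        with open(src_file, "rb") as fin, \
                bz2.open(dst_file, "wb", compresslevel=9) as fout:
            shutil.copyfileobj(fin, fout)
        written.append(dst_file)
    return written
```

If one of the resulting files is damaged, the others can still be decompressed on their own.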
A lot of space can be saved thanks to compression.
I took one of my images and saved it in BMP, TIFF, PNG and JPG formats.
I then compressed those files with a few general purpose compressors.
All lossless compression was done with maximum compression settings.
Since JPG is a lossy format it is only included for the sake of comparison.
Book.txt is Sun Tzu's The Art of War.
|File     |Original  |Compressed size, one column per compressor
|BMP      |1 440 054 |911 154 (63%)  |693 481 (48%)  |713 287 (49%)
|TIFF     |          |652 315 (98%)  |655 239 (99%)  |652 955 (98%)
|PNG      |          |611 923 (100%) |613 466 (100%) |610 711 (100%)
|JPG      |          |302 933 (100%) |302 852 (100%) |300 268 (99%)
|Book.txt |          |130 340 (38%)  |100 696 (29%)  |91 187 (26%)
Sizes are in bytes. The percentage shows the compressed size relative to
the original size. The smaller the better.
The compression times vary somewhat, but not enough to be impractical.
PNG is a clear winner among images. It uses about 58% less space
than BMP! Notice that only one of the general purpose compression tools,
7ZIP, further compressed (slightly) the already compressed PNG file.
The book file was compressed down to about 26-38% of its original size,
which is typical for text compression.
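For reference, the percentages can be recomputed from the byte counts. The sketch below uses the first set of figures in the table (1 440 054 bytes uncompressed, presumably the BMP file). Dropping the fraction instead of rounding is an assumption on my part, inferred from the fact that it reproduces the listed 63%, 48% and 49%.

```python
def percent_of_original(compressed: int, original: int) -> int:
    """Compressed size as a whole-number percentage of the original size."""
    # Integer division drops the fraction (assumed to match the table).
    return 100 * compressed // original


original_size = 1_440_054
for compressed_size in (911_154, 693_481, 713_287):
    percent = percent_of_original(compressed_size, original_size)
    print(f"{compressed_size} ({percent}%)")
```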
What backup media to use
The commonly used backup media today are hard drives, tapes and
CDs/DVDs. Hard drives are the fastest and often the best option for
large amounts of data. They are also the most expensive and not very
durable. Tapes are slow but can store a lot of data and can last decades.
CDs/DVDs are probably the most common backup
media used today due to their very low cost. Unfortunately, just like
hard drives, most have a relatively short expected life span of
two to five years. Internet backup solutions are also becoming a popular
Reliability is important to consider when choosing the backup media.
How robust is the media and for how long can it retain the data? The
quality of the media plays a significant role here. All media degrade over
time, but some degrade more than others. Most of the low cost burnable CDs
have a life span of around two years. Higher quality CDs can last
up to five. Very high quality CDs with a gold layer are expected to
last decades. Generally, if the handling and storage conditions are good,
quality media should last at least a few years without data loss. However,
unless the best quality media is used, an annual full backup is probably
the safest prevention against data loss due to media degradation.
Hard drives are a popular backup medium due to their large capacity and speed.
A combination of different media may often be the ideal solution. For
example, some of my own backup practices include using an external
hard drive to mirror (update) certain parts of my computer hard drives.
Twice a year I burn all important data on several DVDs.
I recommend spending some time investigating the most suitable media and
the hardware to operate it. High quality products will minimize the possibility
of backup failure.
The necessity of verifying backups
One of the most important aspects of taking backups is making sure
they are error free. The backup data may prove useless if corrupted
due to media or other errors.
It's good practice to immediately test the backup for its validity.
Errors will be detected and a new backup can be taken right away.
Any respectable backup program provides an option for data verification.
What good is a backup if its data is corrupted?
I wrote a script specifically for the purpose of backup verification.
If you use Linux you may find it useful.
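For readers who prefer to see the idea in code, here is a minimal verification sketch in Python. It is not the script mentioned above, only an illustration of one common approach: compute a checksum of every source file and compare it with the checksum of its backup copy. The function names are hypothetical, and it assumes the backup is an uncompressed copy with the same folder layout.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large files do not fill the memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[Path]:
    """Return relative paths of files that are missing or differ in the backup."""
    problems = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        dst_file = backup / relative
        if not dst_file.is_file() or sha256_of(src_file) != sha256_of(dst_file):
            problems.append(relative)
    return problems
```

An empty list means every source file has an identical copy in the backup. Anything else calls for a new backup right away.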
A real life backup scenario
My most valuable data is my art data, website files, source code, and
various docs. All my hi-resolution work is stored in either PNG or TIFF. Nearly
all my reference images are JPGs. Thus all my image data
can be backed up without the use of compression (during backup) which saves huge amounts of
backup time and space. I do compress 3d files which don't use their own
compression. For that I use bzip2 with the maximum compression setting.
All the remaining data are basically text files and are compressed individually
using either bzip2 or 7zip. Images and 3d files, even compressed, can be huge in size.
Not surprisingly over 90% of my backup space is used on art data.
I back up daily, monthly and twice a year. Once a day, the
files which are frequently updated (my notes, work in progress images, source code,
website files, email, etc.) are backed up to another hard drive. This happens during the
boot process and takes about a minute. Once a month I back up to a CD which also includes
less frequently updated files. A copy of that CD is stored in a remote location.
Twice a year I take a full backup and store it on several DVDs at a friend's house. If I
work on something especially important, I store it daily on a CD/DVD or a USB mem-stick.
My most critical data is also regularly encrypted and stored on a very remote internet host.
I wrote a script to run all these backups automatically. With the exception of CD/DVD storage,
no manual work is involved.
As you can see, a custom backup solution can be quite sophisticated yet simple to carry out.
It can involve a combination of different media and backup procedures to optimally satisfy
Depending on your needs, dedicated backup software may be a necessary
investment. Make sure to research this carefully. Usually, products from
reputable companies that specialize in backup solutions are best. There
are also many good open source or free software alternatives.
It's best to avoid products which rely on proprietary or closed solutions.
For example, a backup
program may store the backup data in an unknown format only supported
by this particular backup software. Avoid that. If the company goes
out of business and the backup, or backup software, breaks, your backup data may
be lost forever. Look for products that rely on well known, mature, and
ideally open technologies. For example, PNG is an open format for storing
image data. What this means is that the specification, or blueprint, for
that format is publicly available for anyone to use. This increases
compatibility and reduces reliance on any specific vendor or product.
Most artists' important data consists mainly of images and 3d files. To save
space, rely on PNG, TIFF or JPG for bitmap image formats. Vector images and 3d files
can be compressed individually if needed. A basic backup program that simply copies
specified files or directories to the backup media may be all that is needed.
It's best to make two sets of the backup data and store each at a different location:
one close to home, like a friend's place or a bank box, and the other far away.
Setting up a proper backup strategy may initially require a significant
amount of time and cost money. There is a lot to research and consider.
In the end however, a good backup procedure will prove an exceptionally valuable
investment. As you read this, your screen could go blank due to a hard drive crash.
All your valuable data - years of work, reference images, business documents, photo albums,
3d files, email, etc., - could be lost forever. Unless you are prepared and have a backup.