
Poor man’s photo backup, part I

A photo library of around 100 000 photos; nothing really professional, just nice-to-have shots. Some of them are blurry, out of focus, or just plain horrible photographs. There's no time to browse through everything and keep only the really good ones. The hard drives are aging and you badly need a backup, but 300 GB is just too expensive to store in the cloud. Sound familiar?

What you need now is an insane compression ratio.

Now, how can you drop the file size by a factor of 10 or 20? Can you compress those images down to a quality that's tolerable, if not super-high, since the goal here is just to back them up for the worst-case scenario?

How about storing them in a video file, one per folder, one frame per second so that you can still browse through them when needed? This is an experiment that I’ll be documenting as it goes on. Might be a terrible idea, but who knows?

To do this, the following steps are needed in each folder (all of this should work on OSX or Linux):

Downscale the photos to fit inside an n × n box, letterboxed. This allows both horizontal and vertical photos to retain their relative quality. ImageMagick does this nicely; the following line does it, if your input file is “file.jpg”:

convert file.jpg -filter cubic -resize 2048x2048\> -gravity center \
  -background black -extent 2048x2048 out.jpg

Convert all these files into an MJPEG stream, so that it can be fed to a video encoder:

convert file.jpg -filter cubic -resize 2048x2048\> -gravity center \
  -background black -extent 2048x2048 jpeg:- >>all.mjpeg

Do this in a loop for all the photos in a single folder:

shopt -s nullglob nocaseglob
for a in *.jpg; do
  # Append each letterboxed photo to the MJPEG stream as raw JPEG data.
  convert "$a" -filter cubic -resize 2048x2048\> -gravity center \
    -background black -extent 2048x2048 jpeg:- >>all.mjpeg
done

Now there’s a huge file, all.mjpeg, that has all those photos letterboxed with a black background, as a continuous sequence. This, of course, can easily be fed to ffmpeg:

ffmpeg -vcodec mjpeg -r 1 -i all.mjpeg -s 2048x2048 \
  -vcodec prores_ks -y out.mov
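
As a quick sanity check, the frame count of the video should match the number of photos in the folder. Assuming the ffprobe tool that ships with ffmpeg is available:

# Count decoded video frames in the archive…
ffprobe -v error -count_frames -select_streams v:0 \
  -show_entries stream=nb_read_frames -of csv=p=0 out.mov
# …and compare against the number of photos.
ls | grep -ci "\.jpg"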

My input images were around 300 MB in total, the intermediate MJPEG file was 80 MB, and this gives me a ProRes video file of around 60 MB. The output video looks really sharp and there's barely any loss of quality. In fact, we can afford to lose a bit more:

ffmpeg -vcodec mjpeg -r 1 -i all.mjpeg -s 2048x2048 \
  -vcodec prores_ks -y -qscale 16 out.mov

This forces the ProRes quantization scale to 16, a noticeably coarser quality setting. Some blockiness will appear in subtle gradients and blurry regions of a photo, but that is completely fine. Result: 19 MB, a compression ratio of roughly 15:1.

This would bring 300 GB of data down to about 20 GB. That can already be stored on a USB thumb drive, or on a slightly larger free cloud account. Acceptable.

Now, what about all the metadata?

As we’re dealing with photos, it’s got to be exiftool. It can output an XML document with all the metadata included. To do all of this in a single loop and end up with a .mov and an .xml file, I wrote the following shell script, called “photorescue”:

#!/bin/bash
# Usage: photorescue <size>, e.g. photorescue 2048x2048
pwd=`pwd`
thisdir=`basename "$pwd"`
shopt -s nullglob nocaseglob
for a in *.jpg; do
  echo "$a..."
  # Letterbox the photo into a $1 box and append it to the MJPEG stream.
  convert "$a" -filter cubic -resize $1\> -gravity center \
    -background black -extent $1 jpeg:- >>all.mjpeg
  # Dump the photo's metadata as XML into a file named after the folder.
  exiftool -X "$a" >>"$thisdir.xml"
done
# Encode the whole stream as ProRes, one photo per second.
ffmpeg -vcodec mjpeg -r 1 -i all.mjpeg -s $1 -vcodec prores_ks \
  -qscale 16 -y "$thisdir.mov"
rm all.mjpeg

 

This tool will handle all the .jpg (and .JPG, for that matter) files in the current folder and output two files, a .mov and an .xml named after the folder. It is run like this:

photorescue 2048x2048

Here 2048x2048 is the resolution you want to use. A folder can in this way be classified as “I really want to keep these in higher res” or “I don’t care about these photos, whatever”, using higher or lower values as needed. Using a really low value produces a thumbnail video.
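
To sweep a whole library in one go, a loop over its subfolders should do. This is just a sketch; it assumes photorescue is on your PATH and is run from the library root:

for d in */; do
  # Run the script inside each subfolder, in a subshell so we return afterwards.
  (cd "$d" && photorescue 2048x2048)
done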

Other things to try:

– Use a non-keyframe-only video codec, together with automatic scene change detection, to handle folders full of similar photos and push the compression ratio even higher.
– Write the actual rectangles and timestamps where each image ended up into the XML metadata.
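
And since the whole point is being able to get a photo back out, here is a rough sketch of recovering one. The stream runs at one frame per second, so photo number 42 sits at second 41; the file names here are just examples:

# Seek to second 41 and grab a single frame.
ffmpeg -ss 41 -i holiday.mov -frames:v 1 photo_42.jpg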

