Monday, November 05, 2007

The Great Email Migration of 2007


My first domain purchase was back in 1999: my last name .ORG. Over the past several years, I purchased .NET and .COM as they became available.

Since 1999, I've hosted my own email (with the exception of a couple of outages George helped me out with). Any time I've changed jobs, I've taken a copy of my email archives with me.

I decided it was time to consolidate all my email under one roof: Google Apps for domains. I moved .COM and .NET over and then began the arduous task of migrating all of my email.

  • Dozens of Unix mbox files (some native, some from Thunderbird, some that were migrated once upon a time from CCMail to GroupWise and then pulled into mbox using IMAP)
  • 1 GMail account (which had BCCs of the Unix mbox files AND forwarded copies of half the email from my current job)
  • 4 Outlook .PST files totaling over 3GB

My Outlook archives were in nice folder hierarchies, so I had to: flatten the folder structure out, use IMAP to drag and drop everything to my mail server at home, and then use the migration tool to suck it in.

The tool labels your messages with the IMAP folder (or folder hierarchy) they came from, and I decided that I wanted the labels to be clean (not: Kaitain\AAAS\Technology Services\Personal Email).



The folder names also couldn't have spaces in them, so I used several shell scripts to rename them:

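#!/usr/local/bin/bash
# replace spaces with underscores in every name that contains a space
for file in *\ *
do
    [ -e "$file" ] || continue   # the glob matched nothing
    short=${file// /_}           # quoting-safe replacement, no echo|sed needed
    mv -- "$file" "$short"
done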

and to merge all the files into one:

# quote "$file" and only remove a source file once it has been appended
for file in *_*;do cat "$file" >> Kaitain && rm "$file";done
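The merge loop above removes each source file as soon as it has been appended, so a botched merge is hard to undo. Here is a rough sketch of a more cautious version that counts messages before and after, and only deletes the sources if the numbers add up. It assumes the usual mbox convention that every message starts with a separator line beginning with "From " (with the trailing space); the function names count_msgs and merge_mboxes are made up for this example and were not part of my original scripts.

```shell
#!/usr/bin/env bash
# Sketch only: count_msgs and merge_mboxes are hypothetical helpers.

# Count messages in one or more mbox files. Assumes each message begins
# with a separator line that starts with "From " (trailing space included).
count_msgs() {
    # grep -c exits non-zero when the count is 0, hence the "|| true"
    cat -- "$@" | grep -c '^From ' || true
}

# Append every file matching *_* in the current directory to the target
# mbox, and delete the sources only if the message counts add up.
# The target name must not contain an underscore, or it would match the glob.
merge_mboxes() {
    local target=$1 before=0 expected after
    local sources=( *_* )
    [ -e "${sources[0]}" ] || return 0        # nothing to merge
    if [ -e "$target" ]; then
        before=$(count_msgs "$target")
    fi
    expected=$(( before + $(count_msgs "${sources[@]}") ))
    cat -- "${sources[@]}" >> "$target" || return 1
    after=$(count_msgs "$target")
    if [ "$after" -eq "$expected" ]; then
        rm -- "${sources[@]}"
    else
        echo "count mismatch: expected $expected, got $after" >&2
        return 1
    fi
}
```

Running `merge_mboxes Kaitain` in the directory of renamed files would do the same job as the one-liner. The count check also catches a source file that lacks a trailing newline, since the next file's separator would then get glued onto the previous message's last line.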

I don't know how much time I spent flattening my email, but it can't compare to all the time I ever spent organizing it.

In the end, can you really quantify the interactions of one person with others?



Apparently you can. GMail extracted 142,119 emails, which threaded into 59,913 conversations since 1999. The total extraction and loading took about three days.



All this takes up over 4000MB of storage. The oldest emails go back to here:



Only about 25 emails were not migrated (they either carried viruses or had attachments that were too large), and maybe 5-8 emails that came over "ok" were fucked up, so I deleted them. Losing only about 32 out of 142,119 is amazing!
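To put that loss in perspective, here is a quick back-of-the-envelope calculation. The 32 is my rough tally (about 25 not migrated plus 5-8 deleted), so treat the result as approximate; loss_pct is just a throwaway helper name for this example.

```shell
#!/usr/bin/env bash
# Sketch only: loss_pct is a hypothetical helper for this post.
# Print lost/total as a percentage with four decimal places.
loss_pct() {
    awk -v lost="$1" -v total="$2" 'BEGIN { printf "%.4f%%\n", lost / total * 100 }'
}

loss_pct 32 142119   # prints 0.0225%
```

In other words, roughly one message in every 4,400 didn't make it.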

1 comment:

schvin said...

very nice!

also i am glad you whipped out a shell script! :)