[ACM] **SPAM** Re: ext2/ext3 on-disk format
Peter Froehlich
phf at cs.jhu.edu
Wed Nov 28 11:50:38 EST 2007
Hi all,

Just on the general topic of recovering emails from "trashed" disks,
has it occurred to you that there might be serious investigative
applications for a product that does exactly what Asheesh wants? With
all the emails getting lost in the White House and all... :-)

Seriously, maybe there's even a quick conference paper there. I'd
recommend checking in with Randal and his group.

Cheers,
Peter

On Nov 28, 2007, at 6:44 AM, Antonello Cruz wrote:
> Asheesh,
>
> ext2/ext3 data is aligned to the filesystem block size (1K, 2K, or 4K;
> 4K is the usual default on all but small filesystems).
> http://en.wikipedia.org/wiki/Ext2
>
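Rather than guessing the block size, it can be read from the superblock. A minimal sketch, assuming the image is of the filesystem itself (not a whole disk with a partition table, in which case the partition's start offset has to be added); the function names are made up for illustration:

```python
import struct

SUPERBLOCK_OFFSET = 1024  # the primary superblock starts 1024 bytes into the filesystem
EXT2_MAGIC = 0xEF53       # s_magic, shared by ext2 and ext3


def block_size_from_superblock(sb: bytes) -> int:
    """Return the block size encoded in a raw ext2/ext3 superblock."""
    (log_block_size,) = struct.unpack_from("<I", sb, 24)  # s_log_block_size
    (magic,) = struct.unpack_from("<H", sb, 56)           # s_magic
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/ext3 superblock")
    return 1024 << log_block_size


def block_size_of_image(path: str) -> int:
    """Read the superblock from an image file and return its block size."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        return block_size_from_superblock(f.read(1024))
```

Running `tune2fs -l` or debugfs's `stats` command on the image should report the same value, which makes a handy cross-check.
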
> Finding the end of a file that is longer than one block is tricky,
> since the blocks storing that file are not a linked list. They form a
> sort of tree rooted at the inode (see the wiki page). I am not sure how
> long your emails generally are, but if they are shorter than a block,
> your approach for finding the beginning of the message should work.
>
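To make the "tree rooted at the inode" concrete: a classic ext2/ext3 inode holds 15 block pointers, of which 12 point directly at data blocks and one each goes through a single, double, and triple indirect block. A rough sketch of which level a given block of a file falls under (the function name is hypothetical; block pointers are 4 bytes each):

```python
N_DIRECT = 12  # i_block[0..11] point straight at data blocks


def pointer_level(logical_block: int, block_size: int) -> int:
    """Return 0 for a direct pointer, 1/2/3 for single/double/triple indirect."""
    if logical_block < 0:
        raise ValueError("logical block index must be non-negative")
    if logical_block < N_DIRECT:
        return 0
    per_block = block_size // 4  # 32-bit block numbers per indirect block
    remaining = logical_block - N_DIRECT
    for level in (1, 2, 3):
        span = per_block ** level  # data blocks reachable at this level
        if remaining < span:
            return level
        remaining -= span
    raise ValueError("beyond the maximum file size for this block size")
```

With 4K blocks, anything up to 12 * 4096 = 49152 bytes needs only the direct pointers, so a typical email never touches an indirect block.
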
> Another approach, more cumbersome though, is finding the beginning of
> each message, which will tell you the first block of the file you want
> to recover. Then you can go to the blocks that are supposed to have the
> inodes (you'll need to figure out how ext2/3 is laid out at the
> beginning of the disk) and find the inode corresponding to that file.
> More than one inode may point at that block: besides the one you want,
> there may be a stale inode from a deleted file previously stored at the
> same block. (A hard link would not show up as a second inode, since
> hard links to a file share a single inode.)
>
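Finding the first block of each message could look roughly like this. It is only a sketch: it assumes the image starts at the beginning of the filesystem, that the block size is already known, and that messages begin with one of a few header names (the `HEADER_PREFIXES` list is a guess and should be adjusted to what the mail files actually start with).

```python
from typing import BinaryIO, Iterator

# Hypothetical list of header names that may open a Maildir message.
HEADER_PREFIXES = (b"Date: ", b"Return-Path: ", b"Received: ", b"From: ")


def candidate_start_blocks(image: BinaryIO, block_size: int) -> Iterator[int]:
    """Yield the number of every block whose first bytes look like a mail header."""
    block_no = 0
    while True:
        block = image.read(block_size)
        if not block:
            break  # end of image
        if block.startswith(HEADER_PREFIXES):
            yield block_no
        block_no += 1
```

The block numbers it yields are the values to look for among the block pointers stored in the inodes.
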
> Keep in mind that I am not an ext2/3 expert or a storage system
> expert.
>
> Good luck,
>
> Antonello
>
> --- Asheesh Laroia <acm at jhu.asheesh.org> wrote:
>
>> A few months back, I suffered some major data loss on some hard drives.
>> (Lesson learned: RAID is not backup.) I had a partial backup of my
>> emails that were stored on those drives, but a couple of days before
>> the main drives failed I rm -rf'd the backup. The partial backup was
>> stored on ext3.
>>
>> Then the main drives failed, so I saved a disk image of the drive
>> where the partial backup was rm'd.
>>
>> So today I'm looking at that saved disk image in a hex editor. I don't
>> need filenames, and I can identify the sorts of files I want: I want
>> email files (messages, one per file, in Maildirs), and they're really
>> easy to detect: They start with a mail header, which looks something
>> like "Date: Tue, 16 Sep"....
>>
>> But what I do need is a reliable way to detect file boundaries in
>> ext3, preferably a way that works for deleted files also.
>>
>> For file starts - Do they always start at offsets that fit a pattern,
>> like (offset % 2048) == 0? Then I would only need to look for email
>> headers at those positions.
>>
>> For file ends - Is there file-end zero padding until some block width,
>> like "after the file, the rest of the 4096-size block is padded with
>> zeroes"? Then I could use that to detect that I have the whole message
>> file.
>>
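If the zero-padding guess holds for this image (that is the open question here, so treat it as an assumption to check against a few known messages), cutting a recovered message down to size is simple, because a plain-text email should not contain NUL bytes of its own. Both helper names are made up for illustration:

```python
def looks_like_final_block(block: bytes) -> bool:
    """True if the block ends in a NUL byte, i.e. it may be the padded tail of a file."""
    return block.endswith(b"\x00")


def trim_zero_padding(last_block: bytes) -> bytes:
    """Drop the run of NUL bytes that follows the file's real data."""
    return last_block.rstrip(b"\x00")
```

A message whose length is an exact multiple of the block size has no padding at all, so this test cannot be the only end-of-file signal.
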
>> The filesystem where the deletes happened can be inspected with things
>> like debugfs or tune2fs. Assume I don't know anything about
>> filesystems but in general am a reasonable fellow who will try to
>> understand what you teach him.
>>
>> I'd dearly appreciate help, for example from people who took Storage
>> Systems. If you only know about ext2, tell me anyway - ext3 is quite
>> similar!
>>
>> -- Asheesh.
>>
>> --
>> I finally went to the eye doctor. I got contacts. I only need them
>> to read, so I got flip-ups.
>> -- Steven Wright
>> _______________________________________________
>> ACM mailing list
>> ACM at acm.jhu.edu
>> http://lists.acm.jhu.edu/mailman/listinfo/acm
>>
--
Peter H. Froehlich <><><><><><> http://www.cs.jhu.edu/~phf/
OpenPGP: ABC2 9BCC 1445 86E9 4D59 F532 A8B2 BFAE 342B E9D9