Thursday, 26 August 2010

RAID5 Parity Detection

Today I am going to show you a new algorithm we just implemented to detect the parity location on each member drive of a RAID5.

It does not rely on the file system of the RAID5, so it can be used for Windows, Linux, Mac and UNIX RAIDs. Although the algorithm does not give you a complete RAID configuration, the result generated by the application indicates the strip size and the drives' sequence. The drives' sequence is different from the drives' order. For example, if a RAID5 has a drives' sequence of 3-2-0-1, the drives' order can be 2-0-1-3, 0-1-3-2, 1-0-2-3, etc. It depends on which drive is the first drive and on the rotation direction of the RAID5.
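The relationship between a drives' sequence and the possible drives' orders can be sketched in a few lines of Python. This is only my reading of the example above, not the detection algorithm itself: it assumes that every candidate order is a rotation of the sequence (choice of first drive), read either forwards or backwards (rotation direction). The function name `candidate_orders` is made up for illustration.

```python
def candidate_orders(sequence):
    """List every drive order consistent with a cyclic drives' sequence.

    Assumption: an order is the sequence started at any drive and read
    in either direction, which gives 2 * n candidates for n drives.
    """
    n = len(sequence)
    orders = []
    for seq in (list(sequence), list(reversed(sequence))):
        for start in range(n):
            orders.append(tuple(seq[start:] + seq[:start]))
    return orders


orders = candidate_orders([3, 2, 0, 1])
for order in orders:
    print("-".join(str(d) for d in order))
```

For a 4-disk RAID5 this narrows the search to 8 candidate orders, which include the three orders listed above.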

The example given in the video is a 4-disk Windows RAID5. From the result, we worked out that the strip size is 256 sectors and the drives' sequence is 4-3-2-1.



This algorithm becomes extremely useful for working out the RAID configuration (strip size, drives' order, header size and rotation direction) when dealing with a RAID5 created with an unknown or unusual file system. We will discuss this further on our blog in the near future.


(Designed by Zijian Xie, R&D Manager, MSc, BEng)

Tuesday, 10 August 2010

Data Structure of Non-Standard Sector

Today I will introduce how to identify a device that uses non-standard sectors.

Conventional storage systems normally use a standard 512-byte sector to store user data. In order to achieve a lower BER (Bit Error Rate), high-end storage systems add an additional CRC checksum after each 512 bytes of user data. These additional bytes are ONLY recognized by the RAID controller that the storage media operates with. The RAID controller strips the additional bytes before it passes the raw data from the non-standard sectored storage media to the operating system (Windows, Linux, etc.). In other words, device users cannot tell whether a non-standard sector scheme is used in their systems. For data recovery companies, the consequence is that no recovery application will recognize the file system if hard drives from a non-standard sectored RAID controller are mounted raw on a different computer.

As the figure below shows, an 8-byte CRC checksum is attached to the end of each 512 bytes of user data, which shifts the data after sector 0. The DBR (DOS Boot Record) of the NTFS file system (the content beginning with “EB 52 90”) is shifted to address offset 8 (as shown on the right in this example) instead of offset 0, where it is supposed to be (as shown on the left). As you can imagine, all the remaining content is shifted accordingly. The shift repeats with a cycle of 64 (= 512/8): after every 64 sectors of 520 bytes, which occupy exactly 65 sectors of 512 bytes, you will see a sector of data with no shift.
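The shifting described above, and the way to undo it on a raw image, can be sketched as follows. This is a minimal sketch that assumes the layout from this example (512 bytes of user data followed by 8 check bytes in every 520-byte sector); the function names are made up for illustration, and other controllers may use a different layout.

```python
PHYS = 520  # bytes per on-disk sector: 512 user bytes + 8 check bytes
USER = 512  # bytes per sector as seen by the operating system


def shift_in_512_view(n):
    """Offset inside a 512-byte view at which 520-byte sector n begins."""
    return (n * PHYS) % USER


def strip_check_bytes(raw):
    """Drop the trailing 8 bytes of every complete 520-byte sector."""
    usable = len(raw) - len(raw) % PHYS
    return b"".join(raw[off:off + USER] for off in range(0, usable, PHYS))


# Three fake sectors: 512 bytes of user data followed by 8 check bytes each.
raw = b"".join(bytes([i]) * USER + b"\xAA" * 8 for i in range(3))
clean = strip_check_bytes(raw)
print(len(raw), len(clean))
print(shift_in_512_view(1), shift_in_512_view(63), shift_in_512_view(64))
```

The shift grows by 8 bytes per sector and returns to 0 at sector 64, which is the cycle mentioned above.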


Written by: Zijian Xie (R&D Manager, BEng, MSc)

Saturday, 7 August 2010

Winhex Template for Compound Document File Header

I created a Winhex template for the Compound Document File header. Using the repaired Excel spreadsheet from the last article as an example, the template looks like this:


Written by: Zijian Xie (R&D Manager, BEng, MSc)

Friday, 6 August 2010

Microsoft Excel Document Repair

Having read the ‘MS Compound Document File Format’ paper again, I created a sector map to show the parameters in the file header.



The green and blue parameters are the standard values. They are generic values except the revision number and the version number (which are normally not important at all). The red and yellow parameters are the critical ones which are used to construct the actual content in a spreadsheet.

• SAT Size (sec) : The number of sectors used by the Sector Allocation Table (SAT).

• First SecID of DIR : The SecID of the first sector that stores the Directory table (DIR).

• First SecID of SSAT : The SecID of the first sector that stores the Short Sector Allocation Table (SSAT).

• Size of SSAT(sec) : The number of sectors used by the SSAT.

• First SecID of MSAT : The SecID of the first sector that stores the Master Sector Allocation Table (MSAT).

• Size of MSAT : The number of sectors used by the MSAT.

• SecIDs of MSAT : The first part of the MSAT, which lists the SecIDs of the sectors used by the SAT.

Only the first 109 SecIDs of the MSAT are stored in the file header sector. If there are fewer than 109 SecIDs, the unused entries are padded with ‘0xFFFFFFFF’ (-1).

(Note: If you need more explanation, please refer to http://sc.openoffice.org/compdocfileformat.pdf)

Based on the concepts above, I will try to repair a corrupted spreadsheet document manually instead of using repair software. When the file is opened, the content is shown as corrupted:

Viewing the beginning of the file in Winhex shows:


Leaving the rest of the file aside, it is quite obvious that the file header (Page 1) is completely damaged. As we know, the standard sector size of an OLE2 document is 512 bytes. We can treat the file as a hard disk, which makes it easier to examine the data structure. In Winhex, this file runs from sector 0 to sector 2327. This sector number is different from the ‘SecID’ described in the OLE2 document specification. They are related as shown in the table below:

OLE2       Header     SecID 0    SecID 1    SecID 2    SecID 3    ...   SecID N
Physical   Sector 0   Sector 1   Sector 2   Sector 3   Sector 4   ...   Sector N+1
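The mapping in the table is a simple offset of one sector, because the 512-byte header occupies physical sector 0. A minimal sketch, assuming the standard 512-byte sector size used in this article (the function names are only for illustration):

```python
SECTOR_SIZE = 512  # standard OLE2 sector size assumed throughout this article


def secid_to_sector(secid):
    """Physical sector number (as shown in Winhex) for a given SecID."""
    return secid + 1


def secid_to_offset(secid):
    """Byte offset in the file at which the sector with this SecID starts."""
    return (secid + 1) * SECTOR_SIZE


print(secid_to_sector(0), secid_to_offset(0))
print(secid_to_sector(308), secid_to_offset(308))
```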

To repair sector 0 (the header), we can copy the content of sector 0 from a working .xls file over the damaged one. This repairs the standard values. Obviously, the critical parameters mentioned above still need to be adjusted or recalculated. Thus, the places that need changing are:

1. SAT Size (sec) : offset 0x2C to 0x2F

2. First SecID of DIR : offset 0x30 to 0x33

3. First SecID of SSAT : offset 0x3C to 0x3F

4. Size of SSAT(sec) : offset 0x40 to 0x43

5. First SecID of MSAT : offset 0x44 to 0x47

6. Size of MSAT : offset 0x48 to 0x4B

7. SecIDs of MSAT : offset 0x4C to 0x1FF
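The seven places listed above can be read out of a header sector with Python's `struct` module. This is a sketch for inspection only: the dictionary keys are names I made up, and the values are read as little-endian signed 32-bit integers so that the special SecIDs show up as -1, -2 and -3.

```python
import struct

# Offsets of the critical header fields, taken from the list above.
FIELD_OFFSETS = {
    "sat_size": 0x2C,
    "first_dir_secid": 0x30,
    "first_ssat_secid": 0x3C,
    "ssat_size": 0x40,
    "first_msat_secid": 0x44,
    "msat_size": 0x48,
}
MSAT_OFFSET = 0x4C  # 109 SecIDs fill the header from 0x4C to 0x1FF


def read_critical_fields(header):
    """Return the critical header fields as signed little-endian integers."""
    if len(header) < 512:
        raise ValueError("the header must be a full 512-byte sector")
    fields = {name: struct.unpack_from("<i", header, off)[0]
              for name, off in FIELD_OFFSETS.items()}
    fields["msat_secids"] = list(struct.unpack_from("<109i", header, MSAT_OFFSET))
    return fields


# Small self-check on an artificial header.
demo = bytearray(512)
struct.pack_into("<i", demo, 0x2C, 19)
struct.pack_into("<i", demo, 0x44, -2)
fields = read_critical_fields(bytes(demo))
print(fields["sat_size"], fields["first_msat_secid"])
```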

If the file is not bigger than about 6.8MB, the contents of place 5 and place 6 do not need to be changed. This is because the 109 double-word pointers in the header are enough to reference the whole SAT. Place 5 will be ‘0xFEFFFFFF’ (-2, written in the byte order stored on disk) and place 6 will be ‘0x00000000’.

Let’s start repairing the critical parameters by finding the SAT first. The SAT is very similar to the FAT in a FAT32 file system: it contains multiple chains of sector pointers, and it is usually stored near the beginning of the file. Looking at sector 1, it is obviously a sector used by the SAT. The first double word, ‘0xFDFFFFFF’ (-3), indicates that the current sector is used to store the SAT. There are another two ‘0xFDFFFFFF’ entries in this sector, which indicate that SecID 5 and SecID 6 are also used by the SAT.

At SecID 6 (sector 7), we find another two pointers with a value of ‘0xFDFFFFFF’. They belong to SecID 308 and SecID 309 respectively.

Continuing the search in the same way, the SecIDs used by the SAT are:
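The 6.8MB figure follows from the sector size: each SAT sector holds 128 SecIDs (512 / 4), and each SecID accounts for one 512-byte sector of the file. A quick check of the arithmetic, which also shows why -2 appears as FE FF FF FF in a hex editor:

```python
import struct

SECTOR_SIZE = 512
SECIDS_PER_SECTOR = SECTOR_SIZE // 4   # 128 four-byte SecIDs per SAT sector
HEADER_MSAT_SLOTS = 109                # SAT sectors the header can point to

max_bytes = HEADER_MSAT_SLOTS * SECIDS_PER_SECTOR * SECTOR_SIZE
print(max_bytes, max_bytes / 2**20)    # bytes, and the same value in MB

# -2 stored as a little-endian 32-bit value, as seen in Winhex.
end_of_chain = struct.pack("<i", -2).hex()
print(end_of_chain)
```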

SecID    Physical Sector
0        1
5        6
6        7
308      309
309      310
613      614
614      615
615      616
917      918
918      919
1221     1222
1222     1223
1524     1525
1525     1526
1526     1527
1830     1831
1831     1832
2164     2165
2187     2188
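The manual search for SAT sectors can also be scripted. The sketch below is not a general OLE2 parser: it assumes the first SAT sector is SecID 0 and that the SAT sectors are used in ascending SecID order, as in this file, so that the k-th SAT sector found describes SecIDs k*128 to k*128+127. The function name is made up for illustration.

```python
import struct

SECTOR_SIZE = 512
SECIDS_PER_SECTOR = SECTOR_SIZE // 4
SAT_SECID = -3  # marker 0xFDFFFFFF: "this sector belongs to the SAT"


def find_sat_secids(data, first_sat_secid=0):
    """Collect the SecIDs of all SAT sectors by following the -3 markers."""
    sat = [first_sat_secid]
    k = 0
    while k < len(sat):
        off = (sat[k] + 1) * SECTOR_SIZE      # skip the 512-byte header
        chunk = data[off:off + SECTOR_SIZE]
        if len(chunk) < SECTOR_SIZE:
            break                             # truncated file, stop here
        entries = struct.unpack("<128i", chunk)
        for i, value in enumerate(entries):
            secid = k * SECIDS_PER_SECTOR + i
            if value == SAT_SECID and secid not in sat:
                sat.append(secid)
        k += 1
    return sat


# Tiny synthetic file: SAT sectors at SecID 0, 5 and 130.
image = bytearray(SECTOR_SIZE * 132)
struct.pack_into("<i", image, 512 + 0 * 4, -3)                # SecID 0
struct.pack_into("<i", image, 512 + 5 * 4, -3)                # SecID 5
struct.pack_into("<i", image, (5 + 1) * 512 + 2 * 4, -3)      # SecID 128 + 2
found = find_sat_secids(bytes(image))
print(found)
```

Run against the damaged spreadsheet, a scan like this would be expected to return the 19 SecIDs in the table above, provided the ordering assumption holds.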


There are 19 sectors used by the SAT. Thus, Size of SAT and First SecID of SAT can be modified to:

Size of SAT = 19 (0x13000000)

First SecID of SAT = 0x00000000

Also, the field holding the SecIDs of the SAT (place 7) can be filled in according to the table above. Fewer than 109 SecIDs are used by the SAT, thus:

First SecID of MSAT = 0xFEFFFFFF
Size of MSAT = 0x00000000

The directory table always starts with ‘ROOT ENTRY’. Searching for the string ‘ROOT ENTRY’ in Unicode, we find that sector 2 is used to store the DIR. Thus the First SecID of DIR is 0x01000000.
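The same search can be done in code. This sketch assumes the name is stored as UTF-16LE text at the very start of a sector and compares case-insensitively, since files may store it in mixed case; the function name is made up for illustration.

```python
SECTOR_SIZE = 512
NEEDLE = "root entry".encode("utf-16-le")


def find_dir_secid(data):
    """Return the SecID of the first sector that starts with the root entry name."""
    for off in range(SECTOR_SIZE, len(data), SECTOR_SIZE):
        # bytes.lower() only changes ASCII letters, so the zero bytes
        # between the UTF-16LE characters are left untouched.
        if data[off:off + len(NEEDLE)].lower() == NEEDLE:
            return off // SECTOR_SIZE - 1
    return None


# Synthetic file with the directory in physical sector 2 (SecID 1).
image = bytearray(SECTOR_SIZE * 4)
name = "Root Entry".encode("utf-16-le")
image[2 * SECTOR_SIZE:2 * SECTOR_SIZE + len(name)] = name
dir_secid = find_dir_secid(bytes(image))
print(dir_secid)
```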

The SSAT has a data structure similar to the SAT, but the table is normally shorter. Checking through the first few sectors of the file, I found that sector 3 should be the first sector of the SSAT.


Thus, the First SecID of SSAT is 0x02000000. Checking the next SecID in the SAT, we find a value of 0xFEFFFFFF at the entry for SecID 2, which indicates that the current SecID is the end of the current SecID chain. In other words, the SSAT occupies just a single sector, so the Size of SSAT has a value of 0x01000000.

After the recalculated values are written into the file header, it looks like this:

Opening the file again shows that it is now readable.
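Counting the sectors of a stream such as the SSAT means following its SecID chain through the SAT until the end-of-chain value (-2) is reached. A minimal sketch, assuming the SAT has already been read into a Python list of signed integers indexed by SecID:

```python
def follow_chain(sat, first_secid):
    """Return the list of SecIDs in a chain, starting at first_secid."""
    chain = []
    seen = set()
    secid = first_secid
    while secid >= 0:                 # negative values (-2 etc.) end the chain
        if secid in seen:
            raise ValueError("loop in SecID chain, the SAT is damaged")
        seen.add(secid)
        chain.append(secid)
        secid = sat[secid]
    return chain


# Toy SAT: SecID 0 is a SAT sector, SecID 2 is a one-sector chain,
# SecID 3 continues at SecID 4.
toy_sat = [-3, -2, -2, 4, -2]
ssat_chain = follow_chain(toy_sat, 2)
long_chain = follow_chain(toy_sat, 3)
print(len(ssat_chain), long_chain)
```

The length of the chain that starts at the First SecID of SSAT is the value to write into the Size of SSAT field.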
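For reference, the header changes made in this repair could be scripted roughly as follows. This is only a sketch of the manual steps, assuming the SAT needs no more than 109 sectors so that no extra MSAT sectors are required; the function name and parameters are made up for illustration.

```python
import struct


def patch_header(header, sat_secids, first_dir, first_ssat, ssat_size):
    """Write the recalculated critical fields into a 512-byte header."""
    if len(sat_secids) > 109:
        raise ValueError("more than 109 SAT sectors: extra MSAT sectors needed")
    struct.pack_into("<i", header, 0x2C, len(sat_secids))  # SAT Size
    struct.pack_into("<i", header, 0x30, first_dir)        # First SecID of DIR
    struct.pack_into("<i", header, 0x3C, first_ssat)       # First SecID of SSAT
    struct.pack_into("<i", header, 0x40, ssat_size)        # Size of SSAT
    struct.pack_into("<i", header, 0x44, -2)               # First SecID of MSAT
    struct.pack_into("<i", header, 0x48, 0)                # Size of MSAT
    slots = list(sat_secids) + [-1] * (109 - len(sat_secids))
    struct.pack_into("<109i", header, 0x4C, *slots)        # SecIDs of the SAT


# Values worked out in this article.
sat_secids = [0, 5, 6, 308, 309, 613, 614, 615, 917, 918, 1221, 1222,
              1524, 1525, 1526, 1830, 1831, 2164, 2187]
header = bytearray(512)  # in practice: sector 0 copied from a working .xls
patch_header(header, sat_secids, first_dir=1, first_ssat=2, ssat_size=1)
print(header[0x2C:0x30].hex(), header[0x44:0x48].hex())
```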

After recovering the file manually, I was interested to see whether recovery software could do anything with it. I tried a demo version of an Excel recovery program. In the end, the preview showed nothing apart from the sheet names.


Written by: Zijian Xie (R&D Manager, BEng, MSc)