Friday, 6 August 2010

Microsoft Excel Document Repair

After rereading the paper ‘MS Compound Document File Format’, I created a sector map showing the parameters in the file header.



The green and blue parameters hold standard values. They are the same across files, except for the revision number and the version number (which normally do not matter at all). The red and yellow parameters are the critical ones: they are used to locate the actual content of the spreadsheet.

• SAT Size (sec) : The number of sectors used by the Sector Allocation Table (SAT).

• First SecID of DIR : The SecID of the first sector that stores the Directory table (DIR).

• First SecID of SSAT : The SecID of the first sector that stores the Short Sector Allocation Table (SSAT).

• Size of SSAT(sec) : The number of sectors used by the SSAT.

• First SecID of MSAT : The SecID of the first sector that stores the Master Sector Allocation Table (MSAT).

• Size of MSAT : The number of sectors used by the MSAT.

• SecIDs of MSAT : The first part of the MSAT, i.e. the SecIDs of the sectors used by the SAT.

Only the first 109 SecIDs of the MSAT are stored in the header sector. If there are fewer than 109, the remaining entries are padded with ‘0xFFFFFFFF’ (-1, a free SecID).

(Note: For more detail, please refer to http://sc.openoffice.org/compdocfileformat.pdf)
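To make the field list concrete, here is a minimal Python sketch that reads these fields from a 512-byte header. It assumes every field is a little-endian 32-bit signed integer at the offset given in the compound document specification; `parse_header` and the dictionary keys are hypothetical names chosen for illustration.

```python
import struct

# OLE2 / compound document signature at offset 0x00 (8 bytes).
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")

def parse_header(header: bytes) -> dict:
    """Read the critical fields from a 512-byte compound document header.

    Every field is a little-endian 32-bit signed integer, so special
    SecIDs such as -2 (end of chain) come back as negative numbers.
    """
    if len(header) < 512:
        raise ValueError("a compound document header is 512 bytes long")

    def i32(offset: int) -> int:
        return struct.unpack_from("<i", header, offset)[0]

    # The header holds the first 109 MSAT entries (offset 0x4C to 0x1FF).
    msat_head = struct.unpack_from("<109i", header, 0x4C)
    return {
        "signature_ok": header[:8] == OLE2_SIGNATURE,
        "sat_sectors": i32(0x2C),
        "dir_first_secid": i32(0x30),
        "ssat_first_secid": i32(0x3C),
        "ssat_sectors": i32(0x40),
        "msat_first_secid": i32(0x44),
        "msat_sectors": i32(0x48),
        # Unused MSAT entries are padded with -1 (free); drop them.
        "sat_secids": [s for s in msat_head if s != -1],
    }
```

On a damaged file, as in the case below, these values are exactly the ones that come back as garbage.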

Based on these concepts, I am trying to repair a corrupted spreadsheet manually instead of using repair software. Opening the file shows that its content is corrupted:

Viewing the beginning of the file in WinHex shows:


Setting aside the rest of the file, it is obvious that the file header (Page 1) is completely damaged. The standard sector size of an OLE2 document is 512 bytes, so the file can be treated like a hard disk, which makes its data structure easier to examine. In WinHex, the file spans sector 0 to sector 2327. These physical sector numbers differ from the ‘SecID’ described in the OLE2 document specification; the two are related as shown in the table below:

OLE2 SecID    Physical sector (WinHex)
Header        Sector 0
SecID 0       Sector 1
SecID 1       Sector 2
SecID 2       Sector 3
SecID 3       Sector 4
...           ...
SecID N       Sector N+1
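The mapping in the table is a plain offset of one sector, because the 512-byte header occupies physical sector 0. A minimal sketch of the conversion, assuming the standard 512-byte sector size (the function names are hypothetical):

```python
SECTOR_SIZE = 512  # standard OLE2 sector size; the header fills physical sector 0

def secid_to_physical(secid: int) -> int:
    """Physical sector number (as shown in WinHex) for a given SecID."""
    if secid < 0:
        raise ValueError("negative values are special SecIDs, not sectors")
    return secid + 1

def physical_to_secid(sector: int) -> int:
    """SecID for a physical sector; sector 0 is the header and has none."""
    if sector < 1:
        raise ValueError("physical sector 0 is the file header")
    return sector - 1

def secid_to_offset(secid: int) -> int:
    """Byte offset in the file at which the sector with this SecID starts."""
    return secid_to_physical(secid) * SECTOR_SIZE

print(secid_to_physical(6))   # 7
print(physical_to_secid(2))   # 1
print(secid_to_offset(2))     # 1536
```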

To repair sector 0 (the header), we can copy sector 0 from a working .xls file over the damaged one. This restores the standard values. The critical parameters listed above, however, still need to be adjusted or recalculated. The places that need changing are:

1. SAT Size (sec) : offset 0x2C to 0x2F

2. First SecID of DIR : offset 0x30 to 0x33

3. First SecID of SSAT : offset 0x3C to 0x3F

4. Size of SSAT(sec) : offset 0x40 to 0x43

5. First SecID of MSAT : offset 0x44 to 0x47

6. Size of MSAT : offset 0x48 to 0x4B

7. SecIDs of MSAT : offset 0x4C to 0x1FF
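Once the new values are known, writing them means overwriting 4-byte little-endian integers at the offsets above. A minimal sketch, assuming `template` is sector 0 copied from a working .xls file; `FIELD_OFFSETS` and `patch_fields` are hypothetical names, and place 7 (a 109-entry array) is not covered here.

```python
import struct

# Places 1-6: each field is a little-endian 32-bit signed integer.
FIELD_OFFSETS = {
    "sat_sectors": 0x2C,       # 1. SAT size (sectors)
    "dir_first_secid": 0x30,   # 2. first SecID of DIR
    "ssat_first_secid": 0x3C,  # 3. first SecID of SSAT
    "ssat_sectors": 0x40,      # 4. SSAT size (sectors)
    "msat_first_secid": 0x44,  # 5. first SecID of MSAT
    "msat_sectors": 0x48,      # 6. MSAT size (sectors)
}

def patch_fields(template: bytes, values: dict) -> bytes:
    """Return a copy of a healthy 512-byte header with places 1-6 replaced."""
    if len(template) < 512:
        raise ValueError("template must contain a full 512-byte header")
    header = bytearray(template[:512])
    for name, value in values.items():
        struct.pack_into("<i", header, FIELD_OFFSETS[name], value)
    return bytes(header)
```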

If the file is no larger than about 6.8MB, places 5 and 6 do not need to be changed, because the 109 double-word pointers in the header are enough to locate the whole SAT. Place 5 is then ‘0xFEFFFFFF’ (-2, end of chain) and place 6 is ‘0x00000000’. (Values in this post are written as the bytes appear in the file, i.e. little-endian.)
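The 6.8MB limit follows from simple arithmetic: each SAT sector holds 512 / 4 = 128 SecIDs, and the header can point to at most 109 SAT sectors. A quick check:

```python
SECTOR_SIZE = 512                          # bytes per sector
SECIDS_PER_SAT_SECTOR = SECTOR_SIZE // 4   # each SecID is a 4-byte integer
HEADER_MSAT_ENTRIES = 109                  # MSAT entries stored in the header itself

# Sectors (and bytes) addressable without any extra MSAT sectors.
max_sectors = HEADER_MSAT_ENTRIES * SECIDS_PER_SAT_SECTOR
max_bytes = max_sectors * SECTOR_SIZE
print(max_sectors)          # 13952
print(max_bytes)            # 7143424
print(max_bytes / 2**20)    # 6.8125 (MiB)
```

This counts sector data only; the 512-byte header sector comes on top.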

Let’s start repairing the critical parameters by finding the SAT. The SAT is very similar to the FAT in the FAT32 file system: it contains multiple chains of sector pointers, and it is usually stored near the beginning of the file. Looking at sector 1, it is clearly a sector used by the SAT. Its first double word, ‘0xFDFFFFFF’ (-3), indicates that this sector (SecID 0) itself stores the SAT. There are two more ‘0xFDFFFFFF’ entries in this sector, which indicate that SecID 5 and SecID 6 are also used by the SAT.

In SecID 6 (sector 7), two more pointers have the value ‘0xFDFFFFFF’: those for SecID 308 and SecID 309.
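This search can be automated. The sketch below starts from SecID 0 (sector 1) and keeps scanning every newly found SAT sector for further ‘0xFDFFFFFF’ (-3) markers. It assumes the SAT sectors are stored in ascending SecID order, so the i-th SAT sector found describes SecIDs i*128 to i*128+127; the function names are hypothetical.

```python
import struct

SECTOR_SIZE = 512
ENTRIES_PER_SECTOR = SECTOR_SIZE // 4   # 128 SecIDs per SAT sector
SAT_SECID = -3                          # stored as FD FF FF FF

def sat_sector_entries(data: bytes, secid: int) -> tuple:
    """Decode the 128 little-endian int32 entries of the sector `secid`."""
    start = (secid + 1) * SECTOR_SIZE   # skip the 512-byte header sector
    if start + SECTOR_SIZE > len(data):
        raise ValueError(f"SecID {secid} lies beyond the end of the file")
    return struct.unpack_from(f"<{ENTRIES_PER_SECTOR}i", data, start)

def find_sat_secids(data: bytes, first_sat_secid: int = 0) -> list:
    """Collect the SecIDs of all SAT sectors by following -3 markers.

    Assumption: SAT sectors appear in ascending SecID order, so the
    i-th SAT sector found describes SecIDs i*128 .. i*128+127.
    """
    found = [first_sat_secid]
    i = 0
    while i < len(found):
        entries = sat_sector_entries(data, found[i])
        base = i * ENTRIES_PER_SECTOR
        for k, value in enumerate(entries):
            if value == SAT_SECID and (base + k) not in found:
                found.append(base + k)
        i += 1
    return found
```

For the file here, the first pass over SecID 0 would yield 0, 5 and 6, and the pass over SecID 6 (the third SAT sector, covering SecIDs 256 to 383) would add 308 and 309.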

Continuing the search, the SecIDs used by the SAT are:

SecID    Physical sector
0        1
5        6
6        7
308      309
309      310
613      614
614      615
615      616
917      918
918      919
1221     1222
1222     1223
1524     1525
1525     1526
1526     1527
1830     1831
1831     1832
2164     2165
2187     2188


There are 19 sectors used by the SAT. Thus, the Size of SAT and the First SecID of SAT (the first MSAT entry, at offset 0x4C) can be set to:

Size of SAT = 19 (0x13000000)

First SecID of SAT = 0x00000000
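As a sanity check on the count: WinHex shows physical sectors 0 to 2327, i.e. 2327 SecIDs after the header, and each SAT sector describes 128 SecIDs. Assuming the SAT covers exactly the sectors present in the file, the minimum number of SAT sectors is (`sat_sectors_needed` is a hypothetical helper):

```python
SECIDS_PER_SAT_SECTOR = 512 // 4   # 128 four-byte SecIDs per 512-byte sector

def sat_sectors_needed(total_physical_sectors: int) -> int:
    """Minimum SAT sectors needed to describe every sector after the header."""
    secids = total_physical_sectors - 1          # physical sector 0 is the header
    # Integer ceiling division: a partly used SAT sector still counts.
    return (secids + SECIDS_PER_SAT_SECTOR - 1) // SECIDS_PER_SAT_SECTOR

print(sat_sectors_needed(2328))   # 19 (physical sectors 0..2327)
```

This agrees with the 19 SAT sectors found by searching, since 18 sectors would cover only 2304 SecIDs.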

The rest of the SecIDs of MSAT field can then be filled in from the table above, in order. Since fewer than 109 SecIDs are used by the SAT, no extra MSAT sectors are needed:

First SecID of MSAT = 0xFEFFFFFF
Size of MSAT = 0x00000000
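Typing 19 SecIDs into a hex editor is error-prone, so here is a small sketch that builds the 436 bytes for offsets 0x4C to 0x1FF from the table, padding unused entries with -1 (0xFFFFFFFF). `SAT_SECIDS` is copied from the table above; `msat_head_bytes` is a hypothetical helper.

```python
import struct

# SecIDs of the 19 SAT sectors, copied from the table above.
SAT_SECIDS = [0, 5, 6, 308, 309, 613, 614, 615, 917, 918,
              1221, 1222, 1524, 1525, 1526, 1830, 1831, 2164, 2187]

def msat_head_bytes(sat_secids: list) -> bytes:
    """Build the header's MSAT area: 109 little-endian int32 SecIDs,
    padded with -1 (free) after the last SAT sector."""
    if len(sat_secids) > 109:
        raise ValueError("more than 109 SAT sectors needs extra MSAT sectors")
    padded = list(sat_secids) + [-1] * (109 - len(sat_secids))
    return struct.pack("<109i", *padded)

block = msat_head_bytes(SAT_SECIDS)
print(len(block))                # 436 == 0x200 - 0x4C
print(block[:8].hex().upper())   # 0000000005000000
print(block[-4:].hex().upper())  # FFFFFFFF
```

On a `bytearray` copy of the header, the block would be written with `header[0x4C:0x200] = block`.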

The directory table always starts with the ‘Root Entry’. Searching for the string ‘Root Entry’ in Unicode shows that sector 2 stores the DIR. Sector 2 is SecID 1, so the First SecID of DIR is 0x01000000.
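The same search can be scripted. The sketch assumes the root entry is the first 128-byte directory entry, so its UTF-16LE name starts exactly at a sector boundary; restricting the match to sector starts avoids false hits elsewhere in the file. The comparison is case-insensitive, and `find_dir_first_secid` is a hypothetical name.

```python
SECTOR_SIZE = 512

def find_dir_first_secid(data: bytes) -> int:
    """Return the SecID of the first sector that begins with 'Root Entry'.

    The root storage is the first 128-byte directory entry, and its name
    is stored in UTF-16LE at the start of the entry, hence at a sector start.
    """
    name_len = len("Root Entry") * 2                     # 2 bytes per UTF-16 unit
    for sector in range(1, len(data) // SECTOR_SIZE):    # skip the header
        start = sector * SECTOR_SIZE
        name = data[start:start + name_len].decode("utf-16-le", errors="replace")
        if name.lower() == "root entry":                 # case-insensitive match
            return sector - 1                            # physical sector -> SecID
    raise ValueError("no sector starts with 'Root Entry'")
```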

The SSAT has a data structure similar to the SAT, but it is normally much shorter. Checking the first few sectors of the file, I found that sector 3 should be the first sector of the SSAT.


Sector 3 is SecID 2, so the First SecID of SSAT is 0x02000000. Looking up the next SecID in the SAT, the entry for SecID 2 holds 0xFEFFFFFF (-2), which marks the end of the SecID chain. In other words, the SSAT occupies a single sector, so the Size of SSAT is 0x01000000.
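The SSAT size is simply the length of its SecID chain in the SAT. A minimal sketch, assuming `sat` is the whole SAT already decoded into a list of integers (entry k holds the SecID that follows SecID k); `follow_chain` is a hypothetical helper that also guards against loops and out-of-range pointers, which are common in damaged files.

```python
END_OF_CHAIN = -2   # stored as FE FF FF FF

def follow_chain(sat: list, first: int) -> list:
    """Return the SecIDs in a chain, following the SAT until End-Of-Chain."""
    chain, seen, current = [], set(), first
    while current != END_OF_CHAIN:
        if current < 0 or current >= len(sat) or current in seen:
            raise ValueError(f"broken or looping chain at SecID {current}")
        chain.append(current)
        seen.add(current)
        current = sat[current]
    return chain
```

Here `len(follow_chain(sat, 2))` would be 1, because the SAT entry for SecID 2 is already End-Of-Chain.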

After the recalculated values are written into the file header, it looks like this:

The file now opens and is readable.

After recovering the file manually, I wanted to see whether recovery software could do anything with it. I tried a demo version of the Excel Recovery software; its preview showed nothing apart from the sheet names.


Written by: Zijian Xie (R&D Manager, BEng, MSc)
