Having reread the ‘MS Compound Document File Format’ paper, I created a sector map to illustrate the parameters in the file header.
The green and blue parameters are the standard values. They are the same in every file except for the revision number and the version number, which normally do not matter. The red and yellow parameters are the critical ones: they are used to locate and reconstruct the actual content of a spreadsheet.
• SAT Size (sec) : The number of sectors used by the Sector Allocation Table (SAT).
• First SecID of DIR : The SecID of the first sector that stores the Directory table (DIR).
• First SecID of SSAT : The SecID of the first sector that stores the Short Sector Allocation Table (SSAT).
• Size of SSAT (sec) : The number of sectors used by the SSAT.
• First SecID of MSAT : The SecID of the first sector that stores the Master Sector Allocation Table (MSAT).
• Size of MSAT : The number of sectors used by the MSAT.
• SecIDs of MSAT : The first part of the MSAT itself, which lists the SecIDs of the sectors used by the SAT.
Only the first 109 SecIDs of the MSAT are stored in the header sector. If there are fewer than 109 SecIDs, the unused entries are padded with ‘0xFFFFFFFF’ (-1).
Based on the concepts above, I tried to repair a corrupted spreadsheet document manually instead of using repair software. When the file is opened, its content appears corrupted:
Viewing the beginning of the file in WinHex shows:
Leaving aside the rest of the file, it is obvious that the file header (page 1) is completely damaged. The standard sector size of an OLE2 document is 512 bytes, so we can treat the file like a hard disk, which makes the data structure easier to examine. In WinHex, this file runs from sector 0 to sector 2327. These physical sector numbers differ from the ‘SecID’ described in the OLE2 document specification. The two are related as shown in the table below:
OLE2 | Header | SecID 0 | SecID 1 | SecID 2 | SecID 3 | … | SecID N |
Physical | Sector 0 | Sector 1 | Sector 2 | Sector 3 | Sector 4 | … | Sector N+1 |
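The mapping in the table can be written down as a small sketch. It assumes the standard 512-byte sector size used throughout this article; the function names are only illustrative.

```python
SECTOR_SIZE = 512  # standard OLE2 sector size assumed in this article


def secid_to_sector(sec_id: int) -> int:
    """Physical sector number as a hex editor shows it."""
    # The header occupies physical sector 0, so every SecID is shifted by one.
    return sec_id + 1


def secid_to_offset(sec_id: int) -> int:
    """Byte offset of the sector from the start of the file."""
    return secid_to_sector(sec_id) * SECTOR_SIZE


print(secid_to_sector(308), hex(secid_to_offset(0)))
```

For example, SecID 308 lives in physical sector 309, and SecID 0 starts right after the 512-byte header.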
To repair sector 0 (the header), we can copy sector 0 from a working .xls file over the damaged one. This restores the standard values. The critical parameters mentioned above still need to be adjusted or recalculated. The places that need changing are:
1. SAT Size (sec) : offset 0x2C to 0x2F
2. First SecID of DIR : offset 0x30 to 0x33
3. First SecID of SSAT : offset 0x3C to 0x3F
4. Size of SSAT(sec) : offset 0x40 to 0x43
5. First SecID of MSAT : offset 0x44 to 0x47
6. Size of MSAT : offset 0x48 to 0x4B
7. SecIDs of MSAT : offset 0x4C to 0x1FF
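To check these seven places in a file without counting bytes by hand, a minimal sketch like the following can read them out of the first 512 bytes. It assumes every value is a little-endian signed 32-bit integer, so that the special values show up directly as negative numbers; the field names and the `read_header` helper are my own labels, not part of any library.

```python
import struct

# Offsets taken from the list above (places 1 to 6).
HEADER_FIELDS = {
    "sat_size": 0x2C,
    "dir_first_secid": 0x30,
    "ssat_first_secid": 0x3C,
    "ssat_size": 0x40,
    "msat_first_secid": 0x44,
    "msat_size": 0x48,
}
MSAT_OFFSET = 0x4C   # place 7: 109 SecIDs fill the header up to 0x1FF
MSAT_ENTRIES = 109


def read_header(header: bytes) -> dict:
    """Decode the critical header parameters from a 512-byte header sector."""
    if len(header) < 512:
        raise ValueError("need the full 512-byte header sector")
    fields = {
        name: struct.unpack_from("<i", header, offset)[0]
        for name, offset in HEADER_FIELDS.items()
    }
    fields["msat"] = list(
        struct.unpack_from(f"<{MSAT_ENTRIES}i", header, MSAT_OFFSET)
    )
    return fields
```

A typical use would be `read_header(open(path, "rb").read(512))`, where `path` points at a copy of the file under examination.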
If the file is no larger than about 6.8 MB, places 5 and 6 do not need to be changed, because the 109 double-word pointers in the header are enough to list every sector of the SAT. Place 5 will be ‘0xFEFFFFFF’ (-2) and place 6 will be ‘0x00000000’.
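The 6.8 MB figure can be checked with a little arithmetic. The sketch below assumes 512-byte sectors and 4-byte SecIDs, and it ignores the header sector itself.

```python
SECTOR_SIZE = 512          # bytes per sector
SECID_SIZE = 4             # one double word per SecID
HEADER_MSAT_ENTRIES = 109  # SAT sectors that can be listed in the header

# Each SAT sector holds 128 SecIDs, i.e. it describes 128 sectors.
secids_per_sat_sector = SECTOR_SIZE // SECID_SIZE

# Sectors that can be described without any extra MSAT sector.
addressable_sectors = HEADER_MSAT_ENTRIES * secids_per_sat_sector

max_bytes = addressable_sectors * SECTOR_SIZE
print(addressable_sectors, max_bytes, max_bytes / 2**20)
```

This gives 13,952 sectors, or 7,143,424 bytes, which is about 6.8 MB.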
Let’s start repairing the critical parameters by finding the SAT first. The SAT is very similar to the FAT in the FAT32 file system: it contains multiple chains of sector pointers, and it is usually stored near the beginning of the file. Sector 1 is obviously used by the SAT. Its first double word, ‘0xFDFFFFFF’, indicates that the current sector stores the SAT. There are two more ‘0xFDFFFFFF’ values in this sector, which indicate that SecID 5 and SecID 6 are also used by the SAT.
At SecID 6 (sector 7), we find two more pointers with a value of ‘0xFDFFFFFF’. They correspond to SecID 308 and SecID 309 respectively.
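The manual search can be sketched in code. This is only an approximation of what I did by eye in the hex editor: it assumes that SecID 0 is the first SAT sector, that the k-th SAT sector describes SecIDs 128·k to 128·k+127, and that every later SAT sector is marked with -3 (FD FF FF FF on disk) in a SAT sector already found. The function names are illustrative.

```python
import struct

SECTOR_SIZE = 512
ENTRIES = SECTOR_SIZE // 4  # 128 SecIDs per SAT sector
SAT_MARKER = -3             # stored on disk as FD FF FF FF


def read_sector(data: bytes, sec_id: int) -> tuple:
    """Return the 128 little-endian SecIDs stored in the given sector."""
    start = (sec_id + 1) * SECTOR_SIZE  # +1 skips the header sector
    if start + SECTOR_SIZE > len(data):
        raise ValueError(f"SecID {sec_id} lies beyond the end of the file")
    return struct.unpack_from(f"<{ENTRIES}i", data, start)


def find_sat_sectors(data: bytes, first_sat_secid: int = 0) -> list:
    """Collect the SecIDs of all SAT sectors, in SAT order."""
    sat = [first_sat_secid]
    k = 0
    while k < len(sat):
        # The k-th SAT sector covers SecIDs k*128 .. k*128+127.
        for j, value in enumerate(read_sector(data, sat[k])):
            sec_id = k * ENTRIES + j
            if value == SAT_MARKER and sec_id not in sat:
                sat.append(sec_id)
        k += 1
    return sat
```

If the SAT itself is damaged, the assumptions above no longer hold and the sectors have to be inspected by hand.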
Continuing the search in the same way, the SecIDs used by the SAT are:
SecID | Physical Sector |
0 | 1 |
5 | 6 |
6 | 7 |
308 | 309 |
309 | 310 |
613 | 614 |
614 | 615 |
615 | 616 |
917 | 918 |
918 | 919 |
1221 | 1222 |
1222 | 1223 |
1524 | 1525 |
1525 | 1526 |
1526 | 1527 |
1830 | 1831 |
1831 | 1832 |
2164 | 2165 |
2187 | 2188 |
There are 19 sectors used by the SAT. Thus, the Size of SAT and the first SecID of the SAT (the first entry in the SecIDs of MSAT field) can be set to:
Size of SAT = 19 (0x13000000)
First SecID of SAT = 0x00000000
Also, the SecIDs of MSAT field can be filled in according to the table above. Fewer than 109 SecIDs are used by the SAT, thus:
First SecID of MSAT = 0xFEFFFFFF
Size of MSAT = 0x00000000
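Instead of typing these values into the hex editor, they could also be written into an in-memory copy of the header. The sketch below uses the SecIDs from the table above; the `patch_sat_fields` helper is hypothetical, and it pads unused MSAT entries with -1 (FF FF FF FF), the free-SecID value of the compound document format.

```python
import struct

# SecIDs of the 19 SAT sectors found above.
SAT_SECIDS = [0, 5, 6, 308, 309, 613, 614, 615, 917, 918,
              1221, 1222, 1524, 1525, 1526, 1830, 1831, 2164, 2187]


def patch_sat_fields(header: bytearray, sat_secids: list) -> None:
    """Write the SAT size, MSAT fields and MSAT entries into the header."""
    if len(sat_secids) > 109:
        raise ValueError("more than 109 SAT sectors need extra MSAT sectors")
    struct.pack_into("<i", header, 0x2C, len(sat_secids))  # SAT Size
    struct.pack_into("<i", header, 0x44, -2)               # First SecID of MSAT
    struct.pack_into("<i", header, 0x48, 0)                # Size of MSAT
    padded = list(sat_secids) + [-1] * (109 - len(sat_secids))
    struct.pack_into("<109i", header, 0x4C, *padded)       # SecIDs of MSAT


header = bytearray(512)  # stand-in for sector 0 copied from a working file
patch_sat_fields(header, SAT_SECIDS)
print(header[0x2C:0x30].hex(), header[0x44:0x48].hex())
```

The two printed fields are `13000000` and `feffffff`, matching the values above.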
The directory table always starts with ‘ROOT ENTRY’. Searching for the string ‘ROOT ENTRY’ in Unicode, we find that sector 2 is used to store the DIR. Thus the First SecID of DIR is 0x01000000.
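The same search can be sketched as follows. It assumes the name is stored as UTF-16LE at the very start of a sector and compares it case-insensitively, since the exact capitalisation in a given file is an assumption on my part; the function name is illustrative.

```python
SECTOR_SIZE = 512


def find_directory_secids(data: bytes, name: str = "ROOT ENTRY") -> list:
    """Return the SecIDs of sectors that begin with the given entry name."""
    width = len(name) * 2  # UTF-16LE uses two bytes per character here
    hits = []
    # Start at offset 512 so that the header sector is skipped.
    for start in range(SECTOR_SIZE, len(data), SECTOR_SIZE):
        text = data[start:start + width].decode("utf-16-le", errors="replace")
        if text.upper() == name.upper():
            hits.append(start // SECTOR_SIZE - 1)  # physical sector to SecID
    return hits
```

For the file in this article, the expected hit is SecID 1, which is physical sector 2.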
The SSAT has a data structure similar to that of the SAT, but the table is normally shorter. Checking through the first few sectors of the file, I found that sector 3 should be the first sector of the SSAT.
Thus, the First SecID of SSAT is 0x02000000. Checking the next SecID in the SAT, we find a value of 0xFEFFFFFF at the offset of SecID 2, which indicates that this SecID is the end of the current SecID chain. In other words, the SSAT occupies just a single sector, so the Size of SSAT is 0x01000000.
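Following a SecID chain through the SAT can be sketched like this. It assumes the SAT has already been decoded into a flat list of signed integers, one per SecID; `follow_chain` is an illustrative name.

```python
END_OF_CHAIN = -2  # stored on disk as FE FF FF FF


def follow_chain(sat: list, first_secid: int) -> list:
    """Return every SecID in the chain that starts at first_secid."""
    chain = []
    sec_id = first_secid
    while sec_id != END_OF_CHAIN:
        # Guard against damaged tables: bad pointers and loops.
        if sec_id < 0 or sec_id >= len(sat) or sec_id in chain:
            raise ValueError(f"broken chain at SecID {sec_id}")
        chain.append(sec_id)
        sec_id = sat[sec_id]  # the SAT entry points at the next sector
    return chain


# Toy SAT: SecID 0 is a SAT sector, SecID 2 ends its own chain at once.
toy_sat = [-3, 3, -2, 4, -2]
print(len(follow_chain(toy_sat, 2)))
```

The length of the returned chain is the size in sectors, so a chain that ends immediately at its first SecID gives a size of 1, as for the SSAT here.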
With the recalculated values in place, the file header looks like this:
The file now opens and is readable.
After manually recovering the file, I was interested to see whether recovery software could do the same. I tried a demo version of Excel Recovery software. In the end, its preview showed nothing apart from the sheet names.
Written by: Zijian Xie (R&D Manager, BEng, MSc)