Friday, 30 July 2010

SQL Defragmentation Technique






The nature of an SQL database file

A database file is a data storage centre that consists of an organized collection of data for one or multiple users. The size of this data centre grows as the user's gradually introduce new data into it. When a database file is first created, the operating system will assign an area of the hard disk or RAID large enough to store this file. Due to the nature of the database, this initial area will not be large enough for the gradually rising amount of data. The operating system will then assign a new area to store the new data belonging to the database. Obviously, this operation will be continued as typically, the user(s) are adding to the database everyday. As a result, file fragments are generated.


Why existing methods of recovery do not work

As we explained in the last article, when a database file is accidentally deleted, the metadata of this file is deleted at the same time. The fragments belonging to this deleted database will become anonymous without the associated metadata. To recover this file, the IT manager/DR technician will typically use a conventional recovery application. However, these applications are designed to retrieve files that have been accidentally deleted - they will be useless when being used to recover fragmented files, especially *.mdf file fragments.


MDF Page Structure and Deframentation Technique

MDF file uses a standard database file format to store the tables and contents. The minimum storage unit is a Page. A page is an integer multiple of sectors. The number of sectors in a page is determined by different database application. For example, a Microsoft MDF file has 16 sectors in a page. So the page size is 8KB. Within a MDF file, it has a page structure as shown below:

Page Sector Offset

Page ID

0

Page ID = 0

16

Page ID = 1

32

Page ID = 2

48

Page ID = 3

64

Page ID = 4

16*N

Page ID = N



MDF pages will be arranged in a sequential order based on the Page ID stored in a page header within a MDF file. Obviously, pages will be fragmented because of the nature of a database file as mentioned above. The SQL Defragmentation Technique is going to locate each piece of fragments by looking at the Page IDs in the page headers and then resemble them in a correct order.


Written by: Zijian Xie (R&D Manager, BEng, MSc)








File Fragmentation and Defragmentation

After a partition and a formation on a computer hard drive, all the sectors in user data area belong to this partition are marked as ‘Not in Use’. A number of new files will be introduced to and stored on these usable sectors sequentially and consecutively. A sequence of user data sectors are marked as ‘Occupied’ (coloured as shown in the figure below).

File0

File1

File2

File3

File4

File5

When some of the files are deleted or removed, the corresponding sector will become ‘Not in Use’ again. In this example, File0, File3 and File5 are deleted for demonstration.

File1

File2

File4

Another new file of File6 is introduced and stored on sectors that marked as ‘Not in use’. As a result, File6 has three pieces of fragments.

File6

File1

File2

File6

File4

File6

File6 is deleted accidentally by user, what happens to the sector map is the fragments belong to File6 are marked as ‘Not in Use’ (as in grey) again:

File6

File1

File2

File6

File4

File6

To recover this deleted File6, the recovery application locates the starting sector of this file by looking at the file header. But it will always assume that this file was stored consecutively without fragments. As a result, the recovered File6 is going to be looked like this:

Recovered File6

File6

File4

File6

Obviously, the recovered File6 contains the first fragment of File6 and content from File1 and File2. Even though this file is opened, it will be corrupted.

To use a conventional recovery application to recover the lost/deleted files from hard drives, it is quite common that the files are corrupted or partial corrupted when they are opened. For example, a JPEG is partially corrupted, a word document is unable to open due to data corruption or it is opened but with all unknown characters and so on. This problem will become severed and fatal when the lost file type is database format (MS SQL, MS Access, Oracle, etc.) .

All these kinds of lost files mentioned above are deemed as unrecoverable files, because:

· 1. Data sectors are overwritten after file deletion by introducing new files;

· 2. In FAT32 file system hard drive, the MSB (Most Significant Bytes) of current file address are cleared to zero after its deletion. Without knowing the precise starting address of the deleted file, the recovery software will just assume the MSB is zero. Even though the recovered file has its original name and correct size base on this recovery algorithm, the file is still found to be corrupted after opening. Some advanced file recovery application has tried to solve this problem. But it is only working when the lost file has NO fragment on hard drive.

· 3. Generally speaking, database files are stored in discrete sector areas as it grows every day. All the recovery applications assume that the files are stored in a successive and linear area on a hard drive. It is because they are not able to determine and find all the discrete fragments that belong to a deleted file.

File Defragmentation (FD) technology is proposed in this article. File defragment is a term used to describe a process to search for all the fragments belong to a same file and reconstruct this file using the fragments found. Using this technology on the example above, the recovered File6 should look like this:

File6-1

File1

File2

File6-2

File4

File6-3

File6-1

File6-2

File6-3

As a result, the deleted File6 is recovered successfully and it is 100% intact. If any fragments belong File6 have been overwritten, even the FD technology will not help anymore. This is simply because data has been magnetically overwritten or removed.


Written by: Zijian Xie (R&D Manager, BEng, MSc)