Data recovery chances: The filesystems of Linux

To understand whether lost files can be restored, it is important to know what happens to them during deletion or formatting. Filesystems handle these operations differently, which is why the likelihood of success depends so heavily on the filesystem type used on the affected storage device.

Linux supports a wide variety of filesystems. Since most of them are designed to address specific problems, we will explore only the most prevalent ones: Ext2, Ext3/Ext4 and XFS.

Ext2

Every file and directory in Ext2 is represented by an individual indexing structure called an inode. Each inode can be addressed by its unique number. Besides critical characteristics of the file, such as its size, an inode contains links to the blocks that store the file's actual data.

However, the names of files in Ext2 are not stored in inodes. They can be found solely in directories, which associate these names with the corresponding inodes. Directories are themselves just ordinary files comprising lists of names and inode numbers.

The entire storage space is split into sections called Block Groups. Each Block Group stores inodes in its own Inode Table. It also maintains the Block Bitmap and the Inode Bitmap to know which of its blocks and inodes are currently in use.
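The relationship between names, inodes and blocks can be illustrated with a small sketch. It is a toy in-memory model, not the on-disk Ext2 layout: the names Inode, inode_table, data_blocks, root_directory and read_file, as well as the sample values, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    size: int                                        # file size in bytes
    blocks: list[int] = field(default_factory=list)  # numbers of the data blocks


# Hypothetical miniature Block Group.
inode_table = {12: Inode(size=7, blocks=[3, 4])}  # inode number -> inode
data_blocks = {3: b"hell", 4: b"o!!"}             # block number -> content
root_directory = {"notes.txt": 12}                # file name -> inode number


def read_file(name: str) -> bytes:
    """Follow the chain: directory entry -> inode -> data blocks."""
    inode_no = root_directory[name]   # the name lives only in the directory
    inode = inode_table[inode_no]     # the inode knows the size and the blocks
    raw = b"".join(data_blocks[b] for b in inode.blocks)
    return raw[: inode.size]
```

Calling read_file("notes.txt") walks from the directory entry to inode 12 and then to blocks 3 and 4. The file name itself never appears in the inode.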

Deletion

Procedure: Ext2 labels the file's inode as free and updates the Block and the Inode Bitmaps. It also deletes the file's name and inode number from the directory.

Recovery: The information about the file's size and the location of its content is still available in the inode. Thanks to this, the chances of bringing the file back are good. On the other hand, the link to its name in the directory entry has been destroyed, so the correct name is most likely unrecoverable.
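The deletion procedure and its consequences for recovery can be sketched as follows. This is a simplified model under stated assumptions: the class ToyExt2 and its methods are hypothetical, sets stand in for the Block and Inode Bitmaps, and a dictionary stands in for the directory.

```python
class ToyExt2:
    """Toy model of Ext2 deletion; not the real on-disk format."""

    def __init__(self):
        self.inodes = {}           # inode number -> {"size": int, "blocks": [int]}
        self.inode_in_use = set()  # stands in for the Inode Bitmap
        self.block_in_use = set()  # stands in for the Block Bitmap
        self.blocks = {}           # block number -> bytes
        self.directory = {}        # file name -> inode number

    def create(self, name, inode_no, chunks):
        """chunks maps block numbers to the data stored in them."""
        self.blocks.update(chunks)
        self.block_in_use.update(chunks)
        size = sum(len(c) for c in chunks.values())
        self.inodes[inode_no] = {"size": size, "blocks": list(chunks)}
        self.inode_in_use.add(inode_no)
        self.directory[name] = inode_no

    def delete(self, name):
        inode_no = self.directory.pop(name)       # the name is gone
        self.inode_in_use.discard(inode_no)       # inode marked as free
        self.block_in_use.difference_update(self.inodes[inode_no]["blocks"])
        # The inode's size and block list are left untouched.

    def recover_deleted(self):
        """Rebuild content from free inodes that still list their blocks."""
        recovered = {}
        for inode_no, inode in self.inodes.items():
            if inode_no not in self.inode_in_use and inode["blocks"]:
                data = b"".join(self.blocks[b] for b in inode["blocks"])
                recovered[inode_no] = data[: inode["size"]]
        return recovered
```

In this model the content comes back keyed by inode number only, which mirrors why the original name is usually lost. The sketch also assumes that the freed blocks have not been reused in the meantime.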

Formatting

Procedure: Ext2 clears the current Block Groups and deletes the inodes.

Recovery: No information that could assist in recovering the lost files is left in the filesystem. Still, if the blocks holding their content haven't been reused, there is a chance to bring the files back. A data recovery utility can ignore the filesystem structures and examine the storage at a lower level. It searches for files based on specific data patterns that are known to be present in files of common formats. This technique is called RAW data recovery. Unfortunately, it cannot restore files with their initial names and directories. Moreover, it cannot reassemble files stored in non-neighbouring blocks, so fragmented files will most probably be lost for good.
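A minimal sketch of the idea behind RAW recovery is given below, using JPEG as the example format: JPEG files begin with the bytes FF D8 FF and end with FF D9. The function name carve_jpegs is hypothetical, and the sketch assumes that each file is stored contiguously and that the end marker does not occur earlier inside the file, which real images with embedded thumbnails may violate.

```python
JPEG_START = b"\xff\xd8\xff"  # bytes that open a JPEG file
JPEG_END = b"\xff\xd9"        # bytes that close a JPEG file


def carve_jpegs(image: bytes) -> list[bytes]:
    """Scan a raw storage image for start/end signatures, ignoring
    any filesystem structures. Only contiguous files are found."""
    found = []
    pos = 0
    while True:
        start = image.find(JPEG_START, pos)
        if start == -1:
            break
        end = image.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break
        found.append(image[start : end + len(JPEG_END)])
        pos = end + len(JPEG_END)
    return found
```

The result is a list of byte strings without names or paths. If the blocks of a file are scattered, the bytes between the two markers include foreign data and the carved file is corrupted.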

Ext3/Ext4

Ext3 is basically an improved version of Ext2. Its main strength is the use of journaling. The Journal resides in a special area within the filesystem. Before writing any changes, Ext3 collects all the blocks to be modified, creates a copy of them and saves the updated version to the Journal. Only then does it apply these changes.

Ext4 relies on the Journal as well. Furthermore, it adds support for special structures used to store the content of files. These are referred to as extents: contiguous ranges of blocks, each represented only by its starting block and the number of blocks that follow it.

Extents can be stored directly in the inode structure that describes the file. However, if a file occupies more than four extents, the rest of them are organized into a separate structure called a B+tree. The most notable thing about trees is that they do not place information sequentially. Instead, it is arranged on multiple levels that are connected to each other in a certain hierarchy.
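The order of operations in journaling can be sketched with a toy model. The class ToyJournaledDisk, its methods and the crash_before_apply flag are hypothetical and exist only for illustration. Unlike a real journal, which has a limited size and is overwritten over time, this one never reclaims space.

```python
class ToyJournaledDisk:
    """Toy write-ahead journal: log the new block contents first,
    then apply them to their final location."""

    def __init__(self, n_blocks: int):
        self.blocks = [b""] * n_blocks
        self.journal = []  # list of transactions: {block number: new content}

    def write(self, updates, crash_before_apply=False):
        self.journal.append(dict(updates))  # step 1: record in the Journal
        if crash_before_apply:              # simulate a power loss here
            return
        self._apply(updates)                # step 2: apply to the filesystem

    def _apply(self, updates):
        for block_no, content in updates.items():
            self.blocks[block_no] = content

    def replay(self):
        """After a crash, re-apply every logged transaction in order."""
        for txn in self.journal:
            self._apply(txn)
```

Because the update reaches the journal before the main area, an interrupted write can be completed later by replaying the log.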
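To make the notion of an extent concrete, the sketch below collapses a list of block numbers into (starting block, number of blocks) pairs. The names to_extents, fits_in_inode and INLINE_EXTENTS are hypothetical, and the limit of four is taken from the description above.

```python
INLINE_EXTENTS = 4  # extents that fit directly in the inode, per the text above


def to_extents(blocks: list[int]) -> list[tuple[int, int]]:
    """Merge runs of consecutive block numbers into (start, count) pairs."""
    extents: list[tuple[int, int]] = []
    for b in blocks:
        if extents and extents[-1][0] + extents[-1][1] == b:
            start, count = extents[-1]
            extents[-1] = (start, count + 1)  # block continues the current run
        else:
            extents.append((b, 1))            # block starts a new run
    return extents


def fits_in_inode(blocks: list[int]) -> bool:
    """True if the file's extents need no separate tree in this model."""
    return len(to_extents(blocks)) <= INLINE_EXTENTS
```

A file that occupies six blocks in three runs is described by three pairs instead of six block numbers. The more fragmented the file, the more extents it needs.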

Deletion

Procedure: Ext3/Ext4 creates an entry in the Journal and then wipes the inode associated with the file. The information about the file's name remains in the directory.

Recovery: Deleted files can be restored with the help of the Journal, even with their original names. Yet, the result may be incomplete if the filesystem has been used for a long time after deletion took place.
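The principle of journal-based recovery can be sketched as a search for an older copy of the wiped inode. The data layout and the function name recover_inode are assumptions for illustration only. A real journal stores copies of whole metadata blocks rather than per-inode dictionaries.

```python
def recover_inode(journal, inode_no):
    """Walk the journal from newest to oldest and return the most recent
    copy of the inode that still lists its data blocks."""
    for txn in reversed(journal):
        old = txn.get(inode_no)
        if old is not None and old["blocks"]:
            return old
    return None


# Hypothetical journal: each transaction maps inode numbers to inode copies.
journal = [
    {12: {"size": 5000, "blocks": [300, 301]}},  # the file is written
    {15: {"size": 10, "blocks": [400]}},         # unrelated activity
    {12: {"size": 0, "blocks": []}},             # deletion wipes inode 12
]
```

Here recover_inode(journal, 12) skips the wiped copy and returns the earlier one. If later activity has already overwritten that older transaction, nothing is found, which corresponds to the incomplete results mentioned above.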

Formatting

Procedure: All the existing Block Groups and inodes are wiped. The Journal is cleared as well, but it may still contain information about some recently created files.

Recovery: The lost files can be restored only by means of RAW data recovery. Yet, if the blocks that hold their content are scattered around the storage, the chances of success decrease considerably.

XFS

XFS consists of identically sized parts called Allocation Groups. Each Allocation Group acts as though it were an independent filesystem.

As in Ext4, the content of files is stored in extents, and the information about the files, except for their names, is stored in inodes. The names of files exist only in the corresponding directories. Depending on the file's size, the locations of its extents may be found directly in the inode or be organized in a special B+tree structure.

Free extents in each Allocation Group are tracked using a pair of B+trees. The first tree is indexed by the starting block of each contiguous free space region, and the second one by the number of blocks in the region.

A separate B+tree keeps a record of the inodes in each Allocation Group as they are allocated and released.

XFS makes use of the Journal as well. The Journal keeps changes to the filesystem metadata until they are written to the storage.
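The benefit of keeping two indexes over the same free regions can be shown with a short sketch. Sorted lists searched with bisect stand in for the B+trees, and the class FreeSpaceIndex with its methods find_near and find_fit is hypothetical.

```python
import bisect


class FreeSpaceIndex:
    """Two orderings of the same free extents: one by starting block,
    one by length. Sorted lists replace the B+trees in this sketch."""

    def __init__(self, free_extents):
        # free_extents: iterable of (start_block, length) pairs
        self.by_start = sorted(free_extents)
        self.by_size = sorted((length, start) for start, length in free_extents)

    def find_near(self, block):
        """First free extent that starts at or after the given block."""
        i = bisect.bisect_left(self.by_start, (block, 0))
        return self.by_start[i] if i < len(self.by_start) else None

    def find_fit(self, length):
        """Smallest free extent that holds at least `length` blocks."""
        i = bisect.bisect_left(self.by_size, (length, 0))
        if i == len(self.by_size):
            return None
        size, start = self.by_size[i]
        return (start, size)
```

The first index answers the question "where is free space close to this block?", while the second answers "where is a region large enough for this request?". Each lookup is a single search in the matching index.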

Deletion

Procedure: XFS excludes the inode associated with the deleted file from the B+tree. Most of the information about it gets overwritten. The file's name is removed from the directory. Yet, the extents that hold its content remain intact.

Recovery: The copies of metadata are stored in the Journal. They can be used to bring the lost file back. The chances for recovery are fairly high, even for its original name.

Formatting

Procedure: The B+trees responsible for space allocation get wiped. Also, a new root directory is created, which overwrites the previous one.

Recovery: The files located closer to the start of the storage have poor chances of being restored. The rest of the data is likely to be recovered successfully.
