Lecture 23 - Directories, Historical Filesystems

For the filesystem, specifically reading indirect zones:
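
A rough sketch of what going through a single indirect zone looks like (DIRECT_ZONES, the zone size, and read_block are placeholder assumptions here, not the project's real names):

#include <stdint.h>

#define DIRECT_ZONES  7                /* direct zone pointers in the inode */
#define ZONE_SIZE     1024             /* bytes per zone (assumed)          */
#define PTRS_PER_ZONE (ZONE_SIZE / sizeof(uint32_t))

void read_block(uint32_t zone, void *buf);   /* assumed raw read of one zone */

/* Map a file-relative block number to a zone number on disk. */
uint32_t file_block_to_zone(const uint32_t *direct, uint32_t indirect, uint32_t blk)
{
	uint32_t table[PTRS_PER_ZONE];

	if (blk < DIRECT_ZONES)
		return direct[blk];               /* direct zone */

	read_block(indirect, table);          /* read the indirect zone itself */
	return table[blk - DIRECT_ZONES];     /* then index into it */
}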

Directories

What goes within the directories? Normally we want:
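
At minimum, an entry has to tie a name to the rest of the file's metadata somehow. A sketch of the old V7 Unix way of doing it (this example is mine, not from lecture): the directory holds only a name and an inode number, and everything else about the file lives in the inode.

#include <stdint.h>

/* V7-Unix-style 16-byte directory entry. */
struct dirent_v7 {
	uint16_t d_ino;       /* inode number; 0 means the slot is unused  */
	char     d_name[14];  /* file name, not necessarily NUL-terminated */
};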

Let's look at some structures. First the historical ones:

CP/M

CP/M (Control Program/Monitor): a popular '70s OS that had a flat directory structure. It had one table of all the entries. This worked back then since you only had a 512KB disk, so if your block size was 1K you could only have about 500 files (and if they're big files, even fewer). Each entry had:
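
Roughly, from memory (this is the standard CP/M layout, not something given in lecture), a 32-byte entry looked like:

#include <stdint.h>

/* Standard 32-byte CP/M directory entry (approximate field names). */
struct cpm_dirent {
	uint8_t user;         /* user number, or 0xE5 if the entry is free     */
	char    name[8];      /* file name, space padded                       */
	char    ext[3];       /* extension, space padded                       */
	uint8_t extent;       /* which extent of the file this entry describes */
	uint8_t reserved[2];
	uint8_t rec_count;    /* 128-byte records used, so size is only known  */
	                      /* to record granularity, not to the byte        */
	uint8_t blocks[16];   /* allocation block numbers holding the data     */
};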

Notice that there is no file size listed. This led to the convention of the file itself containing an EOF character to mark where the file ends.

What about big files that used up all their blocks? Those files got multiple directory entries, indexed by the extent byte.

MS-DOS FAT-16

The IBM PC needed an OS, so IBM called the CP/M people, who didn't call them back. IBM then called Bill Gates to make one for them, which became MS-DOS.

This had:
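
A sketch of the standard 32-byte FAT directory entry (the field names are mine):

#include <stdint.h>

/* Standard 32-byte FAT-16 directory entry (packed on disk). */
struct fat_dirent {
	char     name[8];       /* file name, space padded            */
	char     ext[3];        /* extension ("8.3" names)            */
	uint8_t  attributes;    /* read-only, hidden, directory, ...  */
	uint8_t  reserved[10];
	uint16_t time;          /* last-modified time                 */
	uint16_t date;          /* last-modified date                 */
	uint16_t firstBlock;    /* first block number; index into FAT */
	uint32_t size;          /* file size in bytes                 */
};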

Why the 10 reserved bytes? Well, for one, it's Microsoft (they are resource hogs). The other reason is that the total entry size comes out to 32 bytes, a power of 2.

The firstBlock field is an index into the FAT. All of this is similar in FAT-32.
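
A sketch of following one file's chain of blocks through the FAT (the in-memory array and the end-of-chain value are simplifying assumptions):

#include <stdint.h>
#include <stdio.h>

#define FAT_EOF 0xFFFF   /* simplified end-of-chain marker */

/* Each FAT slot holds the number of the file's next block, so the FAT is
 * effectively one linked list per file, threaded through a single table. */
void print_chain(const uint16_t *fat, uint16_t firstBlock)
{
	for (uint16_t b = firstBlock; b != FAT_EOF; b = fat[b])
		printf("block %u\n", b);
}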

They also called directories "folders", which is a much better name.

Filesystem Performance

Everything we said about page size applies to block size too. To make our filesystem more performant, think about what we're doing:

Conventionally, nothing requires that the inodes and their block pointers sit far away from the data blocks themselves. What if they were placed close together?

But the problem with this is that as you fill the disk up, you lose performance, since rings start spilling into other rings (more data means that adjacent inodes are now farther apart). How can we remedy this? Use buffer caching: hold recently used blocks in memory in case we reuse them. With this:
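
A minimal sketch of the idea (a real buffer cache also tracks dirty blocks and writes them back, and does smarter replacement than this; disk_read is assumed to exist):

#include <stdbool.h>

#define NBUF       64
#define BLOCK_SIZE 1024

struct buf {
	bool valid;
	int  blockno;
	char data[BLOCK_SIZE];
};

static struct buf cache[NBUF];

void disk_read(int blockno, char *data);   /* assumed raw device read */

/* Return the requested block, going to the disk only on a cache miss. */
struct buf *bread(int blockno)
{
	struct buf *b = &cache[blockno % NBUF];   /* trivial direct-mapped cache */
	if (!b->valid || b->blockno != blockno) {
		disk_read(blockno, b->data);
		b->blockno = blockno;
		b->valid = true;
	}
	return b;
}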

If you pull the plug, whatever is still sitting in the cache is lost, so make sure dirty blocks get flushed to disk periodically:

#include <unistd.h>

/* Flush dirty buffer-cache blocks to disk every 30 seconds, so a crash
 * or a pulled plug loses at most the last 30 seconds of changes. */
int main(void){
	for(;;){
		sleep(30);
		sync();	/* sync() schedules dirty blocks for writing;   */
		sync();	/* calling it repeatedly is old folklore to     */
		sync();	/* give the earlier writes time to finish       */
	}
}

Log-Structured Filesystem (1991)

The idea is that you accumulate changes and log them; when the log is full, you dump all the changes onto the disk:

Note that:

The idea is that you reconstruct the filesystem from where it started plus the logged changes (think of telling the time not as "how long since last Thursday" but as a fixed point 30 years ago plus everything that has happened since, using the changes to build up the current FS). Once parts of the log have actually been applied to the filesystem, you can remove those entries from the log.
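
A toy sketch of that reconstruction step: start from a checkpointed set of blocks and replay the logged changes in order (the record format and names are made up for illustration):

#include <string.h>

#define BLOCK_SIZE 1024
#define NBLOCKS    128

/* One logged change: "block b now holds these contents". */
struct log_record {
	int  blockno;
	char data[BLOCK_SIZE];
};

/* Rebuild the current state: the checkpointed blocks plus every change since. */
void replay(char blocks[NBLOCKS][BLOCK_SIZE], const struct log_record *log, int n)
{
	for (int i = 0; i < n; i++)
		memcpy(blocks[log[i].blockno], log[i].data, BLOCK_SIZE);
}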

Filesystem Reliability

We care a lot more about a disk drive failing than about something like a CPU failing, because the disk is where your data lives. The most important thing is to have backups. But as people we are terrible at doing this manually: do it automatically, and check that it actually works.

But alternatively, make it so that you don't care, namely RAID (Redundant Array of Independent Disks). Have 3 disks and a controller. Every time you write a disk block, you write it to 2 of the disks. Then when one fails, the controller uses the other two disks to reconstruct what was on it.
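
One way a controller can rebuild a lost disk from the survivors is XOR parity. This sketch assumes a simple two-data-disks-plus-parity layout, which may not be exactly the scheme described above:

#include <stddef.h>

/* If parity = data0 ^ data1 was maintained on every write, then whichever
 * single disk is lost can be recomputed by XOR-ing the two that survive. */
void reconstruct(const unsigned char *survivor_a, const unsigned char *survivor_b,
                 unsigned char *rebuilt, size_t len)
{
	for (size_t i = 0; i < len; i++)
		rebuilt[i] = survivor_a[i] ^ survivor_b[i];
}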

Drawbacks: