Tuesday, January 26, 2016

Disk Technology

The first disk drives that I worked with were IBM 2311s. These were about the size of a small washing machine, and they had removable disk packs that were 14” in diameter. With six platters (giving 10 recording surfaces, since the outermost top and bottom surfaces were not used) and 200 tracks per surface of roughly 3,600 bytes each, the total capacity of a disk pack was about 7M. When these were upgraded to IBM 2314s, which packaged several disk drives in a single large box, the capacity per disk pack increased to 28M. That seemed like a lot of storage space in those days, but it was still only about 20% of the capacity of a single reel of tape at maximum density.

However, the power of disk drives was that we were no longer limited to sequential files, i.e. files that had to be read from one end to the other. Instead, the disks supported a common access method known as ISAM (Indexed Sequential Access Method). ISAM files could be read sequentially, or they could be accessed randomly based on a unique key assigned to each record (such as a customer number or an employee number).

There were three separate areas in each ISAM file – the index, the data area, and the overflow area. The index area was fairly small: it stored the highest key in each cylinder (the 10 tracks that sit directly above one another on the different platters of the disk pack), then the highest key in each track of the cylinder, and for each track the highest key in each block. So with only a few I/Os you could locate where a record was stored (assuming it existed). To keep records in key sequence, when a new record was added it was inserted in its proper place, and the last record in that block was pushed off into the overflow area. Records in overflow were not blocked; they were stored individually, with a pointer from each record to the next.
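To make that overflow mechanism concrete, here is a minimal sketch (in Python, which obviously did not exist then) of a single data block and its overflow chain, following the simplified description above: a fixed-capacity block kept in key order, and an unblocked chain of overflow records linked by pointers. The names PrimeBlock, OverflowRecord and insert are just my own labels for the illustration, not the actual IBM structures.

    class OverflowRecord:
        """A record in the (unblocked) overflow area, linked to the next one."""
        def __init__(self, key, data):
            self.key = key
            self.data = data
            self.next = None              # pointer to the next overflow record

    class PrimeBlock:
        """A fixed-capacity block in the data area, kept in key order."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.records = []             # (key, data) pairs, sorted by key
            self.overflow_head = None     # start of this block's overflow chain

        def insert(self, key, data):
            # Put the new record in its proper place within the block.
            self.records.append((key, data))
            self.records.sort(key=lambda r: r[0])
            if len(self.records) <= self.capacity:
                return
            # The block is now over capacity: its highest-keyed record is
            # pushed off the end into the overflow chain.
            bumped_key, bumped_data = self.records.pop()
            self._insert_overflow(bumped_key, bumped_data)

        def _insert_overflow(self, key, data):
            # Walk the chain to the spot that keeps the records in key order.
            new_rec = OverflowRecord(key, data)
            if self.overflow_head is None or key < self.overflow_head.key:
                new_rec.next = self.overflow_head
                self.overflow_head = new_rec
                return
            cur = self.overflow_head
            while cur.next is not None and cur.next.key < key:
                cur = cur.next            # each step was another disk read
            new_rec.next = cur.next
            cur.next = new_rec

    block = PrimeBlock(capacity=3)
    for k in (10, 20, 30, 40, 50):        # the last two records spill into overflow
        block.insert(k, "data")

The key point is the while loop in _insert_overflow: every step along the chain was another record to read off the disk.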

This caused an interesting situation when I was working at Winchester. One morning when I came into work, I was immediately asked to help with a problem: a program that had started running the prior evening was still not finished. It turned out that an extremely large group of records was being added to one of our master files, and all the new records had keys in the same range, so they were all being inserted into the same area of the disk. Because the input file was sorted in ascending order, each new record was initially assigned to the same block; since that block was full, the program would then traverse the overflow chain, one record at a time, until it reached the end of the chain and could finally write the new record. This was not a problem for the first new record, as the chain was only one record long, but by the 100th record the chain was 100 records long, then 101, 102, 103… The program was taking longer and longer to add each new record, because the overflow chain grew by one record for every record added.


I was able to use some customized utility programs that I had written to (1) examine a copy of the list of new records to be added, and (2) examine the computer's memory, while the program was running, to see which record it was up to. By this time the program had been running for nearly 12 hours! Making some back-of-the-envelope calculations, I determined that the program would probably not be done for another 3-4 days! I then examined the program and determined that it would still operate properly if the new records were sorted in descending order instead of ascending order. That way, each new record would find its “home” right at the beginning of the overflow chain instead of at the end. We cancelled the program, restored the master file to its state before the run began (thanks to some good design, we had that option), sorted the new records in descending order, and restarted the program. It was done in less than 10 minutes! I got a lot of kudos for working out that solution!
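For anyone curious about the scale of the difference, here is a rough simulation of just the overflow chain, assuming for simplicity that the target block is already full so that every new record lands in the chain. The step counter stands in for the extra reads along the chain; Node and load_chain are made-up names for the illustration.

    class Node:
        def __init__(self, key):
            self.key = key
            self.next = None

    def load_chain(keys):
        """Insert each key into a sorted overflow chain; return how many
        chain links had to be followed in total while doing so."""
        head = None
        steps = 0
        for key in keys:
            node = Node(key)
            if head is None or key < head.key:
                node.next = head          # the new record becomes the chain head
                head = node
                continue
            cur = head
            while cur.next is not None and cur.next.key < key:
                cur = cur.next            # one more record read along the chain
                steps += 1
            node.next = cur.next
            cur.next = node
        return steps

    keys = list(range(10_000))
    print("ascending :", load_chain(keys))                   # about 50 million
    print("descending:", load_chain(list(reversed(keys))))   # 0

With 10,000 keys added in ascending order, the chain is walked roughly 50 million times in total (about n²/2 link follows), while in descending order every new record drops in at the head of the chain and the count stays at zero – which is why the rerun finished in minutes.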
