The first disk drives that I worked with were IBM 2311s. These were about the size of a small washing machine. They had removable disk packs that were 14” in diameter. With six platters (giving 10 recording surfaces, since the top and bottom surfaces were not usable) and 200 tracks per surface of about 3,600 bytes each, the total capacity of a disk pack was about 7M. When these were upgraded to IBM 2314s, which had several disk drives in a single large box, the capacity per disk pack increased to 28M. That seemed like a lot of storage space in those days, but it was still only about 20% of the capacity of a single reel of tape at maximum density.
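For anyone who wants to check the arithmetic, here is a quick sketch of the pack-capacity math. The exact byte-per-track figure (3,625 for the 2311) is from the published specifications; the rest is just multiplication.

```python
# Rough capacity arithmetic for one IBM 2311 disk pack.
surfaces_per_pack = 10        # 6 platters; top and bottom surfaces unused
tracks_per_surface = 200
bytes_per_track = 3625        # published 2311 track capacity

pack_bytes = surfaces_per_pack * tracks_per_surface * bytes_per_track
print(f"2311 pack: about {pack_bytes / 1_000_000:.2f} MB")   # about 7.25 MB
```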
However, the real power of a disk drive was that we were no longer limited to sequential files, i.e. files that must be read from one end to the other. Instead, disks supported a common access method known as ISAM (Indexed Sequential Access Method). ISAM files could be read sequentially, or they could be accessed randomly based on a unique key assigned to each record (such as a customer number or an employee number).
There were three separate areas in each ISAM file – the index, the data area, and the overflow area. The index area was fairly small: it stored the highest key in each cylinder (the 10 tracks that sat directly above and below one another on the different platters of the disk pack), then the highest key in each track, and for each track the highest key in each block. So with only a few I/Os, you could locate the place where a record was stored (assuming that it existed). To keep everything in key order, when a new record was added it was inserted in the appropriate place and the last record in that block was pushed off into overflow. Records in overflow were not blocked, but were stored individually, with pointers from each record to the next.
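To make that structure concrete, here is a small, hypothetical sketch in Python. It is not the real access method – the IsamFile class, the tiny BLOCK_SIZE, and the single-level index are all invented for illustration – but it models the areas described above: a sparse index of highest keys, fixed-size data blocks, and an overflow chain of individual records per block.

```python
# Toy model of an ISAM-style file: a sparse index of highest keys over
# fixed-size data blocks, plus a per-block overflow chain of individual
# records. Illustrative only; not the real IBM implementation.
from bisect import bisect_left

BLOCK_SIZE = 4  # records per data block (tiny, for illustration)

class IsamFile:
    def __init__(self, sorted_keys):
        # Data area: the sorted keys packed into consecutive blocks.
        self.blocks = [sorted(sorted_keys[i:i + BLOCK_SIZE])
                       for i in range(0, len(sorted_keys), BLOCK_SIZE)]
        # Index area: the highest key in each block (the real thing also
        # indexed cylinders and tracks; one level is enough to show the idea).
        self.index = [blk[-1] for blk in self.blocks]
        # Overflow area: one chain of individual records per block.
        self.overflow = [[] for _ in self.blocks]

    def _find_block(self, key):
        # A few comparisons against the index locate the home block.
        i = bisect_left(self.index, key)
        return min(i, len(self.blocks) - 1)

    def lookup(self, key):
        i = self._find_block(key)
        return key in self.blocks[i] or key in self.overflow[i]

    def insert(self, key):
        # Simplification: the index itself is never updated here.
        i = self._find_block(key)
        blk = self.blocks[i]
        blk.append(key)
        blk.sort()
        if len(blk) > BLOCK_SIZE:
            # Block is full: the last (highest) record is pushed into the
            # overflow chain, which is kept in key order.
            pushed = blk.pop()
            chain = self.overflow[i]
            chain.insert(bisect_left(chain, pushed), pushed)

f = IsamFile([10, 20, 30, 40, 50, 60, 70, 80])
f.insert(25)                                     # pushes 40 into overflow
print(f.lookup(25), f.lookup(40), f.lookup(99))  # True True False
```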
This caused an interesting situation when I was working at Winchester. One morning when I came into work, I was immediately asked to help with a problem: a program that had started running the prior evening was still not finished. It turned out that an extremely large group of records was being added to one of our master files, and all of the new records had keys in the same range, so they were all being inserted into the same area of the disk. Because the input file was sorted in ascending order, each new record would initially be assigned to the same block, but since that block was full, the program would then begin traversing the overflow chain, one record at a time. Eventually it would reach the end of the chain and write the new record there. This was not a problem for the first new record, as the chain was only one record long, but by the 100th the chain was 100 records long, then 101, 102, 103… It was taking the program longer and longer to add each new record, because the overflow chain grew by one record with every record added.
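A back-of-the-envelope sketch of why that adds up so badly: with ascending keys all landing on one chain, the Nth insert has to walk a chain that is already N-1 records long, one pointer (and one I/O) at a time, so the total chain-walking work grows roughly as N²/2. The little simulation below is hypothetical – the function and the batch sizes are invented – but it shows the shape of the growth.

```python
# Hypothetical sketch: count overflow-chain reads when N new records,
# all destined for the same full block, arrive in ASCENDING key order.
# Each insert must walk the whole chain to its end before writing.
def ascending_chain_reads(n_records):
    chain_length = 0
    total_reads = 0
    for _ in range(n_records):
        total_reads += chain_length   # walk the chain, one record per read
        chain_length += 1             # the new record is appended at the end
    return total_reads

for n in (100, 1_000, 10_000):
    print(f"{n:>6} inserts -> {ascending_chain_reads(n):>12,} chain reads")
    # 100 -> 4,950; 1,000 -> 499,500; 10,000 -> 49,995,000 (~ n*(n-1)/2)
```

The record counts are made up; the point is the shape of the curve: doubling the size of the batch roughly quadruples the chain-walking work.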
I was able to use some customized utility programs that I had written to (1) examine a copy of the list of new records to be added, and (2) examine the computer's memory, while the program was running, to see which record it was up to. (By this time the program had been running for nearly 12 hours!) Making some back-of-the-envelope calculations, I determined that the program would probably not finish for another 3-4 days! I then examined the program and determined that it would still operate properly if the new records were sorted in descending order instead of ascending order. This would mean that each new record would find its “home” right at the beginning of the overflow chain instead of at the end. We cancelled the program, restored the master file to what it had been before the program began running (thanks to some good design, we had that option), sorted the new records in descending order, and restarted the program. It was done in less than 10 minutes! I got a lot of kudos for working out that solution!
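Extending the same hypothetical simulation to a descending input shows why the fix worked: each new record's place is at the head of the chain, so every insert does a constant amount of chain work no matter how long the chain has grown. The chain_reads function and the 1,000-record batch below are again invented for illustration.

```python
# Hypothetical sketch: compare overflow-chain reads for ascending vs.
# descending input order when every new record belongs on the same
# chain, and the chain itself is kept in ascending key order.
def chain_reads(keys):
    chain = []           # the overflow chain, in ascending key order
    total_reads = 0
    for key in keys:
        pos = 0
        # Walk the chain from the front, reading one record per step,
        # until we find where the new key belongs.
        while pos < len(chain) and chain[pos] < key:
            total_reads += 1
            pos += 1
        if pos < len(chain):
            total_reads += 1   # read the record we stop in front of
        chain.insert(pos, key)
    return total_reads

new_keys = list(range(1, 1001))                    # 1,000 new records
print("ascending :", chain_reads(new_keys))        # 499,500 reads
print("descending:", chain_reads(new_keys[::-1]))  # 999 reads
```

With descending input, each insert touches the front of the chain and stops, instead of walking all the way to the end – which is why a job projected to take days finished in minutes.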