My son recently sent me a link to an article (http://catless.ncl.ac.uk/Risks/31/54/#subj31.1)
titled, “A lazy fix 20 years ago means the Y2K bug is taking down computers now”.
The first paragraph in this article read:
“Programmers wanting to avoid the
Y2K bug had two broad options: entirely rewrite their code, or adopt a quick fix
called ‘windowing’, which would treat all dates from 00 to 20, as from the
2000s, rather than the 1900s. An estimated 80 per cent of computers fixed in
1999 used the quicker, cheaper option.”
The article gave several examples of systems which failed
during the transition from 2019 to 2020 because of the windowing code. I’d like
to recount my own experience of dealing with the Y2K problem over 20 years ago,
why our company had very few Y2K problems, and the various problems that we did
experience.
Because of my membership in two professional societies (ACM
and IEEE-CS), I had read articles about Y2K earlier than most people. In 1980,
I designed and oversaw the creation of a standardized date subroutine that
handled every type of date manipulation we could think of – date comparison,
number of days between dates, date conversion from one format to another, etc.
Since we had to be able to deal with dates with two-digit years, we did have to
make assumptions about what century digits to assume and we chose 1940-2039 as
our “window”. We chose 1940 since that was the year of the company’s founding
and it would be unlikely that we would encounter any dates prior to that in
most situations (the only exception was personnel because people had birth
years prior to 1940, but those were stored as four-digit years).
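To make the windowing idea concrete, here is a minimal sketch in Python (not the language of the original subroutine). The function name and the `window_start` parameter are hypothetical, chosen for illustration; only the 1940-2039 range comes from the account above.

```python
def expand_two_digit_year(yy: int, window_start: int = 1940) -> int:
    """Map a two-digit year onto the 100-year window that begins at window_start.

    With the default of 1940, 40..99 become 1940..1999 and 00..39 become 2000..2039.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year between 0 and 99")
    century = window_start - (window_start % 100)  # e.g. 1900 for 1940
    year = century + yy
    if year < window_start:
        # The year falls before the start of the window, so it belongs
        # to the following century.
        year += 100
    return year


print(expand_two_digit_year(40))  # 1940
print(expand_two_digit_year(99))  # 1999
print(expand_two_digit_year(0))   # 2000
print(expand_two_digit_year(39))  # 2039
```

Calling the same function with `window_start=1920` maps 19 to 2019 but 20 to 1920, which is the kind of failure the article describes.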
We knew that by doing so we would have 60 years before our
window would cause issues. But we took additional precautions, one being that
we ensured that our object code could NOT be combined into any other program.
This way, if we ever needed to update the subroutines, we could do so and
ensure that no one was using an “old” version of the code.
We mandated that any program needing date manipulation use this
subroutine. Thus, our main concerns were (1) programs which had been written
before 1980 which would still be used after 2000, (2) purchased software, and
(3) programs which did no date manipulation but still had dates in them.
About 10 years later, in 1990, with an increasing number of
articles being written about the upcoming date rollover, I started asking that
management address the problem. It took a few years, but eventually around 1996-1997
I received permission to investigate one system, make any necessary changes,
and document what would have happened if the changes had not been made.
I chose our corporate accounts receivable system because it
was in category (2), i.e. the original source code was a purchased package.
After doing a thorough investigation and fixing any problems related to the year
rollover, I presented my report to management. They were shocked!
Among the problems were (1) the balance on all accounts
would be flagged as past due, interest charged, and past due notices mailed to
every customer, and (2) any payments made would be rejected as having invalid
dates. That got management’s attention!
In fairly short order we created a project team to address
the issue. We purchased some software that could examine COBOL programs for any
date usage and flag any potential issues. Then we fixed and tested everything
that we needed to change. Programs written in other languages were scanned by
hand. In the process we found that there were a couple of old programs where we
no longer had the source code, so we tested the function of the program. I was
responsible for checking all programs in our Gases Group as well as coordinating
all the checking in our joint ventures and subsidiaries around the world.
On the night of the date rollover, I was on duty as the date
change worked its way around the world. In every time zone we shut down our
operating plants before midnight as a precaution, then restarted them after
midnight and local operators called back to our headquarters to report any
problems. I was also in charge of logging any errors both during the run-up to
the date rollover (some programs do date lookahead), and in the weeks that
followed (as some programs only ran monthly).
I logged over 60 errors during this period. All but two were
fairly simple ones of just a few types. One type was where the output field for
a date had a format code of “ZZ” or “Z9” so the leading zero was suppressed and the
date was printed as “_0” instead of “00”. Another type was where a “19” was
hardcoded to print on a report so it printed “1900” instead of “2000”. There
were just a few where both types coincided and the report read “19_0”. These did not
cause major issues and were easily fixed later.
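These cosmetic errors are easy to reproduce. The sketch below is a Python emulation, not the original COBOL, and the function names are hypothetical. It shows a blank as “_”, as in the examples above.

```python
def print_year_zero_suppressed(yy: int) -> str:
    # Emulates a two-character output field whose leading zero is blanked.
    # The blank is rendered as "_" so that it is visible.
    return f"{yy:2d}".replace(" ", "_")


def print_year_hardcoded_century(yy: int) -> str:
    # Emulates a report that prints a literal "19" in front of the two-digit year.
    return "19" + f"{yy:02d}"


def print_year_both(yy: int) -> str:
    # Both problems at once: literal "19" plus a zero-suppressed year.
    return "19" + print_year_zero_suppressed(yy)


print(print_year_zero_suppressed(99))   # 99
print(print_year_zero_suppressed(0))    # _0
print(print_year_hardcoded_century(0))  # 1900
print(print_year_both(0))               # 19_0
```

None of these outputs looked wrong for the years 10 through 99, which is why such fields could go unnoticed until the rollover.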
There were only two errors where there was any significant
consequence:
One of these was in our executive payroll system (yes, senior
executives get special treatment). We had not been allowed to examine the
source code in this system because it was labeled as company confidential. The
person in charge of that system told us that he had tested it, but not being a
part of the team working on the project he was not as thorough. As a result,
during the date rollover the system took several payroll deductions twice and
short paid all our executives. This was embarrassing as we had to credit the
executives for the double deduction, but it only affected a few individuals and
the credit was done before they received their checks for the pay period.
The other problem happened at one of our locations in Liberal,
Kansas. The US Government runs a helium extraction plant in the panhandle of
Texas and runs a helium pipeline across the Oklahoma panhandle into Kansas. Our
company took helium out of the pipeline and there was an IBM PC which monitored
the flow and periodically calculated the amount of product taken over an
elapsed period and billed the company for the helium. Because that PC was owned
by the US Government, we were not allowed to examine or test it for any Y2K issues.
During the rollover period the difference between the date/time at the
beginning of the period and the date/time at the end of the period was negative
instead of positive. Thus, instead of billing the company for the helium taken,
we were given a credit. The amount was relatively small, so the credit was
processed as normal and no effort was made to correct the mistake.
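We never saw the code on that PC, so the following is only one plausible way to get a negative elapsed period, sketched in Python. It assumes timestamps stored with two-digit years and a program that always prefixes “19”. All names and the sample timestamps are hypothetical.

```python
from datetime import datetime


def elapsed_hours(start, end, expand_year):
    """Hours between two (yy, month, day, hour) stamps with two-digit years."""
    def to_datetime(stamp):
        yy, month, day, hour = stamp
        return datetime(expand_year(yy), month, day, hour)

    return (to_datetime(end) - to_datetime(start)).total_seconds() / 3600


def assume_1900s(yy):
    # The pre-Y2K assumption: every two-digit year is in the 1900s.
    return 1900 + yy


def window_1940(yy):
    # A 1940-2039 window: 40..99 are 19xx, 00..39 are 20xx.
    return 1900 + yy if yy >= 40 else 2000 + yy


period_start = (99, 12, 31, 23)  # 11 PM on December 31, '99
period_end = (0, 1, 1, 1)        # 1 AM on January 1, '00

print(elapsed_hours(period_start, period_end, assume_1900s) < 0)  # True
print(elapsed_hours(period_start, period_end, window_1940))       # 2.0
```

Any bill computed as a rate multiplied by the first result comes out negative, which would show up as a credit.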
Not too many years later the company moved to using SAP and
all the old programs on our mainframe were eliminated, so there would not be
any additional Y2K issues in code under our control. Thus, we did in fact
eliminate all those old programs before the expiration of the 1940-2039 window.
But there are still many companies running some old programs and their 1920-2019
windows are now giving them problems.
I know that the programmers working on these old systems
that are now failing are not enjoying the problems that they have to fix. If
they’d taken the time, as our company did, to fix them right the first time
then they could have avoided these problems.