Thursday, January 30, 2020

Y2K Bites Again


My son recently sent me a link to an article (http://catless.ncl.ac.uk/Risks/31/54/#subj31.1) titled, “A lazy fix 20 years ago means the Y2K bug is taking down computers now”. The first paragraph in this article read:

“Programmers wanting to avoid the Y2K bug had two broad options: entirely rewrite their code, or adopt a quick fix called ‘windowing’, which would treat all dates from 00 to 20, as from the 2000s, rather than the 1900s. An estimated 80 per cent of computers fixed in 1999 used the quicker, cheaper option.”

The article gave several examples of systems which failed during the transition from 2019 to 2020 because of the windowing code. I’d like to recount my own experience of dealing with the Y2K problem over 20 years ago, why our company had very few Y2K problems, and the various problems that we did experience.


Because of my membership in two professional societies (ACM and IEEE-CS), I had read articles about Y2K earlier than most people. In 1980, I designed and oversaw the creation of a standardized date subroutine that handled every type of date manipulation we could think of: date comparison, number of days between dates, date conversion from one format to another, etc. Since we had to be able to deal with dates with two-digit years, we had to decide which century to assume for each one, and we chose 1940-2039 as our “window”. We chose 1940 since that was the year of the company’s founding and it would be unlikely that we would encounter any dates prior to that in most situations (the only exception was personnel, because people had birth years prior to 1940, but those were stored as four-digit years).
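For readers who have not seen windowing in code, here is a minimal sketch of the idea in Python. It is not our original subroutine (which was mainframe code); the function name `expand_year` and its `window_start` parameter are made up for illustration. It only shows how a two-digit year maps into a 1940-2039 window.

```python
def expand_year(yy: int, window_start: int = 1940) -> int:
    """Expand a two-digit year into the 100-year window beginning at window_start.

    Hypothetical helper for illustration; with the default, 40-99 map to
    1940-1999 and 00-39 map to 2000-2039.
    """
    if not 0 <= yy <= 99:
        raise ValueError(f"two-digit year expected, got {yy}")
    pivot = window_start % 100          # 40 for a 1940 window
    century = window_start - pivot      # 1900 for a 1940 window
    year = century + yy
    if yy < pivot:
        # Two-digit years below the pivot belong to the next century.
        year += 100
    return year


print(expand_year(40))  # 1940
print(expand_year(99))  # 1999
print(expand_year(0))   # 2000
print(expand_year(39))  # 2039
```

The weakness is visible in the last line: once real dates reach 2040, a stored "40" would be read as 1940, which is why a window only postpones the problem.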

We knew that by doing so we would have 60 years before our window would cause issues. But we took additional precautions, one being that we ensured that our object code could NOT be combined into any other program. This way, if we ever needed to update the subroutines, we could do so and ensure that no one was using an “old” version of the code.

We mandated that any program needing date manipulation use this subroutine. Thus, our main concerns were (1) programs which had been written before 1980 which would still be used after 2000, (2) purchased software, and (3) programs which did no date manipulation but still had dates in them.

About 10 years later, in 1990, with an increasing number of articles being written about the upcoming date rollover, I started asking that management address the problem. It took a few years, but eventually around 1996-1997 I received permission to investigate one system, make any necessary changes, and document what would have happened if the changes had not been made.

I chose our corporate accounts receivable system because it was in category (2), i.e. the original source code was a purchased package. After doing a thorough investigation and fixing any problems related to the year rollover, I presented my report to management. They were shocked!

Among the problems were (1) the balance on all accounts would be flagged as past due, interest charged, and past due notices mailed to every customer, and (2) any payments made would be rejected as having invalid dates. That got management’s attention!

In fairly short order we created a project team to address the issue. We purchased some software that could examine COBOL programs for any date usage and flag any potential issues. Then we fixed and tested everything that we needed to change. Programs written in other languages were scanned by hand. In the process we found that there were a couple of old programs where we no longer had the source code, so we tested the function of the program. I was responsible for checking all programs in our Gases Group as well as coordinating all the checking in our joint ventures and subsidiaries around the world.

On the night of the date rollover, I was on duty as the date change worked its way around the world. In every time zone we shut down our operating plants before midnight as a precaution, then restarted them after midnight and local operators called back to our headquarters to report any problems. I was also in charge of logging any errors both during the run-up to the date rollover (some programs do date lookahead), and in the weeks that followed (as some programs only ran monthly).

I logged over 60 errors during this period. All but two were fairly simple ones of just a few types. One type was where the output field for a date had a format code of “ZZ” or “Z9”, so the leading zero was suppressed and the date was printed as “_0” instead of “00”. Another type was where a “19” was hardcoded to print on a report, so it printed “1900” instead of “2000”. There were just a few where both types coincided and the report read “19_0”. These did not cause major issues and were easily fixed later.
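These cosmetic errors are easy to reproduce. The sketch below is Python rather than COBOL, and the function names are hypothetical; it imitates a "Z9"-style edited field by padding the year with a space instead of a zero, and it glues a hardcoded "19" in front of the result the way those reports did.

```python
def format_two_digit_year(yy: int, zero_suppress: bool) -> str:
    """Format a two-digit year, optionally replacing a leading zero with a space.

    Illustrative stand-in for a zero-suppressing output field, not real COBOL.
    """
    return f"{yy:2d}" if zero_suppress else f"{yy:02d}"


def report_year(yy: int, zero_suppress: bool = False) -> str:
    """Build the year shown on a report that hardcodes the century as '19'."""
    return "19" + format_two_digit_year(yy, zero_suppress)


print(repr(report_year(99)))                      # '1999'
print(repr(report_year(0)))                       # '1900'
print(repr(report_year(0, zero_suppress=True)))   # '19 0'
```

The blank in the last result is what I wrote as “_” above.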

There were only two errors where there was any significant consequence:

One of these was in our executive payroll system (yes, senior executives get special treatment). We had not been allowed to examine the source code in this system because it was labeled as company confidential. The person in charge of that system told us that he had tested it, but not being a part of the team working on the project he was not as thorough. As a result, during the date rollover the system took several payroll deductions twice and short paid all our executives. This was embarrassing as we had to credit the executives for the double deduction, but it only affected a few individuals and the credit was done before they received their checks for the pay period.

The other problem happened at one of our locations in Liberal, Kansas. The US Government runs a helium extraction plant in the panhandle of Texas and runs a helium pipeline across the Oklahoma panhandle into Kansas. Our company took helium out of the pipeline and there was an IBM PC which monitored the flow and periodically calculated the amount of product taken over an elapsed period and billed the company for the helium. Because that PC was owned by the US Government, we were not allowed to examine or test it for any Y2K issues. During the rollover period the difference between the date/time at the beginning of the period and the date/time at the end of the period was negative instead of positive. Thus, instead of billing the company for the helium taken, we were given a credit. The amount was relatively small, so the credit was processed as normal and no effort was made to correct the mistake.
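We never saw the code on that PC, so the following is only a guess at the kind of arithmetic that produces a negative elapsed period. This Python sketch assumes the timestamps carried two-digit years and that the program treated every year as 19xx; all names in it are hypothetical. For comparison, it also runs the same calculation with a 1940-2039 window.

```python
from datetime import datetime


def elapsed_hours(start, end, expand_year):
    """Hours between two (yy, month, day, hour) stamps, given a year-expansion rule."""
    def to_datetime(stamp):
        yy, month, day, hour = stamp
        return datetime(expand_year(yy), month, day, hour)
    return (to_datetime(end) - to_datetime(start)).total_seconds() / 3600


def assume_1900s(yy):
    # Assumed faulty rule: every two-digit year is in the 1900s.
    return 1900 + yy


def window_1940(yy):
    # 1940-2039 window: 00-39 are 20xx, 40-99 are 19xx.
    return (2000 if yy < 40 else 1900) + yy


period_start = (99, 12, 31, 12)   # noon, 31 Dec "99"
period_end = (0, 1, 1, 12)        # noon, 1 Jan "00"

naive = elapsed_hours(period_start, period_end, assume_1900s)
windowed = elapsed_hours(period_start, period_end, window_1940)

print(naive < 0)   # True: the period appears to run backwards by about a century
print(windowed)    # 24.0
```

Any quantity computed as flow rate times elapsed time inherits the sign of the elapsed time, so a negative period turns a bill into a credit.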

Not too many years later the company moved to using SAP and all the old programs on our mainframe were eliminated, so there would not be any additional Y2K issues in code under our control. Thus, we did in fact eliminate all those old programs before the expiration of the 1940-2039 window. But there are still many companies running some old programs and their 1920-2019 windows are now giving them problems.

I know that the programmers working on these old systems that are now failing are not enjoying the problems that they have to fix. If they’d taken the time, as our company did, to fix them right the first time then they could have avoided these problems.

