Apollo 11 landed on the moon 40 years ago tomorrow. NASA Software Engineer Don Eyles explains how they beat a computer glitch that could have ended the mission.
By Don Eyles, July 18, 2009 from MIT News
Eyles was a software engineer at MIT’s Instrumentation Laboratory, which then became Draper Laboratory, from 1966 until 1998. He wrote onboard guidance software for the lunar descent, and during the Apollo 14 mission he devised a “workaround” for a hardware problem with the Abort switch that could have jeopardized the mission. He subsequently wrote software for the space shuttle and the International Space Station. This recollection of the events during the Apollo 11 landing was written for the Boston Globe in 1989.
My mind screamed for an abort.
An amber light labeled “PROG” had just flashed in the lunar module cockpit, at that moment less than six miles above the moon’s surface. Astronaut Edwin E. (Buzz) Aldrin quickly interrogated the computer. It was alarm number 1202, the “mayday” of an overloaded computer.
Never had we contemplated continuing a lunar landing after a 1202 alarm. We have to abort, I thought.
In Houston, flight controllers didn’t know exactly what the alarm meant, but they knew the lunar module was still flying on the correct trajectory. They told Neil Armstrong and Aldrin to press on.
We were monitoring the mission from a classroom at the MIT Instrumentation Laboratory in Cambridge. Six-inch thick books containing the programs for the flight computers lay open on long tables. Open telephone lines linked us to mission control in Houston. A squawk box barked out the conversation between the spacecraft and the ground.
The room was crowded. If you had worked on any part of the guidance system, you were there. President Kennedy had dared us to put a man on the moon by 1969. This was the day it might happen.
When the countdown began for the rocket burn that would begin the descent, Allan Klumpp and I had edged closer to the loudspeakers. The others gave way because it was our turn. We had worked together for three years, night and day, to produce computer programs to guide the spacecraft to the lunar landing. We had watched while guidance programs written for other phases of the Apollo mission had done their job. Now our work would be tested.
Our job was to guide the lunar module during the landing, the complex 12 minutes of flight that would end when the “lunar contact” light came on in the cockpit. Klumpp had designed the guidance equations. I had programmed them into the onboard guidance computer.
Perfection is excluded
George Cherry, our boss, liked to say, “The perfect is the enemy of the good.” He was reflecting the views of our customer, the National Aeronautics and Space Administration, where software designers were apparently regarded as “prima donnas” who could not quit fiddling and let the product go out the door. Unfortunately, it was all too clear that perfection was out of the question.
For the Apollo guidance computer, the landing was by far the busiest period of an Apollo mission. The spacecraft had to be slowed from orbital velocity of 3,800 miles per hour to a landing speed of about 2 m.p.h.
The guidance system had to make radar contact with the surface to correct navigational errors caused by the moon’s gravitational field. The system allowed the astronauts to shift the landing site if necessary. These “jobs” involved calculations that had to be repeated frequently. There was barely enough computer time to get it all done. We worried about fractions of milliseconds.
Until now, the mission had gone normally. The launch was smooth and the long coast to the moon uneventful. The lunar module separated from the command module and performed a short rocket burn to drop to about nine miles above the lunar surface.
On the computer keyboard, Aldrin selected the landing program. The guidance equations computed the optimum time to start the landing burn, and maneuvered the lander for ignition. The countdown began. At five seconds before light-up time, Aldrin punched the “proceed” button.
The first alarm came five minutes later, near the point where the load on the computer increased as it began to process readings from the landing radar. The alarm meant that the computer had run out of the temporary storage areas it used to prevent unrelated tasks from interfering with one another.
Glitch in a computer
When a job ended in the computer, it was supposed to release its storage area. Evidently, more jobs were beginning than were ending. Something was wrong in the guidance computer.
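The storage exhaustion described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the flight software: the pool size of 7 and the per-cycle job counts are hypothetical values chosen for illustration, and `cycles_until_alarm` is an invented helper name.

```python
def cycles_until_alarm(pool_size, starts_per_cycle, ends_per_cycle, max_cycles=1000):
    """Return the first cycle in which a new job finds no free storage area.

    Returns None if the pool never runs out within max_cycles.
    All parameters are hypothetical; they are not taken from the flight software.
    """
    in_use = 0
    for cycle in range(1, max_cycles + 1):
        # New jobs each claim one storage area.
        for _ in range(starts_per_cycle):
            if in_use == pool_size:
                return cycle  # nothing free: this is where an alarm would be raised
            in_use += 1
        # Finished jobs release their storage areas.
        in_use -= min(ends_per_cycle, in_use)
    return None


if __name__ == "__main__":
    # Jobs start faster than they end, so the pool eventually runs dry.
    print(cycles_until_alarm(pool_size=7, starts_per_cycle=3, ends_per_cycle=2))
    # Jobs start and end at the same rate, so the pool never runs dry.
    print(cycles_until_alarm(pool_size=7, starts_per_cycle=2, ends_per_cycle=2))
```

Even a small, steady excess of starting jobs over ending jobs drains a fixed pool after a few cycles, which is why the alarm pointed to something stealing time rather than to a single bad computation.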
Forty seconds later, another 1202 alarm flashed. Again, mission control told the astronauts to press on. It seemed to be a capacity problem. Something was stealing computer time.
Was it vibration?
We had built a time margin into the computer programs because the circuits that converted data from the sensors monitoring the lander’s flight into a form usable by the computer operated by pre-empting computer time. We had worried that vibration might flood the computer with so much data from the sensors that it would squeeze the time available for the computations necessary to keep the lander on course.
We had stopped worrying about vibration after the successful flights of the lunar module on Apollo 9 and Apollo 10. Now the vibration scare was back, with a vengeance.
Minutes passed without another alarm. Then, as planned, the lunar module pitched forward so the astronauts could see the landing spot. At this point, the equations that allowed the astronauts to shift the landing site were turned on, imposing a further load. Then, about 20 seconds apart, three more alarms flashed.
There was little comfort for us in knowing what was happening after each alarm. The computer was flushing away and then reconstructing its entire schedule of computations. This feature had been added to ensure that a fluctuation in the computer’s electrical power, known as a restart, would not suddenly place the computer in an idle state that resembled brain death.
A perilous procedure
During simulations we used to casually push the restart button to stress test the program, but it was a perilous procedure no one wanted to try in flight, especially during a landing.
Now, as it happened in real life, we knew that each time, some of the commands issued by the guidance equations were being flushed away. But enough were getting through to the lander’s guidance rockets to keep it on course.
Finally, at an altitude of about 500 feet, Armstrong took control, using a mode in which the computer maintained the spacecraft’s rate of descent while Armstrong manually guided it toward a touchdown spot. With his brain and instincts now doing some of the work, the computational load on the computer lessened. The alarms ceased.
Armstrong maneuvered to avoid a field of boulders and put the spacecraft down with enough fuel left for about half a minute of flight. Later, he apologized for a “spastic descent trajectory.” He had changed his mind three times while looking for a landing spot. The alarms had distracted him.
A second after touchdown, the spacecraft communicator in Houston told Armstrong and Aldrin, “You’ve got a bunch of guys about to turn blue.”
In Houston and in Cambridge, we were taking our first normal breath in eight minutes. Allan Klumpp held out a big hand to me and said, “We did it” – as though the lunar landing had been our private accomplishment.
But it was too soon to celebrate. The lander, and its crew, still had to get off the moon. The ascent phase of the mission demanded less from the computer than the landing, but that provided little reassurance as long as the cause of the computer problem was unknown.
It was later that evening that George Silver, a hardware specialist whose job was to test the guidance hardware before launch, remembered something. The radar needed for rendezvous with the command module orbiting above the moon was not used during the landing, but under certain conditions its electronic link with the computer could still be active. If so, it might steal as much as 15 percent of the computer’s time.
The next day, it turned out that the rendezvous radar switch in the lander cockpit had in fact been in the “auto-track” position even though the radar was not being used. It was the very condition that could cause the problem.
For weeks afterward we endured news stories that blamed a computer malfunction for jeopardizing the landing. Patiently we explained that the computer actually saved the day by adapting to the overloading as it occurred. The real malfunction was in the checklist – the manual that specified the proper positions of the spacecraft’s switches for landing.
The problem had not been detected before flight because none of the Apollo simulators duplicated so obscure a characteristic of the lander’s electronics.
Were we lucky or were we smart? The alarms were few, due to a time margin we had built into the programs because of an exaggerated fear of vibration. Once the alarms occurred, the computer was able to continue because of software features that we had added to get around fluctuations in the power supply, a problem that occurred only once during the Apollo project, when a fuel cell exploded aboard Apollo 13.
The Apollo 11 mission was saved by capabilities that were put there for the wrong reason.
We were lucky. But we were also humble. We knew that we didn’t know everything. We knew that an Apollo mission was too complex for every problem to be anticipated. We had met the astronauts and felt a personal responsibility to them.
If we were in doubt about the seriousness of a problem, we fixed it anyway. We had earned our good luck.
Related – Apollo Guidance Computer
PGNCS trouble
PGNCS generated unanticipated warnings during Apollo 11’s lunar descent, with the AGC showing a 1201 alarm (“Executive overflow – no vacant areas”) and a 1202 alarm (“Executive overflow – no core sets”).[8] The cause was a rapid, steady stream of spurious cycle steals from the rendezvous radar, intentionally left on standby during the descent in case it was needed for an abort.[9][10]
During this part of the approach the processor would normally be almost 85% loaded. The extra 6400 cycle steals per second added the equivalent of 13% load, leaving just enough time for all scheduled tasks to run to completion. Five minutes into the descent, Buzz Aldrin gave the computer the command 1668, which instructed it to calculate and display DELTAH (the difference between the altitude sensed by the radar and the computed altitude). This added another 10% to the processor workload, causing executive overflow and a 1202 alarm. After being given the “GO” from Houston, Aldrin entered 1668 again and another 1202 alarm occurred. When reporting the second alarm, Aldrin added the comment “It appears to come up when we have a 1668 up”. Happily for Apollo 11, the AGC software had been designed with priority scheduling. Just as it had been designed to do, the software automatically recovered, deleting lower priority tasks, including the 1668 display task, to complete its critical guidance and control tasks. Guidance controller Steve Bales and his support team, which included Jack Garman, issued several “GO” calls and the landing was successful. For his role, Bales received the US Medal of Freedom on behalf of the entire control center team and the three Apollo astronauts.[11]
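The load figures quoted above can be checked with a short sketch of priority-based shedding. It is a simplified illustration, not the AGC’s actual mechanism: the `Task` type, the priority numbers, and the `shed_to_fit` helper are hypothetical, and only the percentages (85, 13, 10) come from the passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    priority: int      # higher number = more important (hypothetical scale)
    load_percent: int  # share of processor time, from the figures quoted above


def shed_to_fit(tasks, stolen_percent, capacity_percent=100):
    """Keep the highest-priority tasks that fit in the time left after cycle steals."""
    budget = capacity_percent - stolen_percent
    kept, used = [], 0
    for task in sorted(tasks, key=lambda t: t.priority, reverse=True):
        if used + task.load_percent <= budget:
            kept.append(task)
            used += task.load_percent
    return kept, used


TASKS = [
    Task("guidance and control", priority=20, load_percent=85),
    Task("DELTAH display (1668)", priority=5, load_percent=10),
]

if __name__ == "__main__":
    # With 13% of the time stolen, 85 + 13 + 10 = 108% is demanded, so the display is dropped.
    kept, used = shed_to_fit(TASKS, stolen_percent=13)
    print([t.name for t in kept], used)
    # With no cycle steals, 85 + 10 = 95% fits and both tasks run.
    kept, used = shed_to_fit(TASKS, stolen_percent=0)
    print([t.name for t in kept], used)
```

The arithmetic shows why the display request was the trigger: guidance plus the cycle steals came to 98%, which just fit, while adding the 10% display pushed demand to 108%.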
The problem was not a programming error in the AGC, nor was it pilot error. It was a peripheral hardware design bug that was already known and documented by Apollo 5 engineers.[12] However, because the problem had only occurred once during testing, they concluded that it was safer to fly with the existing hardware that they had already tested than to fly with a newer but largely untested radar system. In the actual hardware, the position of the rendezvous radar was encoded with synchros excited by a different source of 800 Hz AC than the one used by the computer as a timing reference. The two 800 Hz sources were frequency locked but not phase locked, and the small random phase variations made it appear as though the antenna was rapidly “dithering” in position even though it was completely stationary. These phantom movements generated the rapid series of cycle steals. – Wikipedia
Andrew
It is all fascinating. I am listening to a fuller audio recording than I have heard before, from this site, http://www.klabs.org/history/apollo_11_alarms/console/, and I hear the voice of Steve Bales saying “if it doesn’t recur we will be go…” on the 1202 alarms. I know the story of the alarms, so I recognise his voice on the recording. But all the time he is relaying exactly what another guy is saying, presumably in the back room, whose audio has been added to the recording so you can hear all the conversation on the ground. This guy seems to know everything about the situation, and Steve Bales keeps relaying what he says but seems less sure about the situation himself. So I would like to know the other guy’s name. He is definitely an unsung hero in that case! You can hear his voice on the right-hand channel of the apollo_11_dsc.mp3 recording on the above site. Listen for yourself.