Thursday, January 30, 2020

A deep dive into the Apollo Guidance Computer, and the hack that saved Apollo 14 - Ars Technica

The source of all the trouble: the Abort pushbutton (along with its companion the Abort Stage pushbutton). This particular image is of the LM simulator currently residing at the <a href="https://www.cradleofaviation.org/">Cradle of Aviation Museum</a> in Long Island.
Frank O'Brien

Commanded by Alan Shepard, the only original Mercury astronaut to make it to the Moon on an Apollo mission, Apollo 14 was a reflight of Apollo 13's abandoned lunar landing plan. Accompanied by Lunar Module Pilot Ed Mitchell and Command Module Pilot Stu Roosa, Shepard's target was the Fra Mauro highlands, a hilly area near the lunar equator and just south of the giant crater Copernicus. Likely created from the ejecta thrown out when Mare Imbrium was created, the Fra Mauro site was thought to potentially contain material from deep inside the Moon that could shed light on our companion satellite's origin.

In the eight months since the harrowing flight of Apollo 13, engineers made several changes to the spacecraft to reduce the chance of another explosion happening. To help ensure that the crew could make it home if another emergency occurred, an additional oxygen tank and battery were added. The unplanned pause also allowed time for some software updates to be added to the lunar module computer; a particularly welcome addition was the ability of the computer to recognize changes in the height of the surface during the approach to the landing site. With this new capability, the computer would not be confused by the undulating terrain as the vehicle headed toward landing.

What is past is prologue

On the afternoon of January 31, 1971, the flight thundered away from the Kennedy Space Center on its Saturn V launch vehicle after only a brief 40-minute hold for weather. After restarting the S-IVB third stage for trans-lunar injection (TLI), the command module Kitty Hawk and her crew were on their way to the Moon.

A very serious problem cropped up almost immediately after TLI, as Kitty Hawk attempted to dock with the mission's lunar module, Antares. Fingernail-sized latches on the docking probe used to connect the command module to the lunar module failed to catch, and the two spacecraft were unable to dock. Only after repeated attempts was Kitty Hawk able to capture and securely attach Antares. Afterwards, the S-IVB was sent on its way to a lonely but spectacular death and the combined Apollo 14 spacecraft continued the voyage to Fra Mauro.

The four days in transit and the time spent in lunar orbit were uneventful, or at least as uneventful as a flight to the Moon could be. Insertion into lunar orbit occurred at about 82 hours into the flight. To conserve precious fuel in the LM, the combined spacecraft lowered its orbit to a nine-mile (about 14.5-kilometer) perilune several hours later. Preparations for the descent started the next day, with the activation and checkout of the LM.

However, less than four hours before the scheduled landing, controllers noticed that according to the indications on their consoles in Mission Control, the LM's Abort pushbutton appeared to have been pressed. When asked via radio, Shepard confirmed that no one on board Antares had pressed the Abort button—which meant there was a short-circuit or other electrical issue somewhere inside the LM's complicated guts.

This was potentially a mission-ending problem: if the computer still saw the button as pressed once the descent engine was firing, the LM would begin its abort procedure as soon as the lunar descent started, making a landing impossible.

Under hard time pressure, the ground had to quickly figure out what was wrong and devise a workaround. What they came up with was the most brilliant computer hack of the entire Apollo program, and possibly in the entire history of electronic computing.

To explain exactly what the hack was, how it functioned, and the issues facing the developers during its creation, we need to dig deep into how the Apollo Guidance Computer worked. Hold onto your hats, Ars readers—we're going in.

The Apollo Guidance Computer laid bare

The AGC is often described as a mere calculator, or compared to a controller chip suitable for a watch or microwave. Your watch tells the time and little else. The chip that drives the microwave blindly starts and stops the magnetron to heat expired Kung Pao chicken. In these devices, there is very limited interaction with the surrounding hardware, no sophisticated computation, and no decision-making of any note.

In describing a "computer," one expects that the system would include the abilities we attribute to contemporary computers—the ability to run several programs at once, for example, or to present a simple yet intuitive interface, to control a wide variety of devices, and to gracefully recover from application errors. “Ha!” you might exclaim, “I carry a computer like that in my pocket!”

The idea of such capabilities being available nearly 60 years ago stretches credulity, but the Apollo Guidance Computer had these features and more. An interpreter to process “virtual” machines, similar to Java byte code? Check. The ability for remote data updates? Yup. Given all of these capabilities and more, it’s quite reasonable to argue that the AGC compares favorably with a modern smartphone. Yes, the AGC is slower and has far less memory, but that is only due to its unfortunate timing at birth, being at the wrong end of the Moore’s Law curve.

Although the processor at approximately 80,000 instructions per second was not especially fast, it is impossible to overemphasize the impact that its scarce memory had on AGC software developers. Consider the limits the programmers were under: all the software for the flight to the Moon and back had to fit in 36K words (15 bits long, plus 1 bit for parity) of read-only core rope memory. As “bytes” were not a concept in the AGC, all 15 bits of a word were accessed at once with no easy way to break the word into smaller divisions.

A number of IBM 2314 disk drives (white) and an IBM 2540 Card Reader / Punch, photographed in 1968.

Secondary storage was not an option: disk units, then the size of washing machines, could not even fit in the spacecraft. Tape storage, while a reliable and viable option, was considered far too late in the development cycle to be included in any designs. The AGC's software was entirely contained within core-rope modules housed inside the AGC itself, a 70lb (about 32kg) box measuring 61cm long, 32cm wide, and 17cm tall.

In addition to the 36K words of read-only memory for the core programming, the AGC had a trivial 2K words of RAM, needed for the operating system, process management, recovery, and global variables for all mission phases. That’s it. Shoehorned amongst this meager amount of RAM were dedicated memory areas used by application programs: the software that performed the guidance and navigation tasks, landing on the Moon, or rendezvous. Basic programs were each allowed a whopping seven words for temporary variables. And no, that’s not a misprint.
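
To put those word counts in more familiar terms, here is a rough conversion into 8-bit bytes. It is only an equivalence, since the AGC had no bytes, and it assumes "K" means 1,024 and counts only the 15 data bits of each word.

```python
# Word counts come from the article; the byte figures are an equivalence only.
WORD_DATA_BITS = 15        # data bits per word (a 16th bit held parity)
ROM_WORDS = 36 * 1024      # 36K words of core rope (assuming K = 1024)
RAM_WORDS = 2 * 1024       # 2K words of RAM (assuming K = 1024)


def equivalent_bytes(words: int, bits_per_word: int = WORD_DATA_BITS) -> float:
    """Express the storage of `words` AGC words in 8-bit bytes."""
    return words * bits_per_word / 8


rom_bytes = equivalent_bytes(ROM_WORDS)
ram_bytes = equivalent_bytes(RAM_WORDS)
print(f"ROM: {ROM_WORDS} words, roughly {rom_bytes:,.0f} bytes")
print(f"RAM: {RAM_WORDS} words, roughly {ram_bytes:,.0f} bytes")
```

Under those assumptions the whole flight program fits in under 70 kilobytes, and the RAM in under 4.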

With these constraints in mind, it’s easy to be cynical when facing the task of installing the latest multi-gigabyte application on our laptops.

Listing image by Frank O'Brien / Aurich Lawson

An Apollo Guidance Computer (left), with DSKY module (right). For reference, the AGC pictured here measures 61×32×17 cm.
NASA

How the AGC worked: Jobs and Tasks

At the beginning of the 1960s, as in the early days of microcomputers, it was a completely normal experience to use a computer with a minimal executive monitor running. You'd load a single program, let it execute, and then slog through its output. This was quickly recognized as a huge waste of resources, and "multiprogramming" found its way into the mainframe systems of the era, vastly improving system utilization. Small computers, generally lacking the necessary memory, hardware support, and system programming tools, were left behind to run their applications one at a time.

Though the AGC was definitely a "small" computer by the standards of the day, running one program at a time was not a satisfactory design for it. For a spacecraft computer with wildly varying performance requirements, a need for some kind of user interface, and unpredictable I/O traffic, that approach was dismissed out of hand. A full multiprogramming operating system was necessary, one that could break work into manageable chunks and schedule those chunks only as they were needed.

Two types of work ran in the AGC, classified either as "Jobs" or as "Waitlist Tasks." Jobs were independently scheduled to run, had a tiny but dedicated memory area for their variables, and used a priority level that established their position in the AGC's run queue.

Rather than being built as large programs that sat idle and tied up significant resources, the vast majority of Jobs (but not all, as we'll see in a moment!) were small programs that ran in a short amount of time, usually with a narrowly defined function. Once the work was completed, the Job rescheduled itself for some point in the future, or terminated itself, often with the expectation that it would be rescheduled by another Job. Thus, Jobs were constantly being created and terminated, and the amount and type of work the computer was doing varied greatly with the mission phase.

Unlike contemporary systems, where setting up and taking down of a process is an intensive effort, scheduling work in the AGC was a trivial matter. On modern systems, virtual address spaces and paging areas are created, memory allocation routines reserve working storage, and an elaborate security environment is established. None of this was needed for the AGC (or indeed, even conceptualized at the time of the AGC's development). All that was needed was to request an available slot (called a "Core Set") in the process table, and to transfer control to the program. Unwinding this effort was just as simple; the Job requested that its process slot be cleared, and at that point, the Job simply ceased to exist.

The AGC's program flow was carefully controlled to ensure enough Core Sets would always be available to handle the required number of Jobs for each stage in flight. If the AGC couldn't locate a free Core Set—something that should be impossible—the AGC would produce the well-known "1202" program alarm. As luck and fate would have it, this impossible eventuality did indeed happen during the landing of Apollo 11, and quick thinking by Mission Control salvaged what might have otherwise been a dangerous aborted landing.
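
A minimal sketch of the idea, assuming a fixed table of eight Core Sets: scheduling a Job claims a slot, ending a Job frees it, and a full table raises the alarm. The class names, method names, and priority numbers are hypothetical illustrations, not the AGC's actual code or data layout.

```python
class ProgramAlarm(Exception):
    """Raised when the sketch runs out of a fixed resource."""


class Executive:
    """Toy model of a fixed table of Core Sets (all names are hypothetical)."""

    def __init__(self, core_sets: int = 8):
        self.slots = [None] * core_sets  # one entry per Core Set

    def schedule(self, name: str, priority: int) -> int:
        """Claim the first free Core Set for a Job, or raise alarm 1202."""
        for index, slot in enumerate(self.slots):
            if slot is None:
                self.slots[index] = (priority, name)
                return index
        raise ProgramAlarm("1202")

    def end_job(self, index: int) -> None:
        """Clear the slot; the Job simply ceases to exist."""
        self.slots[index] = None

    def next_job(self):
        """Return the name of the highest-priority Job, or None when idle."""
        active = [slot for slot in self.slots if slot is not None]
        if not active:
            return None
        return max(active, key=lambda slot: slot[0])[1]


executive = Executive(core_sets=8)
executive.schedule("P63", priority=10)              # priorities are made up
executive.schedule("display update", priority=30)
print(executive.next_job())   # the short, higher-priority Job goes first
```

Nothing here allocates memory or builds an address space, which is the point: creating and destroying a Job is little more than writing to and clearing a table entry.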

Fundamental for a realtime computer, Jobs had a remarkable ability to define a point in their execution where they could be restarted in the event of a system problem. Processing logic was designed so that when an important phase of a computation was reached, a restart point could be defined. In the event of a system restart (not a full reboot, which would wipe all data in RAM), data in memory was preserved, and the Job could resume at its predetermined restart point as if nothing had happened. This feature proved its worth during the troubles on Apollo 11’s landing, where the computer suffered numerous restarts but continued guiding the LM toward landing without issue.
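
The restart-point idea can be illustrated with a short sketch. It assumes a Job split into phases, with a dictionary standing in for RAM that survives a restart. For simplicity every phase boundary is treated as a restart point, which is an assumption of this example; the real software chose its restart points deliberately.

```python
def run_job(phases, memory, fail_at=None):
    """Run phases in order, starting from the restart point saved in `memory`.

    `memory` stands in for RAM preserved across a restart. `fail_at`
    simulates a restart striking just before that phase index runs.
    Returns True if the Job ran to completion.
    """
    start = memory.get("restart_point", 0)
    for index in range(start, len(phases)):
        if index == fail_at:
            return False                     # restart: control is lost here
        phases[index](memory)
        memory["restart_point"] = index + 1  # declare a new restart point
    return True


# Hypothetical three-phase computation.
phases = [
    lambda m: m.update(a=2),
    lambda m: m.update(b=m["a"] * 10),
    lambda m: m.update(c=m["b"] + 1),
]
memory = {}
finished = run_job(phases, memory, fail_at=2)  # interrupted before phase 2
finished = run_job(phases, memory)             # resumes at phase 2, not 0
print(finished, memory["c"])
```

The second call picks up where the first left off because the results of the earlier phases, and the restart point itself, were still in memory.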

A portion of the Real Time Computer Complex located on the first floor of the Mission Control Complex in Houston. NASA used multiple IBM System/360 mainframes to power each Apollo mission on the ground, but the Apollo spacecraft didn't have nearly enough room for a mainframe on board.

Located on the Display and Keyboard (universally referred to as the DSKY, and spoken in a way that rhymes with "whisky") is a somewhat misleading numerical field labeled “PROG,” for "program." This label might lull the casual observer into thinking that this field shows the program currently "running" on the AGC (ignoring the fact that many other routines are running underneath). More correctly, the program number displayed is called the “Major Mode,” reflecting the phase of flight or navigation the astronaut wants to accomplish. Surprisingly, the Major Mode display is little more than a visual aid to the crew, ensuring that they understand the goal of the software executing at the time. There are only a few instances where the number in the Major Mode display is actually referenced by other software, and, as we will see, one of those instances is critical to this story.

Waitlist Tasks were a very different breed from Jobs. By definition, Waitlist Tasks were even simpler than Jobs: extremely short, critical tasks with no need for dedicated memory areas or elaborate process management. Reading gimbal angles and accelerometer data from the spacecraft's inertial measurement unit took only a few machine instructions, but this work drove computations that propagated throughout the guidance, navigation, and control software. As such, it was necessary to sample the IMU data at precisely known intervals, and the tasks were rigidly scheduled.

Other Waitlist Tasks were charged with shutting off thrusters fired by the Digital Autopilot, or checking whether that Abort pushbutton we previously discussed had been pressed. Waitlist Tasks were triggered by timers set in advance, and they ran with all interrupts inhibited since they were expected to complete in tiny fractions of a second.
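
As a rough illustration of timer-driven tasks, the sketch below keeps pending tasks in a queue ordered by due time and runs whichever ones have expired. The `Waitlist` class, the `read_imu` task, and the two-tick period are all hypothetical; the sketch uses Python's `heapq` for convenience and says nothing about how the AGC actually stored its timers.

```python
import heapq


class Waitlist:
    """Toy model of timer-driven tasks (not the AGC's data layout)."""

    def __init__(self):
        self._queue = []      # entries are (due_time, sequence, task)
        self._sequence = 0    # tie-breaker so tasks are never compared

    def schedule(self, due_time, task):
        """Arrange for `task` to run when the clock reaches `due_time`."""
        heapq.heappush(self._queue, (due_time, self._sequence, task))
        self._sequence += 1

    def run_until(self, now):
        """Run every task whose timer has expired by `now`, in time order."""
        while self._queue and self._queue[0][0] <= now:
            due_time, _, task = heapq.heappop(self._queue)
            task(due_time, self)


samples = []


def read_imu(due_time, waitlist):
    """Hypothetical short task: record a sample, then re-arm its own timer."""
    samples.append(due_time)
    waitlist.schedule(due_time + 2, read_imu)   # period of 2 ticks is made up


waitlist = Waitlist()
waitlist.schedule(0, read_imu)
waitlist.run_until(6)
print(samples)
```

Each task does a tiny amount of work and immediately re-arms itself, which is how a rigid sampling interval can be kept without a long-running program.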

Doing virtual machines in the 1960s

Up to eight Jobs could run in the LM's AGC (seven in the CSM's AGC), and of these eight, as many as five were what we might (with a bit of squinting, perhaps) recognize today as “virtual machines.” Although not separated through isolated address spaces, each Job executed independently and with its own dispatch priority.

Perhaps surprisingly, some of the more intensive Jobs were assigned a lower priority than others. The main routine for the lunar landing, Program 63 (or just "P63," in AGC parlance), had a lower priority than most other work in the system. This is reasonable, as the size and complexity of P63 could easily monopolize the processing resources of the computer. By assigning such intensive work to a low priority, P63 allowed small, short, but critical work to process ahead of it. In many cases these smaller programs performed essential operations, such as managing the spacecraft attitude, reading radar data, or updating the astronaut’s displays.

<a href="https://en.wikipedia.org/wiki/Charles_Stark_Draper">Charles Stark Draper</a>, founder and head of the MIT Instrumentation Laboratory (later renamed the Charles Stark Draper Laboratory), examining an Apollo guidance and control mock-up in 1963.

The underlying AGC hardware was hobbled by its tortuous evolution, and traces its origins to a computer designed in 1960 for a proposed Mars mission. By the time MIT was contracted to build the Apollo Guidance and Navigation system in 1961, the “Mars computer” had evolved to perform eight different machine instructions, with access to 4K words of memory. This would be ludicrously inadequate even for a lunar mission, but the time needed to redesign the hardware and software from a clean sheet of paper would simply not fit in the “before the end of the decade” deadline set by President Kennedy. The hardware was used as a starting point and forcefully evolved, stretching the architecture to the limit.

But even this forced evolution wasn't enough: the hardware lacked essential capabilities. Spacecraft software demands sophisticated mathematical programming, full of matrix operations, trigonometric functions, and multiple precision variables. An extensive software library might have been able to provide these capabilities to the AGC, but repeatedly calling and returning from countless library routines would eat deeply into the limited memory and processing capability. A better solution was needed. A totally novel (for its time) programming technique was used to create an entirely new environment for executing programs, bringing with it a new software-based “architecture.”

Developers created a program called the “Interpreter” to address the constraints and complaints that dogged the AGC—and this brings us to the "virtual machine" part.

First, the Interpreter introduced a new stack-based Polish notation language. A generous working area of 43 words (called a Vector Accumulator area, or "VAC") was allocated to each of up to five concurrent interpretive programs. Like the Core Sets, these VACs were limited and carefully managed, so running out of them should have been impossible. If it happened anyway, a "1201" program alarm was raised. (And, as with the Core Sets and their 1202 alarms, this did indeed happen during the Apollo 11 landing.)

Operations in the Interpreter were completely different from the machine instructions used in other parts of the AGC code. One or two instructions were packed in a single word, followed by a list of addresses the instruction worked on. Programs running in the Interpreter feasted on a deliciously rich instruction set of 128 new operations, where the new environment implemented the mathematical operations needed for spaceflight.
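
The arithmetic behind "one or two instructions in a single word" is easy to check: 128 operations need 7 bits each, so two opcodes take 14 bits and fit inside a 15-bit word. The sketch below packs and unpacks two 7-bit codes. The bit layout and the sample opcode values are assumptions made for illustration and are not the Interpreter's real encoding.

```python
OPCODE_BITS = 7                         # 2**7 == 128 distinct operations
OPCODE_MASK = (1 << OPCODE_BITS) - 1    # 0o177


def pack(first: int, second: int = 0) -> int:
    """Pack one or two opcodes into a single word (hypothetical layout)."""
    for opcode in (first, second):
        if not 0 <= opcode <= OPCODE_MASK:
            raise ValueError("opcode out of range")
    return (second << OPCODE_BITS) | first


def unpack(word: int) -> tuple:
    """Recover the (first, second) opcodes from a packed word."""
    return word & OPCODE_MASK, (word >> OPCODE_BITS) & OPCODE_MASK


word = pack(0o31, 0o125)        # arbitrary example opcodes
print(format(word, "05o"), unpack(word))
```

Even the largest pair of opcodes stays below 2**15, so the packed word never spills past the 15 data bits available.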

Additionally, the Interpreter environment provided features absent from the actual underlying hardware. This included capabilities that would be familiar to high-level software programmers, like index registers and sophisticated branching and masking. With all of this new functionality, implemented in less than 2.5K words, the Interpreter presented a radically new programming environment. In addition to its high-level functions, the Interpreter mostly hid the AGC’s byzantine memory banking scheme, much to the relief of AGC developers. While perhaps stretching the concept somewhat, the environment the Interpreter created shares many characteristics with a modern virtual machine.

The language of the DSKY

When reading this article, editing photos in your photo library, or just looking for yet another cat picture, you are presented with a lush graphical interface that makes navigating your screen relatively painless. While obvious now, the elements of interacting with a computer are the result of years of trial and error experimentation and focus groups.

But how do you design an intuitive, interactive interface with a real-time computer when it has never been done before? And, just as importantly, what would the display look like? What data would the astronaut need to accomplish the mission, and in what format?

Instrumentation layout of a Block II Apollo Command Module. Can you spy the AGC DSKY?

Aircraft and spacecraft cockpits are visually overwhelming, with every bit of available space covered in gauges, specialized instruments, and switches. Where among all that can such an interface get squeezed in? A traditional QWERTY keyboard was never considered for the AGC. One can imagine the tragically comical effort of an astronaut with cumbersome gloves, undergoing intense acceleration and vibration at launch, trying to hunt-and-peck a critical command.

The DSKY was designed to perform the role of the interface, offering a numeric keyboard along with keys that performed special operations; a display area of three five-digit “registers” where data could be entered or presented; and a dozen lights to alert the astronaut to troubles. Three smaller displays showed the current running program (the so-called "Major Mode" discussed earlier), and the type of data being operated on. And yes, there was a field that flashed when the computer was busy, perhaps deferring to the need to show the crew some activity in an otherwise invisible process.

A well designed interface needs to establish a “language” between user and machine—one that converts the user’s needs and wants to requests that are easily understood by the software. In an act of brilliance, the “language” of the AGC was based on how humans communicate.

Imagine you are approached on the street by someone who has a minimal English vocabulary, and that person simply says, “Eat, pizza.” While grammatically abrupt, after a few moments you see that this person wants to eat, and that they want some pizza. A few gestures later, you’ve directed them to the closest restaurant. Key to this exchange is that there is an action, a verb, and an object, a noun.

This type of exchange is so intuitive that computers have used verb-noun combinations since the beginning. From machine instructions (ADD [Memory Location]) and commands (EDIT LETTER.TXT) to graphical interfaces (File > Print > taxes.xls), we have been using these constructs all along.

From this, it should be no surprise that two prominent keys on the DSKY are “VERB” and “NOUN.”

The AGC DSKY explained.

Other keys on the DSKY are similarly expected. An ENTER key tells the computer to accept the data just punched in, + and – keys provide the positive or negative sign for decimal data, and the PRO key (short for "PROCEED") allows the astronaut to accept a major new event, such as an engine firing or a new phase in a program.

Because the AGC is a multiprogramming system, a routine might need to present data to the crew while another program is already interacting with the astronaut. A light on the display labeled KEY REL (for "KEYBOARD RELEASE") informs the astronaut that another program needs their attention, and requests that the user release the keyboard for the next program to use the display. Pressing the KEY REL key clears the present display, and presents the new data from the background routine to the astronaut.

A dialog with the AGC is conducted through a combination of numeric verbs and nouns. Verbs could be used to enter or request a display of data. But what data? The noun would specify that part of the question. For example, the sequence to request that the computer display the time of an upcoming engine ignition on the DSKY is:

VERB 06 ENTER, NOUN 33 ENTER

VERB 06 is the request to display some decimal data, and NOUN 33 refines the request to display "Time of Ignition." The computer responds on its three display areas (the "registers") with the hour, minute, and second the engine is scheduled to start firing. Some verbs operated by themselves, such as VERB 57, which tells the computer that the LM’s landing radar data is good, and can be integrated into the landing guidance equations.
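
The verb-noun scheme amounts to two lookup tables. Here is a small sketch that parses a key sequence written the way this article writes it and describes the request in plain English. Only the codes mentioned so far are included, and the function names and table layout are hypothetical.

```python
# Only codes mentioned in the article so far; descriptions paraphrase it.
VERBS = {6: "display decimal data", 57: "accept landing radar data"}
NOUNS = {33: "time of ignition"}


def parse_keys(keys: str) -> dict:
    """Parse a key sequence such as 'VERB 06 ENTER, NOUN 33 ENTER'."""
    request = {}
    for group in keys.split(","):
        tokens = group.split()
        valid = (len(tokens) == 3
                 and tokens[0] in ("VERB", "NOUN")
                 and tokens[2] == "ENTER")
        if not valid:
            raise ValueError(f"unrecognized key group: {group.strip()!r}")
        request[tokens[0].lower()] = int(tokens[1])
    return request


def describe(request: dict) -> str:
    """Turn a parsed request into a plain-English phrase."""
    action = VERBS[request["verb"]]
    if "noun" not in request:
        return action            # some verbs stand on their own
    return f"{action}: {NOUNS[request['noun']]}"


print(describe(parse_keys("VERB 06 ENTER, NOUN 33 ENTER")))
print(describe(parse_keys("VERB 57 ENTER")))
```

The verb table says what to do, the noun table says what to do it to, and neither needs to know about the other.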

For those expecting a far more detailed tutorial of operating the computer, I understand the disappointment. However, that’s just about it. The beauty of the AGC and DSKY is that virtually all of the interactions were through humble verbs and nouns. Still, keeping up with nearly 100 verbs and an equal number of nouns was a bit of a challenge, so much so that “cheat sheets” of the codes were mounted by the DSKY panels in the CSM and LM.

Preparing for Fra Mauro: The Abort bit problem

The Apollo lunar module was a miracle of mechanical engineering. Both of the spacecraft's two impossibly lightweight sections, the ascent and descent stages, were optimized for their particular mission phase. Landing on the Moon required large fuel tanks, landing legs, and most importantly, a large throttleable engine. Though vital for landing, all of that stuff was dead weight for the return trip—so it was abandoned on the surface and only the ascent stage rose into orbit. Essentially nothing more than a cabin, a small engine, and enough life support for about a day, the ascent stage brought the crew and their science back to the waiting command module in orbit.

The abort procedures for the LM exploited the fact that there were two separate stages. If a problem developed early enough in the descent, the entire vehicle pitched forward and returned to orbit using the still-firing descent engine. Late in the descent, where the descent stage was nearly out of fuel, the descent stage was cut loose and the ascent engine fired to bring the crew to safety.

About 45 minutes after undocking for lunar descent, the crew of Antares was deep into their checklists aligning the guidance system and ensuring the LM was ready for a landing on the Moon. Back on Earth, teams in Houston had a far deeper insight into the LM and its systems. Telemetry, streaming through the cislunar void at 50Kbits/second, contained a wealth of information on the spacecraft, most of it not visible to the crew.

It was at this point that GUIDO, the controller position in Houston charged with monitoring the computer systems, noticed on his console that a status bit was set indicating the Abort button had been pressed. This was obviously something that shouldn’t be happening—not unless the crew had pressed the Abort button, something Shepard and Mitchell emphatically confirmed they had not done.

A request was then radioed up to the crew to have them display the I/O channel that contained the Abort bit. The crew entered:

VERB 11 ENTER, NOUN 10 ENTER, 30 ENTER

VERB 11 was a request to monitor octal data in the first of the DSKY's three display registers. NOUN 10 was the source of the data to be displayed—in this case, an I/O channel. The final number, 30, indicated specifically which of the 17 I/O channels the crew wanted to see.

I/O Channel 30 contained a number of status bits, including the condition of the inertial platform, the engine throttle, and several failure indicators. But the only bit that mattered here was in the right-most digit of the display: the "Abort with descent stage" bit. And it was set to on.
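
To see why a single bit shows up in the right-most digit, recall that each octal digit covers three bits, so a 15-bit word displays as exactly five octal digits. The sketch below formats a word that way and tests one bit. It assumes the Abort bit is the least significant bit and that a 1 means "set," matching the description above; the real channel's bit assignment and electrical polarity are not given here, and the sample channel contents are made up.

```python
ABORT_BIT = 1   # assumed position: least significant bit of the word


def as_octal_display(word: int) -> str:
    """Format a 15-bit word as the five octal digits of a DSKY register."""
    return format(word & 0o77777, "05o")


def bit_is_set(word: int, bit: int) -> bool:
    """Test a bit, numbering bits from 1 at the least significant end."""
    return bool(word & (1 << (bit - 1)))


channel_30 = 0o37401            # made-up contents for illustration
print(as_octal_display(channel_30), bit_is_set(channel_30, ABORT_BIT))
```

With the bit set, the final digit is odd; with it clear, the same word would end in an even digit.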

The implications of the Abort bit being set were ominous. At this point, the spurious signal was benign, as the bit would not be checked until the LM’s engine was started, thus beginning the descent to the landing site. However, as soon as ignition occurred the descent program (Program 63, or just P63) would set a flagword bit called LETABBIT ("Let Abort bit"), which allowed the state of the Abort bit to be polled by other programs. If the Abort bit were still set at that point, Program 70 (P70, "Abort using the descent engine") would automatically start, potentially ending the mission.

The crew could try to clear LETABBIT by hand at the appropriate moment, but that was not a practical solution. With so many events happening within a fraction of a second of ignition, it meant letting the descent engine light and hoping that clumsy gloved hands could punch in the commands quickly enough.

If properly prepared, the crew could quickly interrupt the abort and another landing might be attempted later, but the complications of that effort would put a successful landing in jeopardy.

Almost three hours remained before the landing attempt, giving engineers on the ground time to develop a more robust solution.

Inside an Apollo lunar module mock-up. The Abort and Abort Stage pushbuttons are outlined in caution tape on the left (CDR) side of the central instrument panel.

Percussive maintenance buys some time

For those with a background in analog systems, one quick solution always comes to mind when problems arise. A stuck meter or a loose wire is often fixed by tapping on the device or wiggling the connection until it comes back to life. Houston sent a request to do just that: tap the panel around the switch to see if it had any effect. The crew, superbly trained in all aspects of the mission, demonstrated their superior tapping skills and the Abort bit cleared itself.

With a long list of landing preparation steps ahead of them, Shepard and Mitchell returned to setting up the LM systems. Nothing more was said about the bit as they passed behind the Moon and out of contact with controllers on the ground.

But less than an hour later, the LM swung back from the far side of the Moon and controllers relayed the bad news. The Abort bit was set again. Tapping on the panel once again cleared it, but the recurrence of the problem did not bode well for the mission.

The failure was unpredictable, and true to Murphy, would likely recur at the worst possible time. Given the symptoms of the problem, it became apparent that some sort of contamination was present inside the Abort pushbutton—likely an errant ball of solder floating in the casing and occasionally making contact with the mechanism. No procedures existed to physically disable the button, and it quickly became apparent that the permanent solution would involve bypassing how the AGC processed the button's signal.

Back at MIT, a team led by descent software author Don Eyles came up with a procedure to reset LETABBIT, which would at least tell the computer to continue ignoring the stuck Abort bit. The following keypress instructions were relayed to the crew:

VERB 25 ENTER, NOUN 7 ENTER, 105 ENTER, 400 ENTER, 0 ENTER

VERB 25 loaded data into all three display registers, with NOUN 7 informing the computer that the crew wished to manipulate individual bits in memory. The three operands of NOUN 7, all entered in octal, were 105, the address of flagword 9 containing LETABBIT; 400 (100 000 000 in binary), which selected bit 9 in the word; and then 0, to turn the bit off.
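
In modern terms, those three operands are an address, a mask, and a value. The sketch below performs the equivalent operation on a dictionary standing in for memory. The address and mask are the ones from the keystrokes; the `set_bits` helper and the starting contents of the flagword are made up for illustration.

```python
WORD_MASK = 0o77777     # 15 data bits


def set_bits(memory: dict, address: int, mask: int, value: int) -> None:
    """Turn the bits selected by `mask` on (value 1) or off (value 0)."""
    if value:
        memory[address] |= mask
    else:
        memory[address] &= ~mask & WORD_MASK


FLAGWORD_9 = 0o105      # address from the keystrokes
LETABBIT_MASK = 0o400   # octal 400 == binary 100 000 000 == bit 9

memory = {FLAGWORD_9: 0o00421}     # made-up starting contents
set_bits(memory, FLAGWORD_9, LETABBIT_MASK, 0)
print(format(memory[FLAGWORD_9], "05o"))
```

Only bit 9 changes; every other flag in the word is left exactly as it was.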

Everyone understood that this procedure would be futile if the Abort bit was already set at ignition; the hope was that the problem would not recur until after the update had been entered. This was a bit of wishful thinking. If a few taps on the instrument panel could flip the bit, the acceleration and vibration of the engine firing could easily do the same.

A further complication was also obvious: if the spacecraft ran into troubles that actually required a genuine abort, the AGC and the primary guidance system could not be used. An abort would require the Abort Guidance System (AGS), a fully capable but far less sophisticated computer. (The AGS lacked even the AGC's system of nouns and verbs—the only way of interacting with the AGS was to read and write values directly to and from memory.)

A hacker's finest hour

Once again the LM’s orbit carried it behind the Moon and out of communications, leaving the crew with just a smattering of procedures and few options. The normal work of finishing the system configurations continued, and the crew maneuvered to the descent attitude, tidied up the cabin, and put on their helmets and gloves. In the meantime, Don Eyles’ team was feverishly working to find a better solution to the Abort bit issue.

Working the problem involved unraveling a complex, daisy-chained series of events. The main landing program, P63, did not perform all of the landing computations itself. Rather, it orchestrated a large number of Jobs and Waitlist Tasks, each performing a necessary part of the effort. Another Job running concurrently was the SERVICER, which sampled attitudes and accelerations that fed into the guidance equations. SERVICER, in turn, scheduled Routine R11 as a Waitlist Task, running every 0.25 seconds. R11 first checked whether aborts were enabled (via the LETABBIT flag), and if so, it then checked the status of the Abort bit. With aborts allowed and the abort signal set (presumably because the crew had pressed the Abort pushbutton), P63 was terminated, the AGC's Major Mode switched to P70, and the abort process began.

A simplified block diagram showing the LM AGC's guidance path.
NASA

In most hacks that defile our computers, tablets, and cellphones, a traditional vector is through new code that is introduced and executed. Directly modifying existing code, or introducing new software to influence the logic of a system, are the standard and most straightforward forms of alteration. But the AGC had all of its programming stored in core rope—an irreversible form of ROM where the programming is quite literally manufactured into the hardware. Code changes were impossible.

The only way to implement any kind of workaround was to “spoof” the computer to use different-but-already-existing logic, by manipulating only status bits and variables. This was the magic behind hacking Apollo 14’s computer.

During the final pass behind the Moon before landing, Eyles and his team came up with a far better approach than their earlier quick LETABBIT flip. This second solution could be entered in a far more leisurely manner, giving the crew essential time to verify they had patched the system correctly. (There wasn't any real input validation on the DSKY, and making a typo while mucking about in the flagwords and variables could have potentially dire consequences.)

The key to Eyles' exploit was within the little R11 routine. Immediately after checking whether aborts were allowed and whether the button was pressed, R11 performed a quick check to see if an abort was already running. An already-executing abort program must be left undisturbed, and if it found an abort already in progress, R11 would terminate without any additional processing. If no abort was in progress, P70 was immediately set up to run.

This was the breakthrough. If R11 could be spoofed into believing that an abort was already in progress, then it didn’t matter if the Abort button was pressed or not—the button's state would be ignored.

But how did R11 actually inform itself about whether or not an abort was executing? The answer was in plain sight on the DSKY: The Major Mode display, under the label “PROG”.

An AGC DSKY (this one was used in the F-8 <a href="https://www.nasa.gov/centers/dryden/history/pastprojects/F8/index.html">Digital Fly-By-Wire project</a>). Note "PROG" light at upper-right.

Remember that the PROG field (and the Major Mode number it displayed) was mostly a notational device for the crew to visually reference. While every major program identified itself by setting its program number in the display—so that the crew could see what the current Major Mode was—the computer and its routines rarely needed to check the display. The variable used to store the Major Mode, MODREG, was already referenced by R11, and changing its value was trivial. The spoofing of R11, therefore, could be accomplished by changing the value of MODREG, which would change the value of the Major Mode display. Most importantly, changing the value in MODREG after the landing program was started would leave the all-important landing program, P63, (mostly) unaffected.
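
The R11 decision path described above can be sketched roughly as follows. This is a minimal Python illustration, not AGC code: the function name `r11_check`, its return strings, and the `ABORT_MODES` set are hypothetical, and the sketch considers only the Abort pushbutton path (not Abort Stage).

```python
# Hypothetical sketch of R11's abort check as described in the article.
# Names and return values are illustrative, not taken from the AGC source.

ABORT_MODES = {70, 71}  # Major Modes of the two abort programs, P70 and P71

def r11_check(letabbit: bool, abort_bit: bool, modreg: int) -> str:
    """Return what R11 would do on one of its 0.25-second passes."""
    if not letabbit:
        return "ignore"        # aborts are not enabled
    if not abort_bit:
        return "ignore"        # no abort signal present
    if modreg in ABORT_MODES:
        return "ignore"        # an abort already appears to be running
    return "start P70"         # terminate P63 and begin the abort

# Faulty switch sets the Abort bit while P63 is running normally:
print(r11_check(letabbit=True, abort_bit=True, modreg=63))  # start P70
# Same fault, but MODREG has been spoofed to 71:
print(r11_check(letabbit=True, abort_bit=True, modreg=71))  # ignore
```

The second call shows why changing MODREG alone was enough to mask the faulty switch: the third check short-circuits before P70 can be scheduled.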

However, while that solved the basic problem of bypassing the Abort pushbutton, the fix introduced its own problems. Normally, when the descent engine ignited for the landing burn, its thrust was set at its lowest setting of 10% for 26 seconds. During that time, actuators could align the engine so that the thrust pointed through the LM’s center of gravity. After the engine was properly centered, the computer would command the engine to throttle up to full thrust. If it appeared that the Abort program was running, the engine would not perform its automatic throttle-up, requiring that the crew manually set the throttle to 100%.

Cascading down from this action was the fact that another critical flag, called ZOOMBIT, would not be set. ZOOMBIT was set once the software had commanded the engine to throttle up, and it triggered the guidance equations to start calculating the descent to the Moon. Without the software controlling the throttle, this bit was left unset, and the computer wouldn't start its descent calculations.

Eyles and the MIT team worked out a remediation for each of these follow-on issues, which we'll discuss in just a moment, and the details of the hack were tested in the simulator. Finally, when Antares came around the Moon for the last time before landing, Houston radioed up the complete procedure.

The Earth rises over the limb of the Moon, as seen from <em>Antares</em> during descent. The large foreground crater is <a href="https://en.wikipedia.org/wiki/Meitner_(lunar_crater)">Meitner</a>.

Fooling the smartest computer in space

Four minutes before the LM began its descent, the computer displayed a countdown to ignition on the DSKY. This confirmed that P63 was operational, and was the cue to begin the hack. First, the Major Mode had to be spoofed to indicate abort program P71 ("Abort using the ascent engine") was running. P71 was selected over P70 because if the patch didn’t work and an abort did trigger, the real abort would be done via P70. This would be apparent by a change in the Major Mode display from the spoofed P71 to the very real P70.

To get started, the crew entered:

VERB 21 ENTER, NOUN 1 ENTER, 1010 ENTER, 107 ENTER

VERB 21 was the command to load data into the registers, with NOUN 1 specifying the address of the MODREG variable (1010 octal) and the new value it would be reloaded with (107 octal, which is 71 decimal). This caused the PROG readout to switch from P63 to P71, and the hack was underway.
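
Because the DSKY took these entries in octal, the arithmetic is easy to double-check. The short Python snippet below is only a convenience for the reader; the variable names are made up for illustration.

```python
# Python's int() parses base-8 strings, so the octal DSKY entries
# can be converted to decimal directly.
modreg_address = int("1010", 8)   # address keyed in after NOUN 1
spoofed_mode = int("107", 8)      # value loaded into MODREG

print(modreg_address)  # 520
print(spoofed_mode)    # 71

# Going the other way: the octal form of the landing program's number, 63.
print(oct(63))         # 0o77
```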

With the Major Mode changed and the DSKY indicating Program 71 running, an inadvertent abort was blocked out—R11 would check MODREG, see a P71 abort underway, and terminate. Program 63’s displays and processing continued normally despite the change in the Major Mode display. Ignition was on time, and the throttle set to 10% thrust. Then, 26 seconds later, Shepard manually advanced the throttle to 100% and Mitchell started the process of entering the remaining parts of the patch.

With P71 appearing as the Major Mode, the software was not aware that the engine had been manually throttled up, and ZOOMBIT had not been set. Landing guidance wouldn't work without ZOOMBIT, so the next step in the hack addressed this issue:

VERB 25 ENTER, NOUN 7 ENTER, 101 ENTER, 200 ENTER, 1 ENTER

VERB 25 loaded data into all three registers. NOUN 7, as shown earlier, allowed manipulating individual bits. Here, ZOOMBIT is in flagword 5 at memory location 101 octal. "200" is bit 8 within the flagword (010 000 000 binary), and the bit is to be set to 1.
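
To see why "200" picks out bit 8, it helps to treat the entry as a bit mask. The following is a small Python sketch under stated assumptions: the helper `apply_bit_load` is hypothetical, and the 15-bit width reflects the AGC's 15-bit data word.

```python
# Sketch of a masked bit load: (word, mask, 1 = set / 0 = clear).
# The helper name is hypothetical; it is not an AGC routine.

WORD_MASK = 0o77777  # keep results within 15 data bits

def apply_bit_load(word: int, mask: int, set_bit: int) -> int:
    if set_bit:
        return (word | mask) & WORD_MASK    # turn the masked bit on
    return word & ~mask & WORD_MASK         # turn the masked bit off

zoombit_mask = 0o200
print(zoombit_mask.bit_length())                 # 8, i.e. bit 8 counting from 1
print(format(zoombit_mask, "09b"))               # 010000000
print(oct(apply_bit_load(0, zoombit_mask, 1)))   # 0o200
```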

With ZOOMBIT set, landing guidance was engaged, and the throttle setting was now controlled by the computer. Mitchell then keyed in the next sequence:

VERB 25 ENTER, NOUN 7 ENTER, 105 ENTER, 400 ENTER, 0 ENTER

Similar to setting ZOOMBIT, the bit in flagword 9 containing LETABBIT (at memory location 105 octal) is now set to zero. With this bit at zero, aborts cannot occur by pressing the Abort pushbuttons. The crew was now safe from an unintended abort due to a faulty button throughout the descent, since nothing else in the descent programs would toggle LETABBIT back on.

To clean things up, a final sequence was entered:

VERB 21 ENTER, NOUN 1 ENTER, 1010 ENTER, 77 ENTER

Reversing the first step in the hack, MODREG was set back to 77 octal (63 decimal). Setting the Major Mode (and the display on the DSKY) back to P63 was necessary for the rest of the descent programs (P64 and P66) to operate correctly. Now that the guidance software was controlling the engine throttle, Shepard brought his manual throttle back to its minimum position.
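
The four entries can be replayed against a toy memory model to see their combined effect. This is a sketch only: the addresses and values are the octal figures quoted above, while the dict-based memory, the helper names, and the assumed starting state (all other flag bits zero) are hypothetical simplifications.

```python
# Replay of the four DSKY entries against a toy memory model.
# Real flagwords hold many other flags; here they start at zero
# except for LETABBIT, which is assumed to be on during P63.

MODREG, FLAGWORD_5, FLAGWORD_9 = 0o1010, 0o101, 0o105
ZOOMBIT, LETABBIT = 0o200, 0o400

memory = {MODREG: 0o77, FLAGWORD_5: 0, FLAGWORD_9: LETABBIT}

def load_word(address: int, value: int) -> None:
    memory[address] = value            # the VERB 21 NOUN 1 style load

def load_bit(address: int, mask: int, bit: int) -> None:
    if bit:
        memory[address] |= mask        # set the masked bit
    else:
        memory[address] &= ~mask       # clear the masked bit

load_word(MODREG, 0o107)               # PROG shows 71; R11 is fooled
load_bit(FLAGWORD_5, ZOOMBIT, 1)       # landing guidance engaged
load_bit(FLAGWORD_9, LETABBIT, 0)      # aborts disabled for the descent
load_word(MODREG, 0o77)                # PROG back to 63

print(memory[MODREG])                      # 63
print(bool(memory[FLAGWORD_5] & ZOOMBIT))  # True
print(bool(memory[FLAGWORD_9] & LETABBIT)) # False
```

The ordering matters in this model just as it did in flight: MODREG is restored only after LETABBIT has been cleared, so there is no window in which R11 could act on the faulty switch.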

This is hard enough to do without having to worry about your spaceship turning itself ass-over-teakettle.
NASA

Requiring that the manual throttle be brought back to minimum was a necessary last step. While the crew could manually adjust the descent engine, normally the manual throttle control was configured to define the minimum amount of thrust the crew wanted, no matter what the computer was commanding. Keeping the throttle in minimum meant that the astronaut was requesting at least 10% of full thrust. Placing the throttle at its maximum setting, though, would keep the engine firing at 100% thrust regardless of what the guidance equations were requesting. If full thrust was commanded for more than 40 seconds after the engine was scheduled to throttle down, the guidance equations would recognize that the LM was slowing down more than desired. In response, the computer would command the spacecraft to flip around 180 degrees, thrusting forward. This is a less-than-optimal result.
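
The throttle relationship described here can be modeled, very loosely, as the manual setting acting as a floor under the computer's command. The Python sketch below follows only that description; the function name is hypothetical and the actual mixing of throttle signals in the LM hardware is not covered in this article.

```python
# Loose model of the article's description: the manual throttle defines
# the minimum thrust, regardless of what the computer commands.
# This is an illustrative assumption, not the LM's actual control law.

def engine_thrust(manual_pct: float, computer_pct: float) -> float:
    return max(manual_pct, computer_pct)

print(engine_thrust(10, 60))    # 60: computer in control, manual at minimum
print(engine_thrust(100, 60))   # 100: manual at maximum overrides guidance
```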

In less than two minutes after the descent to the Moon had started, the Abort pushbutton had been successfully disabled and the computer was happily managing the descent. All indications were that the next lunar landing would be successfully accomplished in eight more minutes.

Surely nothing else could go wrong now

With the Abort button troubles behind them, Shepard and Mitchell appeared to be headed towards a good landing. All was going very smoothly four minutes into the descent, as the LM passed through 36,500 feet (about 11 kilometers) above the lunar surface.

Based on the experience of previous missions, the landing radar should have locked onto the surface of the Moon at or around this point in the landing. When the radar successfully received signals reflected off the surface and could process the data, it sent signals to the AGC indicating that the spacecraft's altitude and velocity measurements were good. The AGC would then extinguish the ALT[itude] and VEL[ocity] lights on the DSKY, allowing the crew to incorporate the radar data into the landing guidance.

But unknown to the crew and controllers, the radar had switched itself to its short-range mode, intended only for the final few thousand feet of the descent. At seven miles up and over ten miles from the landing site (or about 11 kilometers up and 16 kilometers away), the suddenly nearsighted radar was unable to lock onto anything.

The computer and its inertial guidance platform were marvels of precision, able to navigate 250,000 miles (about 400,000 kilometers) to the Moon with an error of only a few thousand feet. This was pretty amazing by any standard, but for the final phases of landing on the Moon, a thousand feet (about 300 meters) is the difference between a smooth landing, an abort, or a crash onto the surface. Radar was able to provide position and velocity measurements accurate to within 4 feet (a bit over 1 meter) in altitude, and a fraction of a foot per second in velocity. Because the radar was absolutely critical for landing, a failed radar generally demanded a mandatory abort, no matter what else was going right.

As Antares passed through 32,000 feet (about 9,700 meters), Mitchell became concerned and informed controllers that the radar hadn’t locked on. Houston replied with a suggestion to pull the circuit breaker for the radar, and then power the system back on, which did the trick. Solid radar data began flowing into the computer, and the crew quickly agreed to accept it. Just a few minutes later, Shepard made a smooth and on-target touchdown at the Fra Mauro highlands.

<em>Antares</em>, safely on the lunar surface.
NASA / Frank O'Brien

You've gotta know when to hold 'em...

After the mission, when asked if he would have attempted to land without the radar, the notoriously hard-charging Shepard reportedly replied, “You’ll never know.” In his autobiography Failure Is Not an Option, Gene Kranz recounts that Flight Director Gerry Griffin was convinced that Shepard would indeed make an attempt to land without radar, and would just as certainly have had to abort when fuel ran out.

Clearly, a landing attempt without a radar would have a high likelihood of failure. Without an atmosphere, objects both near and far on the Moon are equally razor-sharp, which destroys any sense of depth and speed. Using Mark 1 Eyeballs to accurately judge altitude and velocity during a lunar landing is not realistic given the sharply limited amount of fuel the LM carried.

But what if the radar failed much closer to the Moon, after working satisfactorily up until that point? A review of the Apollo 14 mission rules provides the answer.

The Apollo 14 mission rules: if loss of landing radar occurs during powered descent and after the radar has provided an "adequate" reading of the LM's altitude, the landing can continue—provided GUIDO concurs and there are no other significant issues.

Mission Rules are the bible of spaceflight. Agreed on long before the flight, they are designed to eliminate debate and casual or emotional decision-making during the compressed real time of a mission. Under the rules for the landing radar, it was permissible to attempt a landing with a radar that failed late in the descent, provided the radar and the guidance system had disagreed by less than 1,000 feet. Additionally, since a failing radar would likely be sending spurious data to the AGC, continuing the landing attempt required using the Abort Guidance System for altitude information.

At about 12,000 feet (about 3,700 meters) above the surface, the crew synchronized the AGS with guidance data from the AGC, and the two computers tracked each other closely for the remainder of the descent. The AGS, without the distraction of problematic radar data, would be accurate enough to get the crew safely to the surface.

Permanent fixes

The idea that a single errant switch could derail a lunar landing attempt was unacceptable. After the mission, a new variable in the AGC code was introduced that allowed the crew to "mask out" (that is, to ignore) the Abort and Abort Stage pushbuttons. The scenario assumed that a failing switch would be recognized well before the descent began, and commands could be entered in time to prevent an inadvertent abort. Like the fix used for Apollo 14, this would make initiating an abort through a pushbutton impossible, and any abort in an urgent situation would have to be flown on the Abort Guidance System.

The nearsighted landing radar fix was even more straightforward. The radar is placed in one of two positions during descent, depending on whether the LM is in P63 or pitched nearly vertical in the approach phase of P64. On Apollo 14, the radar likely encountered some noise in its signal, perhaps from an overly strong return from the surface or a reflection of a side lobe from the spacecraft. This caused the radar to switch away from the desired long-range setting to its short-range mode. The range selection circuits were modified so that the radar could not switch between modes unless it was correctly positioned. Easy peasy.

The recovery from Apollo 14’s Abort switch failure can only be described as brilliant and heroic. But the most important enabler of this effort was that the software, while fiendishly complex, could be understood by a small team of developers. Modern hardware and software, with their extensive protection schemes, virtualization, and dynamic program management, would make such a simple hack impossible. Faced with a comparable problem today, even if the fix were trivial, the solution would likely require large amounts of code to be recompiled, tested, and uploaded to the spacecraft. That might not be possible in the short time available to save the mission.

In the end, Apollo 14’s fix truly represented the “Spirit of Apollo,” where talented teams made the impossible happen.

Frank O'Brien is the author of The Apollo Guidance Computer: Architecture and Operation, the canonical resource on the AGC and its software. For 25 years, Frank has been a contributing editor to the Apollo Lunar Surface and Flight Journals. He is a NASA Solar System Ambassador, and lectures several times a month on a variety of spaceflight topics. Most notably, Frank lives just down the road from where the Martians landed in 1938. No, you can’t make this up.
