Monday, February 26, 2007

The Real Reason Why Building Software is not like Building Bridges

I have seen many articles expounding on how software is not like bridge building. They attempt to reason about the problem via poorly formed analogies and often miss the actual critical differences between software development and bridge development. You will see things like the number of bolts and rivets compared to the number of lines of code. You will hear about the size of the software problem space vs. the size of the bridge builder's problem space. But none of these articles has ever hit the nail on the head for me, so I thought I would draw an apt analogy that can help both software programmers and bridge engineers understand each other.

Imagine you are a bridge engineer and have been teleported to an alternate universe to build the Golden Gate Bridge. You have things that look like beams, bolts, and welding torches. You set out to build the bridge and one day your crew finally finishes it. The first traveler gets on the bridge and drives across it, but he winds up in Des Moines, Iowa.

"What happened?" you exclaim! Your crew laughs at the traveler's predicament. "It must be a bork," they say. One of the crew goes out half way on the bridge, turns a bolt 90 degrees clockwise, and shouts back to send another driver across. Another brave traveler, dubbed "beta-tester" for some reason, sets off across the bridge and finds herself in China. The crew laughs heartily and shouts back, "Give it another spin!" With another 90 degree turn, and another brave traveler later, someone actually drives across the bridge into San Francisco as intended! There is much cheering and the sound of the popping of the Champaign floats across the ocean.

Just then, you notice a Volvo trying to drive across. You stare in awe at this strange universe wondering if the Volvo ever managed to get across. You tap one of your crew in the back and ask, "Do you think the Volvo will get to the right place?" He drops both his Champaign glass and his jaw. "VOLVO!!!!" he screams! "No, No, NO!" a crew member exclaims. "Didn't you know your world had Volvo's?" you ask. "Of course," he replies, "but we didn't think anyone would be foolish enough to try to drive it across the bridge!"

Flabbergasted, you wait for the inevitable. Half way across, the poor Volvo starts to float off the ground. It starts to spin over and over again. Horrified, you ask, "Can we make it so we can allow Volvos to drive across after we rescue that poor soul?" "Certainly," comes the reply of the crew chief, "but we'll have to raise the entire bridge by another 4 feet. We could either lift it from the bottom or just put an extra 4 feet of concrete on the surface of the bridge." You choose the latter option, saddened by the fact that your beautiful bridge now has a thick white coat of concrete to accommodate the simple Volvo.

Thinking your task has finished, you look forward to running far, far away from this universe and back into ours. You then notice 2 cars starting to drive across the bridge in tandem. A familiar feeling of dread fills you as you stammer out, "What happens if the bridge is used in parallel?" "RUN!" comes the screaming reply. No longer questioning the logic of this world, you take off. Looking back, you see a single bolt fall from the top of the bridge. As it falls, it begins to absorb the bridge itself. More and more of the bridge falls INTO the bolt. The cars, bolt, and bridge are lost forever. That is until the crew chief comes over to where the bridge was and pushes a button at the edge of the road. The bridge instantly reappears. He also posts a sign, "No driving on the bridge in parallel."

Before finally finishing with this world, you turn to ask your crew how this universe works. "It's quite simple," comes the reply. "Any bolt, beam, or other bridge part can be a bridge to any other part of the universe. It is even possible for a part to become a black hole leading to nowhere, as you saw." "Why do you build bridges then?" you ask. "We build bridges for the same reason you build bridges, because they are useful. We cannot help the fact that our universe is very strange. We envy your bridges that always lead to the same place and your parts that do not become black holes. From what you told us of your world, your software builders are in the same situation as us. Give them our sympathy and our camaraderie." With that, you return home with a far greater appreciation of what your programmer friends have to go through.

Thus ends our journey to the alternate dimension. Unfortunately for software developers, this place actually exists for us. The Halting Problem shows that we can never be certain of any result about any computation. It may run properly, it may become an endless loop, or it may give the completely wrong result. Adding concurrency adds on even more uncertainty into the situation. And we can PROVE this logically. There is no escape from the Halting Problem weirdness. Every single line of code added may blow up the bridge, dump you in China, or create a black hole.

THAT is the real bridge to programmer metaphor. And it is why software development will always be hard.

Now to help you all remember allow me to coin Gleason's Law:

"Any sufficiently complex software is indistinguishable from magic."


great ideas BUT like many of blogger's blogs it only prints the first page - just an FYI

By Blogger tCodex, at 10:44 AM  

Hey thanks.

I'll get that fixed.

By Blogger EntropyFails, at 11:20 AM  

OK, that was just too funny, but oh so real.

By Blogger Vita Rara, at 2:08 PM  

I usually hear it being compared to building a house, to which I reply, "sure, it's like that... as long as every customer has their own unique requirements, like 'I want the house to float 5 feet above the ground on magnetic fields so that it's immune to earthquakes'. And then they expect an accurate time estimate and a fixed price to build it."

By Blogger Dean, at 2:15 PM  

Do you really expect that people will read you tiny font?

Please respect my browser setting and use 1em for body text.

By Blogger Nir Soffer, at 3:51 PM  

You're *way* overgeneralizing the consequences of the halting problem.

I. It's quite possible to prove that any given program will or will not halt in all situations. The undecidability theorem merely shows that you can't automate the process of generating such a proof and have your algorithm give the right answer for all possible programs.

II. Regardless, the halting problem says absolutely nothing about showing the *correctness* of programs.

III. PCs aren't Turing machines. They have limited storage and are thus finite state machines. You *can* write a Turing-machine program that will prove or disprove the terminability and correctness of any finite state machine. But since the worst-case running time of such a program is proportional to the number of states, (i.e., exponential with respect to how much RAM you have), it's totally useless in the real world.

Bottom line: the halting problem is not even remotely applicable to software engineering in the real world.

By Blogger dfranke, at 4:13 PM  

Hey Dfranke!

You said, "It's quite possible to prove that any given program will or will not halt in all situations. "

You can only do this if you have proven the operating system as well. No one has done this yet. As I said in the reddit comments, you can prove things about trivial problems, but complex problems always have edge cases.

As for correctness, I think the halting problem applies because no one is really interested in computations that only return _|_.

PC's are properly LBA's. But acceptance testing in an LBA is PSPACE-complete. As you say, the problem is too big for us to handle. So for most practical problems, we CAN consider the program as a Turing machine. And the halting problem DOES apply here.

And since we can not be certain of outputs and any bug can generate any arbitrary output, I think that completes the analogy.

By Blogger EntropyFails, at 8:36 PM  

Excellent article but I have to agree with Dfranke that I think you are somewhat exaggerating the importance of the halting problem. You're correct that complex problems always have edge cases, but the majority of software bugs I've encountered (or created) in my own experience have to do with performing the wrong computation rather than entering an infinite loop or infinite recursion. One thing which I think proves my point is that REALbasic, which is ostensibly sold to some of the lowest self-esteem, greenest programmers in the world, did not properly handle stack overflows until late in version 5.5. Granted, improper recursion is more common than improper looping.

What I'm getting at is that the problem of making a function halt is much simpler to a human than making a function do what it should. A good example is the recent FizzBuzz mania: lots of programmers contributed incorrect solutions in the comments, all of which terminate.

By Anonymous Anonymous, at 9:44 AM  

Pardon me for coming in and arguing algorithms theory here but...

dfranke, you seem to completely misunderstand the halting problem and how
it works. I could give you 6 lines of C that you wouldn't be able to
prove if it would halt or not. No OS bugs, no initialization
tricks, nothing. There are, of course, certain cases for which you can
say "this halts" or "this does not halt", or even "under some
circumstances X, this does not halt". That's not the same as claiming
that you could do this for all programs.

EntropyFails, I agree and disagree with you. Yes, OS validation would be
a vital part of a solution, as well as validation of the hardware. But
that still wouldn't allow us a simple solution of the halting problem.
You say it's PSPACE-complete, and that's correct. The halting problem
isn't impossible, it's just NP-hard.

I agree with fusiongyro's point: lack of halting isn't the reason
that software dies. It's out there (I teach a freshmen CS course, so I
see plenty of it), but incorrect algorithms are more common. Bugs I see
fixed in the software I use are more likely to be algorithmic issues or
security issues than "it doesn't stop".

Interesting article. Thanks!

By Blogger Bill Weiss, at 9:36 PM  

Thanks for commenting, Bill Weiss!

Getting to the truth is what they are there for. *grin*

As for the halting problem not being the cause of all bugs, I agree. But I was making a strong statement and I wanted the strongest claim to protect it. Programming will ALWAYS be a weird alternate world. So my comment directed people to the fundamental problem.

Adding all the other problems we have at the periphery, it is amazing that we get any software done at all!

By Blogger EntropyFails, at 10:21 PM  

Bill, the halting problem is NP-hard, not NP-complete. It is as hard as any NP problem but isn't NP itself. It is undecideable. No amount of time can solve it.

Entropy, thanks for the new perspective on this question. People often talk about complexity being the issue but they don't tie it back to the inherent complexity in programming. I have some other thoughts on your post here.

By Blogger Thornkin, at 11:15 PM  

You don't seem to understand the halting problem. The halting problem merely states that it is not possible to construct an algorithm that takes as input an abritrary algorithm and can tell whether that input algorithm will terminate, for all input algorithms.

It does not say that you cannot tell whether a particular programme will terminate, either for a specified set of inputs or even for any input: of course you can. It doesn't even say that you can't construct an algorithm which can determine whether some input algorithms will terminate; only that it cannot do it for all possible algorithms.

Your entire analogy is false; computer programmes are just as predictable as bridges, if people are willing to spend on them the time and effort that goes into a major engineering project. Tasks can be broken down; correctness proofs generated to make sure that every bit of the progamme does its job correctly. It just takes more time, effort, and therefore expense, than most people are willing to expend for most applications.

But just because you can't be bothered to spend the effort, don't try to claim that it wouldn't work anyway because software is somehow magical and different to building anything else. It isn't. A large software programme is complicated, yes, but so is a jet fighter, an assembly line, or, yes, a bridge (or do you think that bridge-building is easy, just a matter of piling stuff up, and civil engineers spend all those years at university and as apprentices just dossing about)?

Software is predictable. It doesn't just randomly do stuff like turn into a black hole, unless it's thrown together shoddily using shuddy components built on top of shoddy operating systems, because nobody involved can be bothered to take their jobs as engineers seriously and prefer to pretend they're 'artists' or 'magicians'.

Of course most software is written like that, so it seems unpredictable. But it needn't be. And articles which try to justify the current lamentable state of affairs, to make out that it's somehow okay to approach software enginnering with the same sort of cowboy attitude as rogue builders who put up conservatories that do fail in unpredictable ways, instead of as professional engineers who build bridges that they know will stay up and which do stay up, are despicable.

Because they say 'it's not worth trying to do better, because we couldn't anyway, because software is somehow magical'.

We could do better.

We should do better.

And learn something about the basic results in algorithm theory before you start spouting off again. Don't just wave the words 'halting problem' around like you're Harry Potter trying to cast a spell.

By Blogger SK, at 1:54 PM  

Hey SK!

This post wasn't a course in Algorithmic Information Theory, so I did play it a bit fast and loose. But then again, the software to bridge analogy is fast and loose itself, so I don't feel as bad about it as you apparently think I should. ;)

The fact is that we cannot perform proof on anything but the most basic and simple of systems. The reason we cannot do this is because of the nature of the computational space itself. The halting problem means that the output space grows so fast than any proof of even modestly complex code MUST be larger than the code itself.

If we restricted ourselves to completely proven code, we wouldn't ever get anything done in the software development world. So while we do use heuristic approaches in order to limit that complexity, the basic halting problem sits underneath those barriers waiting to gobble up any computation that falls through the cracks.

I agree that right now, our "software cracks" are far too wide. We could do much better on proving things about our software. I have a strong interest in Qi exactly for that reason. It allows me to prove things about types and thus allows me to make stronger guarantees about my code.

But even after all of that, bottom will rear its ugly head whenever it gets a chance. The nature of the computational beast is simply different than the nature of the physics of bridge building. In programming, you are always one mistake away from any arbitrary computation. We humans are fallible creatures and so is the software we create.

By Blogger EntropyFails, at 3:00 AM  

Really great post!

By Blogger Connelly, at 11:02 AM  

Thank you SK for your post.
Yes, big software can be proven right (for instance, the software controlling the fully automated subway line 14 in Paris had been - see for instance in french).

Do not fool your readers : the main difference between bridge building and software development lies not in computation theory (which, BTW, is obvious next to the physics involved in construction) but in economics ; you can not sell a bridge that fails once in a while, but there is a huge market for unreliable software. The risks and the demand are not the same.

By Blogger rixed, at 1:33 PM  

Heya Rixed!

The proof for most code only goes so deep. There is no modern operating system that comes with a proof of correctness. Embedded systems like the Metro's are targeted for proven code much more often. But since the vast majority of software development takes place in the far more complex world of modern OS development, proof is a very hard thing to come by.

Economically speaking, even audited code looses out to ad hoc code as it is so much slower to do the auditing process. Fully proven code will be even slower to develop.

I'm definitely not arguing for an "anything goes" approach to programming. I think we should use better engineering practices such as separating the functional space from the monadic, time bound space like Haskell does. I'm definitely for proofs wherever you can find them.

But any real software development in this world will likely straddle the gaping void between doing what you want and doing any random computation. The halting problem shows that proof will only ever go so far.

By Blogger EntropyFails, at 9:47 AM  

OMG isn't it obvious programmers don't necessary make good mathematician most of them forget that simple axioms tend to confuse humans with paradoxes which not properly grasped in account with programming gives well unintended results. Read some of D.Hilbert, L.E.J Brouwer and some of Riemmans work. A computer on the ground level is still a calculation device, Above the level and up one need intuitive skills blending new things with old so maybe it is building a bridge "or" not thats just it or like A. Einstein said "Just keep it simple".

By Blogger Wulfcry, at 12:32 PM  

I had some problem with my computer, but the most bigger problem was that i missed the drivers, so i needed to looking information by internet. Finally i reached an information that advised me download by the web a software driver. this information will help me a lot, very useful. actually i found a site interesting called costa rica investment opportunities i think it is useful too.

By Blogger javieth, at 1:41 PM  

Great blog,much needed,thanks for sharing.
IT Services

By Blogger Vinise N, at 3:25 AM  

Post a Comment