Monday, June 30, 2008

Études en Ocaml: Keeping your functions clean, Pt 1

"But you could save a PUSH instruction if you inlined it..."

One of my coworkers, who is still somewhat fresh out of college, asked me why I had broken up one of my functions into a few smaller ones when there was only one caller for each of them. One of the funnier things he suggested was that inlining the code would save an x86 push instruction. But then he surprised me by suggesting that unnecessarily breaking up functions can make code more difficult to read.


Why is it so hard to keep functions short?

It's hard to keep functions short. If you do not think so, then you are probably lucky, or smart, or perhaps it isn't too difficult for you after years of hacking code, but that doesn't make it any less inherently difficult. It is still useful to understand why other people find it difficult to keep functions short. Indeed, why is it difficult at all?

Before we continue, I should point out that it's better to say that functions should be "atomic", or "clean". A good function is one that can't logically be broken down any further without making it pointless. This small but important semantic difference matters later on.
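A minimal sketch of what "atomic" might look like in OCaml. The task (averaging the valid scores in a list of strings) and every name below are made up for illustration; none of it comes from the code my coworker asked about.

```ocaml
(* Hypothetical example: the task and all names are assumptions made for
   illustration only. *)

(* One function doing three jobs at once: parsing, filtering, averaging. *)
let mean_score_monolithic (raw : string list) : float option =
  let total = ref 0.0 and count = ref 0 in
  List.iter
    (fun s ->
      match float_of_string_opt (String.trim s) with
      | Some x when x >= 0.0 ->
          total := !total +. x;
          incr count
      | _ -> ())
    raw;
  if !count = 0 then None else Some (!total /. float_of_int !count)

(* The same work as three pieces. Each has exactly one caller, and each
   does one thing that cannot usefully be split any further. *)

(* Turn one raw string into a score, rejecting junk and negative values. *)
let parse_score (s : string) : float option =
  match float_of_string_opt (String.trim s) with
  | Some x when x >= 0.0 -> Some x
  | _ -> None

(* Arithmetic mean of a list, or None when the list is empty. *)
let mean (xs : float list) : float option =
  match xs with
  | [] -> None
  | _ -> Some (List.fold_left ( +. ) 0.0 xs /. float_of_int (List.length xs))

(* The top-level function now reads like a statement of intent. *)
let mean_score (raw : string list) : float option =
  mean (List.filter_map parse_score raw)
```

Both versions compute the same result. The second is not better because its functions are shorter, but because each piece has one job that can be named, and `mean_score` says what is being done rather than how.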

What do you see here?

I'm going to make a grand, sweeping assertion that many of you will likely say "duh" to. If you disagree, then it's probably because it doesn't apply to you, and that's great, but we're trying to figure out why keeping functions clean and short is difficult for most programmers, myself included, so please bear with me.

Programmers are by and large detail-oriented. At the least, we're good at details, because we have to be. Computers are finicky contraptions that break unless everything is exactly correct. There is no "almost works" in much of what we do. Because of their digital and binary nature, computers and the tools we use to program them are not fault tolerant. It is left to the programmer to make sure everything is right, and because of this, anyone who does not enjoy picking apart the details of a problem probably won't enjoy programming.

So how did I come to this conclusion? It's quite simple. Back in school, we had two pretty darn tough "weed-out" courses. One course was taught in a high-level language, and the other course was taught in C and assembly. Sounds like *your* university, doesn't it? Now let me ask you real quick: did the majority of your CS friends find the C/Assembly class easier, or did they totally pwn that Scheme/Lisp/SML/Haskell class? Which one did they like more? Most importantly, which one was regarded as "more practical"?

Our undergraduate advisors noted that people generally found the C/Assembly class easier. Most CS students also liked that class more. The ones who didn't usually went on to grad school, which serves to further skew the pool of industry programmers in the direction of the detail-oriented.

Programmers in industry who maintain others' code are subjected to an additional set of biases. It takes some time to learn to read code, and reading code is a bottom-up process, especially if said code is written poorly (now isn't it funny the way that *everyone* works somewhere where the code sucks?). If you're new to the field (like my coworker) and you're still working through the code line by line, it wouldn't seem that there'd be any point to breaking functions up at all. What's more, people sometimes break up functions without adding any atomicity to the structure of their code.

Since programmers are detail-oriented people, writing atomic functions does not come naturally, because we do not typically approach our problems from the top down. This is especially true if we learned to program in a low-level, iterative language like C. Writing clean, atomic functions requires you to understand consciously, at a high level, what it is you are trying to do.

A tree or a forest?

But many programmers don't consciously know what they're trying to do. They just hack towards a solution and somehow, magically, a working piece of code appears, both to their benefit and to their detriment.

The awesome and frightening conclusion that we gather from this is that it is hard for many of us to write clean code simply because we *can* hack code. Our ability to meticulously design algorithms, to read and hack other people's code (often on a line-by-line basis), to find needles in haystacks (debugging, anyone?), and perhaps our own personalities are what made us into programmers in the first place. Details matter: above all, the code must work! "We can worry about how messy it is after we ship. ;)"

But these traits predispose us to a mindset where we do not see our problems in the big-picture sense that is useful for writing clean, atomic functions, writing and using black boxes, and maintaining a separation of layers.
Indeed, the fact that we're told that we should write "short" functions is indicative of this mindset. Shortness has little semantic meaning. It is a detail, a direct metric, not an overarching goal like "cleanliness".