A big part of my job is listening to others’ ideas, asking good questions, and helping them focus on the parts that will most benefit from more work. Sometimes those are the strongest parts, which will escape whatever concrete frame first held them. Sometimes those are the weakest parts, which are holding back an otherwise good idea. Either way, I rarely have as much time as I’d like to think through these ideas. Ideally, I’d reconstruct the proposal from scratch. Then I’d be able to talk not only about the results, but about the process of getting there. But how often is there time for that? As Rickover¹ is said to have said, a manager doesn’t necessarily need to do the math himself, but he’d better be able to tell whether his staff have done the math.
If I have to give a quick answer, I can reach in two directions. First, I can look to a store of knowledge: previous attempts to answer the same question. The papers from an introductory systems course in most computer science programs, for example, should be a good start. A few decades of lab notebooks, kept with an intent to make new and different mistakes, can make a big contribution.
Alternatively, I can look to a store of approaches. One name for such a catalog of approaches to problems is mental models. This list, inspired by a talk by Charlie Munger, was shown to me by Chris Degni. He carries it around as part of an explicit practice: when shown a new problem, try some of the models from the stable. Especially try some you haven’t looked at lately, to stay fresh.
But that list is missing some of the approaches I find most valuable. It does have some great ones, many of which have the significant virtue that they are easy to learn. It’s pretty easy to hear about Confirmation Bias, the second element on the list, and then to look for it in your work. You’ll find plenty, and that’s its own reward. But the approaches I most value are somewhat ornery: it’s not always obvious how to apply them to a situation. In exchange for that diligence, they offer nearly universal applicability. Here they are:
Shannon information theory, particularly the idea of channel capacity. Nothing’s as wonderful for finding the flaws in a computer system vendor’s sales pitch as some estimates of the channel capacity needed to make it work at scale.
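For instance, here is a minimal sketch of that estimate, with hypothetical numbers standing in for a vendor’s claims: compare the load the pitch implies against the Shannon-Hartley capacity of the channel it would have to ride.

```python
import math

def shannon_hartley_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Channel capacity in bits/second: C = B * log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# Hypothetical numbers, for illustration only.
bandwidth_hz = 20e6              # a 20 MHz radio channel
snr_db = 20                      # 20 dB signal-to-noise ratio
snr_linear = 10 ** (snr_db / 10)

capacity_bps = shannon_hartley_capacity(bandwidth_hz, snr_linear)

sensors = 10_000                 # claimed fleet size
per_sensor_bps = 50_000          # claimed per-sensor telemetry rate
claimed_bps = sensors * per_sensor_bps

print(f"channel capacity: {capacity_bps / 1e6:.1f} Mb/s")
print(f"claimed load:     {claimed_bps / 1e6:.1f} Mb/s")
print("plausible" if claimed_bps <= capacity_bps
      else "the pitch owes you an explanation")
```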
Noether’s Theorem, and its consequences: energy and momentum are conserved, and so are lots of other things. Sometimes it’s helpful to look for what’s conserved in a system—where are the continuous symmetries, where are dissipations into entropy—but very often it’s enough to remember some basic conservation laws, and to look for claims that the system will violate those.
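A discrete analogue, as a minimal sketch with made-up numbers: treat the total balance of a closed ledger as the conserved quantity, and flag any batch of transfers whose bookkeeping claims to change it.

```python
# "Look for what's conserved": in a closed ledger, transfers move value around
# but the total balance is invariant. A claimed set of transactions that changes
# the total is violating the conservation law and deserves scrutiny.

def total(balances: dict) -> int:
    return sum(balances.values())

def apply_transfers(balances: dict, transfers: list) -> dict:
    before = total(balances)
    new = dict(balances)
    for src, dst, amount in transfers:
        new[src] -= amount
        new[dst] = new.get(dst, 0) + amount
    assert total(new) == before, "conservation violated: where did the value come from?"
    return new

balances = {"alice": 100, "bob": 50}
balances = apply_transfers(balances, [("alice", "bob", 30)])  # fine: total stays 150
print(balances)
```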
Thermodynamics: You can’t win, you can’t break even, you can’t even quit the game. Systems that claim to be able to do this—particularly to avoid all interactions that will have a particular result, while sustaining other interactions—have a lot of explaining to do.
Gaussian surfaces: This is an idea from basic electromagnetism. For many systems, we care about the flux of a vector field. This is often even true for discrete systems, like distributed computer systems (say, for information leakage, or for defining principals). We can then draw the surface in whatever way makes it easy for us to do the rest of our work. We don’t have to feel constrained to draw it tightly around the implementing system.
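As a sketch of what that looks like for a discrete system (the component names and byte counts are invented): pick the surface wherever it is convenient, then sum the flows that cross it.

```python
# A "Gaussian surface" over a discrete system: choose a boundary (a set of
# components) that makes the analysis easy, then sum the information flow
# crossing it. Names and volumes are hypothetical.

flows = [
    # (source, destination, bytes_per_day)
    ("web_frontend", "auth_service", 2_000_000),
    ("auth_service", "user_db", 500_000),
    ("user_db", "analytics_vendor", 750_000),
    ("web_frontend", "cdn", 9_000_000),
]

# Draw the surface wherever it helps: here, around everything we operate.
inside = {"web_frontend", "auth_service", "user_db"}

outflow = sum(b for src, dst, b in flows if src in inside and dst not in inside)
inflow = sum(b for src, dst, b in flows if src not in inside and dst in inside)

print(f"flux out of the surface: {outflow} bytes/day")  # what crosses the boundary
print(f"flux into the surface:   {inflow} bytes/day")
```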
Retrograde analysis: I learned this together with cryptographic protocol analysis. It’s common for a programmer to think about a protocol in a forwards direction: at each step, what happens next? That’s absolutely the right model for a programmer working on that protocol endpoint, who has to ensure that the right thing happens next. But as someone building a system relying on a protocol, it’s much more common that we’ve experienced one side of a protocol, and now want to be sure of what must have happened elsewhere in the world.
Why else do we use cryptographic protocols in distributed systems, if not to know what happened elsewhere, given some assumptions? This idea of proofs at the end of the interaction about what must or must not have happened beforehand, given assumptions about regular behavior by others, unlocks new ways of understanding all distributed systems.
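Here is a minimal sketch of that backwards reasoning, using an HMAC over a placeholder key and message: if the tag verifies at my end, then, assuming the key stayed secret, the other key-holder must have produced this exact message somewhere else, earlier.

```python
# Reasoning backwards from the end of a protocol run: if the MAC verifies, then
# (assuming the key is known only to the two parties and HMAC-SHA256 is
# unforgeable) someone holding the key must have produced this message.
import hmac
import hashlib

shared_key = b"example-shared-secret"        # placeholder; assumed known to only two parties
message = b"transfer 30 units to bob"
received_tag = hmac.new(shared_key, message, hashlib.sha256).digest()  # as if it arrived off the wire

expected = hmac.new(shared_key, message, hashlib.sha256).digest()
if hmac.compare_digest(received_tag, expected):
    # Retrograde conclusion: the other key-holder authorized exactly this message.
    print("verified: the peer must have sent this message")
else:
    print("reject: no conclusion about what happened elsewhere")
```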
If you pave it, you know it’s flat. What do we do with malware-infected computers? Wipe the drives. If the malware even might have changed state elsewhere? Trash them; computers are cheap relative to the costs of keeping malware with you. The same applies elsewhere: calloc(3) has quite reasonable costs compared to Heartbleed. Related ideas include written records not changing, and the desirability of having written the history books.
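For a rough sense of that cost, here is a quick timing sketch that uses a zero-filled Python buffer as a stand-in for the memset that calloc implies; the 1 GiB size is arbitrary.

```python
# Rough cost of zero-on-allocate: building a zero-filled 1 GiB buffer runs at
# memory-bandwidth speeds, i.e. it is cheap insurance compared to leaking
# whatever happened to be in reused memory.
import time

N = 1 << 30  # 1 GiB
start = time.perf_counter()
buf = b"\x00" * N
elapsed = time.perf_counter() - start
print(f"zeroed {N >> 20} MiB in {elapsed * 1e3:.0f} ms (~{N / elapsed / 1e9:.1f} GB/s)")
```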
Lots more about systems engineering, including Kerckhoffs’s Law (nothing as complex as a system can stay secret; we can only keep short strings secret) and hierarchical control, including the application of Gaussian surfaces to system definitions.
I could write an entire post on each of these, and perhaps I will. In the meantime, I’m happy to answer questions, to accept nominations for new members of my list, and to expand in person.
¹ Missing cite. Did Feynman tell this story?