Mathematics I Use

Recently, on an online forum, a question was posed: How much, and what kind, of mathematics does a working programmer actually use? Here is my answer.

First, I and almost all programmers use a lot of boolean logic, from evaluating boolean expressions for conditionals and loop exit criteria, to rearranging the terms of such expressions according to, e.g., De Morgan's laws. Much of our work borders on the first-order predicate calculus and other predicate logics in the guise of analysis of preconditions, invariants, etc (though it may not always be presented as such).

Next, I do a lot of performance analysis. The kind of data sets we process these days are massive. In 2010, Eric Schmidt made a comment at the Techonomy conference that we (humans) produce as much data in two days as ever existed world-wide in 2003. I want to be able to process large chunks of that and infer things from it, and understanding the space and time complexity of the operations we apply to the data is critical to determining whether the computations are even feasible. Further, unlike in much traditional big-O or theta analysis, the constant factors matter very much at that kind of scale: a factor of 2 will not change the asymptotic time complexity of an algorithm, but if it means the difference between running it over 10,000 or 20,000 processors, now we are talking about real resources. The calculations tend to be much more intricate as a result. Examples: can I take some linear computation and reduce it in strength to a logarithmic computation? Can I reduce memory usage by a factor of three? Etc.

Often times, I want to compute the worst-case or upper bound of, say, the size of some data set. The calculations can be nontrivial for many of these. Or, I may want to analyze some recurrence relation to see how it varies as I increase the recursion depth. To do that, I need, among other things, the Master Theorem and a good understanding of how to analyze series. Believe it or not, this sometimes means I need to evaluate an integral (though mostly of the Riemann variety). Or can I just solve the recurrence and get a closed-form solution? Do I have to resort to linear algebra? This gets into things like generating functions, Stirling numbers, matrix computations, etc. If you are curious what goes into “fundamental” mathematical concepts necessary to understand computer science, have a look at volume 1 of, The Art of Computer Programming, by Donald Knuth, or Concrete Mathematics, by Knuth, Ronald L. Graham and Oren Patashnik.

I do a lot of straight-up computation in terms of aggregating, combining and transforming data; lots of combinatorics (e.g., counting things, looking for symmetries across different dimensions, etc). Examples are obvious, I think.

I do a lot of discrete mathematics: looking for algebraic structures across operations on extremely large sets. Is there some kind of structure latent in whatever I am doing that can be preserved across homomorphism to some group or ring that I understand better? Is there a looser constraint? Can I apply group actions to some set in order to build a mental model for some transformation that makes it easier to reason about? Can I define some topology towards analyzing the data? You would be surprised how many things fall into the category of discrete topologies. For that matter, you would be surprised how many places the triangle inequality shows up.

I do a lot of graph theory. “Designing a web site” is not just about where to put the cute cat graphic on the page, it is also about inserting nodes into the global hyperlink graph; a single page potentially adds many edges to the graph and that can have subtle effects on performance, analysis, search engine rankings, etc. Understanding the consequences of that can yield interesting insights: how does the graph grow? As it turns out, it looks an awful lot like it obeys a power law: the web is a scale-free network. What is the shortest distance between two nodes in that graph? What would it mean for the web graph to be planar? Bipartite? When, if ever, do these properties hold? What if the graph isn't the web, but entire road network for North America, Europe or Asia?

This implies something else. Something people often do not realize about “modern” web pages is that they are not just HTML documents with links to images and other resources, but they are really tree structures of data that are linked together in a graph. Those trees are often walked over, reprocessed and updated dynamically by interactions between the user's web browser and some server (this is what “AJAX” is all about). For a clever and relevant example of this, see: MathJax. Or GMail. Understanding how to do that means having some understanding of symbolic computation and semantic analysis of the elements of a page. For MathJax, the authors must write a program that can walk a tree generated from the Document Object Model (or DOM), look for mathematical elements, parse them, and dynamically replace them with new rendered elements representing the underlying expression. It may not seem like much to most users, for whom it “just works”, but it is actually some heady stuff under the hood. I do not do things like that necessarily (I am not a front-end person), but I do do similar things in Lisp all the time. Note that Lisp was originally defined as a mathematical notation for symbolic processing: Lisp macros are all about manipulation of symbolic expressions.

I do a lot of time-series analysis. How is traffic or resource consumption changing? What are the trends? Is a spike in request latency or memory use seasonal? How does the rate of change of something vary as input changes in different dimensions? Is it correlated with some external event?

I do a lot of statistical analysis of data, not just to understand its performance characteristics but also to understand the data itself. In addition to looking at the aforementioned DOM tree for semantic metadata (e.g., microdata and microformats, RDFa, other XML data with some explicit schema, etc) also trying to make sense of unstructured data. What is the probability that this text is a street address? Geographical coordinates? What context does it appear in? Is it spam? Does it make sense? Does it look like the output of a Markov chain generator? Is it a series of exact quotes from some well-known work of literature? Or is it some discussion about literature? Is it a discussion about spam that includes literature? I still chuckle when I think about the piece of spam I got for pharmaceuticals wrapped inside a section of Mikhail Bulgakov's The Master and Margarita.

Category theory. Types in computer programming languages roughly correspond to categories, and monads and functors can be brought to bear to greatly simplify some constructs in surprisingly elegant ways. For instance, the functional programming language Haskell uses monads for input and output, and for modeling state. Simplified programs are easier to get right, easier to reason about, understand, modify, etc. Types can often be inferred; this brings in things like unification (which can also be used in general problems of inference). Consider using inference to apply prolog-style predicates as an approach to transforming graphs in a distributed system.

Distributed systems bring us back to graph theory: at scale, in the real world, systems fail, backhoes cut fiber, there are earthquakes, volcanoes, and fishing trawlers that disturb Marine cables. One needs to understand the graph characteristics of the network to understand the effects of these things and how best to respond. Routing algorithms and network analysis is intimately tied to things like how to find the shortest path between nodes in the network graph; Dijkstra's algorithm anyone?

Also, how does one distribute a large computation across globally distributed data centers? You have to understand some physics to do this well: at Internet scale, the speed of light starts to be a bottleneck. Heat dissipation, density of electrical current draw per unit area, etc, are all real world considerations that go into what programmers do. Should I put a data center in Iceland? Cheap cooling and geothermal energy sound appealing, but what is the minimum latency to some place where user's care abut the data living on those servers in Iceland? That is the great-circle distance between, say, Iceland and London? Berlin? Amsterdam? These are fairly simple things to figure out, but I need to have enough mathematical chops to do them. Can I run fiber from Iceland to some hub location? What is the average latency? What is the probability of a fiber break in a marine cable under the North Sea over a 12 month period? 48 months?

Of course, the theory of computation and automata, parsing, grammars, regular languages, etc, all enter into what programmers' work. I do a lot of parsing and pattern matching. At even moderate size, real-world data sets contain items that can trigger pathologically bad behavior when using, for instance, backtracking techniques. If I use regular expressions to match data, I must be careful to make sure that the expressions really are regular. If I am using a push-down automaton to parse a context-free grammar (which happens every time you send a request to an HTTP server, by the way), I have to make sure I limit the depth of recursion to avoid things like processor procedure call stack exhaustion, which requires understanding the underlying principles of the computation and the mathematics they are based on. If I have to actually write a recursive descent parser for some funky grammar that is not LALR(1) (so I can't just use yacc or bison), I have to be careful or maintain the state stack separately from procedural recursion. That this is also something I need to understand if I am walking a DOM tree (or any recursively defined data structure). Some programming languages recognize this as a hassle for the programmer and work around it by using segmented stacks. Of course, it would be nice if I could define my “compilation” of some parsable resource as a function (in the mathematical sense). Wouldn't it be nice if this was all just some linear programming optimization problem?

Note that none of this is esoterica; it is all based on real-world experience with real-world data sets and problems. Of course, I do not do all of this every day, but I have done all of it at one time or another and most of it regularly. Probably a lot more is based on observation, experience and heuristics than should be (the heuristic models are often incomplete and inaccurate). Do I know enough math to calculate the average error between reality and my heuristic model?

This is what computer science really is, and how it interacts with programming and the realities of modern computing. Being the “IT expert” somewhere is not the same thing as being a computer scientist, and as many correctly note, being a computer scientist is a lot closer to being an applied mathematician than a tradesman. This is not to downplay the importance of trade professions, which are both useful and highly respectable, but to point out that computer science is different. I am very fortunate that I was able to serve in the Marines with a number of folks who work in trades; we as a society should show them a lot more respect than we do.

(For the record, I am not a computer scientist. I was trained as a [pure] mathematician, and what I do professionally is a lot closer to engineering.)