Putting the Science Back in Computer Science (PDF)

PDF of slides for a talk by Robert Sedgewick on putting the science back in computer science.

The content examines the gap between the theoretical analysis done by computer scientists and the application of algorithms in the real world. Using the sample problem of finding a path in a graph, the talk examines the common approaches (DFS, BFS, and union-find) and points out pitfalls that implementers stumble into by making seemingly reasonable assumptions. For example, the efficiency of a DFS at finding a path between two vertices can easily be thwarted if the input list of edges happens to be ordered in such a way that the DFS takes the wrong path at almost every step. Sedgewick suggests routing around the problem with a randomized iterator, since the implementer typically has no control over the order in which the edges are handed in.
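A minimal sketch of the randomized-iterator idea (my own illustration, not code from the slides): shuffle each adjacency list before iterating, so no fixed input ordering can consistently steer the search down bad paths.

```python
import random

def dfs_path(adj, s, t, rng=random):
    """Depth-first search for any path from s to t.

    adj: dict mapping each vertex to a list of its neighbors.
    Shuffling the neighbor list before iterating is the
    "randomized iterator": it defends against adversarial
    (or accidentally unlucky) edge orderings in the input.
    """
    stack = [(s, [s])]
    visited = {s}
    while stack:
        v, path = stack.pop()
        if v == t:
            return path
        neighbors = list(adj[v])
        rng.shuffle(neighbors)  # randomize iteration order
        for w in neighbors:
            if w not in visited:
                visited.add(w)
                stack.append((w, path + [w]))
    return None  # no path exists

# Usage: a small square graph 0-1, 0-2, 1-3, 2-3.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(dfs_path(adj, 0, 3))
```

The shuffle costs time linear in the degree of each vertex, so it does not change the asymptotic running time of the search.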

The broader point of the talk is that users who care about efficiency can't afford to skip experimentation when researching the best option for their implementation. This matches my own experience researching algorithms: a theoretically faster algorithm can easily turn out to be slower when tested (cache thrashing can kill any algorithm).
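One cheap form of experimentation is a doubling test: if the running time follows a power law $T(N) \sim aN^c$, then $T(2N)/T(N) \to 2^c$, so the log of the measured ratio estimates the exponent. The sketch below (my own illustration, assuming a hypothetical quadratic kernel) counts operations instead of wall-clock time so the result is deterministic.

```python
import math

def count_ops(n):
    """Hypothetical quadratic-time kernel: count inner-loop
    operations rather than wall-clock time, so the measurement
    is exact and repeatable."""
    ops = 0
    for _ in range(n):
        for _ in range(n):
            ops += 1
    return ops

# Doubling test: log2 of T(2N)/T(N) estimates the exponent c.
for n in (100, 200, 400):
    ratio = count_ops(2 * n) / count_ops(n)
    print(f"N={n}: ratio={ratio:.2f}, estimated exponent={math.log2(ratio):.2f}")
```

In real use the kernel would be replaced by wall-clock timings of the algorithm under study, and the ratios would only approach the limit as $N$ grows.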

Later portions of the talk focus on making Computer Science part of the regular curriculum for all university students. Princeton has taken this approach, and about 40% of its students now take the Introduction to Computer Science course, which was designed specifically to be interdisciplinary.

Some other observations made (in or about the talk):

  • Analytic combinatorics as a method of algorithmic analysis. Sedgewick co-wrote a book on this (with Philippe Flajolet). In general, many combinatorial (and thus algorithmic) problems can be solved via generating functions, which in turn can be analyzed via complex analysis. The examples Sedgewick gives seem to match that general approach.

  • $\log{\log{n}}$ is almost always smaller than 10 for real-world $n$.
  • Hypothesis: the general running time of an algorithm has the form $ab^N N^c (\log{N})^d$, where for a given algorithm $b$ may be 1 and $c$ or $d$ may be 0, eliminating those factors.
  • Approximation (tilde) notation ($\sim 3n^2$) is more useful than big-O notation, since it retains the leading coefficient that big-O discards.
  • Fundamentals of Computer Science should be taught to all university students; the concepts of computation are useful in almost every professional field.
  • Analyzing average performance of an algorithm using current techniques requires a firm foundation in various aspects of upper-division mathematics.
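To make the generating-function point concrete, here is a small illustration of my own (not an example from the talk): extracting power-series coefficients from a rational generating function by long division. The Fibonacci generating function $F(z) = z/(1 - z - z^2)$ serves as the test case.

```python
def series_coeffs(num, den, n):
    """First n power-series coefficients of the rational generating
    function num(z)/den(z). num and den are coefficient lists with
    the constant term first; den[0] must be nonzero."""
    a = []
    for k in range(n):
        s = num[k] if k < len(num) else 0
        # Subtract the contributions of already-known coefficients.
        for j in range(1, min(k, len(den) - 1) + 1):
            s -= den[j] * a[k - j]
        a.append(s / den[0])
    return a

# F(z) = z / (1 - z - z^2): num = [0, 1], den = [1, -1, -1].
# Its series coefficients are exactly the Fibonacci numbers.
print(series_coeffs([0, 1], [1, -1, -1], 10))
# -> [0.0, 1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]
```

This is only the elementary half of the story; the analytic-combinatorics program goes further, reading off asymptotic growth from the singularities of $F(z)$ via complex analysis.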