I lost on Jeopardy

I listened to a fair amount of Weird Al Yankovic when I was young. It has been a while, but some things stick with me – most relevant for this post, the song “I lost on Jeopardy”, which features Weird Al playing Jeopardy (badly) against “a plumber, and an architect, both with a Ph.D.” It seems like an appropriate response to a recent piece in The Chronicle of Higher Education in a vein that I tend to think of as the “Oh, the humanities!” genre. The premise of “Big Brains, Small Minds” is that the humanities uniquely provide students with patience, wisdom, and an interest in meaningful questions, while “STEM culture” is ruthlessly pragmatic and unreflective, leading to “a search for the ‘right’ answer” and a “disordered mind.”

I think our students should read Plato and other philosophers (I’m more fond of Marcus Aurelius). I also think they should learn enough about mathematics, science, and engineering to appreciate how we ask questions. Mathematicians do not spend their days adding columns of numbers, nor do most scientists and engineers spend their days carefully heating test tubes or building concrete walls. The accusation that we “knew all the answers, but we couldn’t get the questions right” is as ignorant as it is condescending.

What lessons should one take away from such writings? Perhaps the main thing is that we ought to teach the methods of questioning early in the curriculum in any area. What do we assume? Why should we believe our models? What are the limitations of what we know now, and how does that affect the questions we ask? What are the limitations of what we can know, whether in practice or in theory, and how do those limits affect the questions we ask? When is the whole simpler than the sum of the parts, and when is it more complex?

And, from my seat in a computer science department, this essay has helped me put a finger on one of the things that has always bothered me about “computational thinking.” Is computational thinking about “solving problems, designing systems, and understanding human behavior that draws on concepts fundamental to computer science?” Of course it is. But computational thinking is also about the types of questions we ask, how we think about the difficulty of those questions, and so much more. When we look at faculty candidates, we judge their taste based on the questions they pursue. When we mentor students, we emphasize that the right question makes the difference between an exciting result and a boring one – good technique is usually necessary to make for interesting research, but is not sufficient. Beauty matters. Simplicity matters. Ethics matter. I enjoy problem-solving, but why should I cede the joy of question framing to the humanities? Questioning should be a joy for all fields. By starting from the answer, we sell ourselves short.

Atom

One of the glorious things about summer is that when 7:00 or so rolls around, I feel like I can safely switch to something other than my day job. For example, I can spend some time trying Atom! At least, I can spend a minute or two before it is time to put the kids in the bath.

I would wax poetic about this, I’m sure, but I have only used it for all of about ten minutes. I am sure that tomorrow I will go back to my normal mode of operation (emacs and vim both open, with a fountain pen and a pencil sitting next to a pad of paper by the keyboard). But so far, I like it!

Topics in Julia

I learned about Julia when it was first announced in early 2012. Despite knowing and respecting some of the names behind it, and despite several friends ribbing me about a new numerical language that shared a name with my daughter, I took a wait-and-see attitude at first. But I kept an eye on Julia, and I saw as many positive signs as I saw hyped stories. A community formed, and people other than the language designers started writing packages. It picked up the ability to call Python, and with that the ability to access the marvelous Matplotlib library. And the numerical folks at MIT had started using it for classes. So by Fall 2013, I was ready to give Julia a try myself.

This spring, I taught CS 5220: Applications of Parallel Computers, a master’s-level course on high-performance scientific computing. As in past offerings of the class, I gave a few conventional assignments in which students were to tune a base serial code that I gave them, then parallelize the code using MPI or OpenMP. And as in past offerings, most of the assignments involved standard numerical kernels or simulation methods. But every time I offer this course, I try to give at least one assignment with a less classically numerical feel, and at least one assignment with a different style of parallelism. This semester, I satisfied both these desiderata by having the students write a topic mining algorithm in Julia. I figured this would be a good way to test the claim that it is straightforward to write code with good performance in Julia. In the end, I think I figured correctly.

As with the other assignments, I started off by providing the students with a base code and a data set of Yelp reviews that was kindly provided to me by David Mimno and Moontae Lee. The students had two main tasks:

  1. Replace a nonnegative least squares routine (an active set algorithm) that I provided with a faster, if less accurate, exponentiated gradient solver.
  2. Parallelize the almost-embarrassingly-parallel part of the code.

My first surprise came when I started implementing the base code. I’d done one previous programming exercise in Julia, when I wrote the HTCondor cluster manager over the winter break; I figured we would need this for the class anyhow, and that was a good enough excuse. But this was the first time I had done any numerical programming in Julia, and I expected to spend a bit of time stumbling around in the dark before I figured out what I was doing. As it happened, I wrote the code almost as I would write Matlab code, and with surprisingly little debugging effort. The greatest difficulty I had was not with the language, but with choosing the parameters in the exponentiated gradient algorithm in order to get acceptable performance. I was sufficiently concerned about the correctness of my own implementation that I decided to write the active set routine, even though that was not part of the original plan. And at the end of writing both routines, cleaning up the parts that I thought might be confusing, and adding some commentary, I already felt fairly comfortable with Julia.
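For readers who want a feel for the algorithm, here is a rough sketch of an exponentiated gradient style multiplicative update for the nonnegative least squares problem min ||Ax - b||^2 subject to x >= 0. This is not the assignment code: it is written in current Julia syntax rather than the 0.3 we used, the step size eta and iteration count are made-up illustrative values, and the real solver handled details (termination, scaling, and so on) that I am glossing over here.

    # Sketch of an exponentiated gradient (multiplicative) update for
    # nonnegative least squares.  Not the assignment code; eta and niter
    # are illustrative guesses rather than tuned values.
    function expgrad_nnls(A, b; eta=1e-3, niter=500)
        n = size(A, 2)
        x = fill(1.0/n, n)            # strictly positive starting point
        for k in 1:niter
            g = A' * (A*x - b)        # gradient of the least squares objective
            x .*= exp.(-eta .* g)     # multiplicative update keeps x > 0
        end
        return x
    end

The appeal of the multiplicative form is that the iterates stay positive for free; the price is sensitivity to the step size, which is exactly where my tuning headaches came from.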

My second surprise came before I posted the assignment: my Julia code processed the Yelp data set in a couple seconds, fast enough that I was concerned that my students would not be able to get much parallel speedup. So I added some pointers to a few larger data sets, as well as some code to process those data sets.

My third surprise came after I posted the assignment: with a little fiddling, my active set solver was running faster than my own exponentiated gradient solver (on which I’d admittedly spent less time and effort), at least when we were trying to learn a small number of topics. This left me doubly concerned, since the assignment indicated that it should be easy for the students to beat my performance with the alternative solver. Fortunately, while a few students did indeed need reassurances, other students did a better job at tuning the exponentiated gradient code than I had, and got the expected performance gains.

When the students really engaged with the assignment, they did about what I thought they might do. Most got the basic algorithm working and managed at least some parallel speedup, and some managed to significantly improve the performance of the base code that I provided. In the process, though, they ran into issues that I probably would not have stumbled across in quite some time. In no particular order, the ones that come to mind are:

Documentation. The biggest complaint among the students was the lack of clear documentation of some important language features. In particular, the students were not that happy with the documentation on how Julia handles parallelism. Several students also looked up the documentation for the wrong version of the language, which didn’t help matters any. Also, there are few enough examples online that many students still felt in the dark after spending some quality time with Google.

Version differences. I installed the most recent pre-release of Julia 0.3 on the cluster. Several students used Julia 0.2 on their laptops. The language is evolving fast enough that there were differences between these releases that caused some woe. Perhaps the most memorable real difference had to do with elementwise subtraction of vectors, which could be done with - or .- operators. The plain - form was deprecated in Julia 0.3, and at least one group ran into a tremendous slowdown on the cluster (compared to their laptop) because of the I/O overhead associated with Julia warning them millions of times that they should now use .- instead of -.

Parallel profiling. Most of the class settled on the same types of spawn-and-fetch strategies for parallelizing the main loop that I asked them to tackle. But none of us were able to get great insight into the parallel overheads by any mechanism other than careful manual instrumentation with timers.
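In outline, the pattern most groups used looked something like the sketch below, written with the current Distributed library names rather than the 0.3-era ones; process_chunk is a hypothetical stand-in for one independent piece of the main loop.

    # Spawn-and-fetch sketch.  Start Julia with worker processes, e.g.
    #   julia -p 8 driver.jl
    using Distributed

    @everywhere function process_chunk(chunk)
        return sum(chunk)             # placeholder for the real per-chunk work
    end

    chunks  = [rand(10^6) for i in 1:8]
    futures = Future[]
    for c in chunks
        push!(futures, @spawnat :any process_chunk(c))  # launch on some worker
    end
    results = [fetch(f) for f in futures]               # gather the answers

The “careful manual instrumentation” amounted to little more than wrapping phases like the spawn loop and the fetch loop in timers (@elapsed, say) and comparing the pieces.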

JIT overhead. The first time it invokes a function, Julia uses a Just-In-Time compiler to compile the routine down to machine code. The JIT compilation takes some time, and unless one wants to measure that time, it’s important to run the code twice and time the second trial. I told the students about this, but many of them ignored me. That in itself is no surprise; what is more surprising is how much variance the students discovered in the JIT time. One group in particular was puzzled at an apparent 2x slowdown in their parallel code that turned out to be entirely due to the JIT processing their parallel code more slowly than their serial code.
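The discipline I had in mind is nothing more than this, with work as a toy stand-in for whatever kernel is actually being measured:

    # Warm up the JIT before timing: the first call includes compilation.
    work(n) = sum(sqrt(i) for i in 1:n)   # toy stand-in for the real kernel

    @time work(10^7)   # first call: timing is inflated by JIT compilation
    @time work(10^7)   # second call: closer to the steady-state cost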

Vectors and array slices. For the most part, Julia’s performance conformed to my mental model. But one student group pointed out something that surprised me. They saw significant speedups by copying a column of a matrix into a temporary vector, operating on that temporary vector in place, and then copying the results back. I always just operated on the column of the matrix in place. Given that Julia is column-major, so that the storage layout should be the same in both cases, I’m still a bit confused about why their approach improved performance so much.
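In current syntax (the details were a little different in 0.3), the two variants look roughly like the following; the column scaling is just a placeholder for the students’ actual update. My habit corresponds to the first function, and their trick to the second.

    # Two versions of a column-by-column update (placeholder scaling):
    # working through a view of the column versus copying into a dense
    # temporary vector and copying the result back.
    function scale_cols_view!(A, s)
        for j in 1:size(A, 2)
            c = view(A, :, j)        # no copy: operate directly on the column
            c .*= s[j]
        end
        return A
    end

    function scale_cols_temp!(A, s)
        tmp = similar(A, size(A, 1))
        for j in 1:size(A, 2)
            copyto!(tmp, view(A, :, j))   # copy the column out
            tmp .*= s[j]                  # update the dense temporary
            A[:, j] = tmp                 # copy the result back
        end
        return A
    end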

OpenMP and OpenBLAS. Somewhat to our surprise, setting the OMP_NUM_THREADS environment variable significantly changed the performance of some students’ serial codes. The really surprising aspect of this is that it didn’t seem to matter whether the variable was set to 1 or to 8 (the number of cores on the cluster nodes). Just setting it to something made a difference. We compiled Julia to use OpenBLAS, which is capable of running in multi-threaded mode for serial codes, though that should be turned off when Julia is running several local worker processes. But I still don’t understand why it would run faster when the number of threads is explicitly set to the number of threads it ought to choose anyhow (either one or the number of cores).
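The quickest way to reproduce the observation is to time a BLAS-heavy operation with the variable set in the shell before Julia starts; blas_timing.jl is a made-up script name, and the matrix size is arbitrary.

    # Compare, e.g.,
    #   OMP_NUM_THREADS=1 julia blas_timing.jl
    #   OMP_NUM_THREADS=8 julia blas_timing.jl
    # against a run with the variable left unset.
    A = rand(2000, 2000)
    B = rand(2000, 2000)
    A * B            # warm up the JIT and the BLAS threads
    @time A * B      # this is the timing that changed with the setting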

I was happy that most of the students were able to use the profiler and make some sense of its output. Many did not figure out how to fix the bottlenecks that the profiler revealed, but as often as not this was a matter of how they were thinking about the algorithm and not how they were using the Julia language. The student codes were, as always, of varying quality; but the students mostly submitted code that was free of major performance gaffes.
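For reference, the basic profiling workflow is short; the sketch below uses current syntax and a toy function in place of a student kernel.

    using Profile

    work(n) = sum(sqrt(i) for i in 1:n)   # toy stand-in for a student kernel

    work(10^8)              # warm up so compilation does not pollute the profile
    @profile work(10^8)     # collect samples while the kernel runs
    Profile.print()         # report where the time went, organized by call tree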

Most of the students said nothing about their feelings on Julia in the assignment reports, though I know from conversations in office hours that several were frustrated from running into what they saw as incomplete or undocumented corners of the language. Perhaps I’ll read more about this when the course evaluations come in.

Experiment

G: “Daddy, I’m doing an experiment!”

Me: “Really? What are you trying to figure out?”

G: “I want to see whether the washcloth floats when I fill it with bubbles!”

Yup, she got it to float. We’re delighted all around. Keep it up, kid.

What advice?

We’re entering the time of the year when I get asked for professorial advice. Of course, I’m still figuring this whole thing out myself, and would take anything I say with a grain of salt. But for today, my deepest and most heartfelt advice must be: take dessert seriously.

Yeah, nobody ever asks my opinion on dessert. But that’s my line for tonight, and I’m sticking to it.

Coding katas for computation classes

It’s the middle of the fall semester, just before fall break. That means that I have had just enough time to develop a certain amount of exasperation with student coding habits. From the Friday afternoon before a long weekend, I give you this brief, cathartic set of thoughts.

Programming exercises are katas

Code for a class is supposed to make your brain comfortable going through certain motions, so that you’re better equipped to go through those motions when you face a computation in anger. It should not be a long, mindless slog (though it may be long). If it is, you’re doing it wrong.

Start early

You don’t have to finish early; you just have to understand what you’re facing well enough to plan. Do you basically know how to do the problems? Do you need to read more deeply? Will you want help in office hours?

If you’re continually late, try tracking how much time you think the work will take at the outset, and how much time it actually takes. If you’re surprised by the difference, keep recording it until it sinks in.

Don’t hack mindlessly

Before you start programming your solution, read the prompt. Write a tester. Include some randomly generated test cases, but don’t rely on them alone. Document, at least for yourself, what you assume about the input and output parameters. Look for singularities and pathologies; are they ruled out by design? By all means, pull up a MATLAB prompt and play with some ideas early on, but treat those experiments like the early scrawls you make on a pad. They’re stepping stones, not a thoughtful solution.
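To make “write a tester” concrete, even a throwaway harness along the following lines pays for itself. It is written in Julia for definiteness, and my_solve is just a stand-in for whatever routine is actually under test.

    # Throwaway tester sketch; my_solve stands in for the routine under test.
    using LinearAlgebra

    my_solve(A, b) = A \ b    # replace with the routine you are actually testing

    function test_my_solve(; ntrials=100)
        for trial in 1:ntrials
            A = rand(10, 10) + 10I   # random, but deliberately well conditioned
            b = rand(10)
            x = my_solve(A, b)
            @assert norm(A*x - b) <= 1e-8 * norm(b)  # check a residual, not a magic number
        end
        # Random cases alone are not enough; poke at the edge cases by hand, too.
        @assert norm(my_solve(Matrix(1.0I, 3, 3), zeros(3))) == 0.0
        return true
    end

    test_my_solve()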

Get code working early

Start with the simple case: serial, unoptimized, trivial code. Use existing building blocks where possible. Find your mistakes as soon as possible after you’ve made them. Don’t wait until after you’ve written the entire program. That way lies madness, or at least bad karma.

Find a friend

Ideally, find a friend who is going through what you’re going through. Talk, away from the computer.

Ideally, also find a patient friend who has no idea about computing. Find a way to tell that friend why your problem is cool or challenging. If your friend suggests you go get coffee or tea or something instead, accept – if time permits.

Windows

My primary computer is a MacBook Pro. It’s a couple years old now and still running OS X 10.7, but I’m quite happy with it. I’ve been using a Mac laptop as my primary machine for about eight years now; prior to that, I spent many years using Linux as my main OS; and prior to that, I used Windows (and at one time DOS). But I try not to be dogmatic about things, and have been running with at least two different types of operating systems – sometimes on the same hardware, sometimes on different machines – since I was an undergrad.

A few weeks ago, I decided that we needed a family desktop at home, something that my wife and I could both use for record-keeping and word processing, and that I could also use as an additional box for building and testing code. Both because J will be using it and because I already have access to OS X and Linux boxes elsewhere, I went with a modest-but-not-puny refurbished HP with Windows 8 pre-installed. I thought about running Ubuntu and using KVM (or some other virtualization layer) for running Windows, but… meh. I’m sure I would have learned something from setting things up that way, but it would take more time than I really wanted to invest to fiddle it into a shape that I liked. Also, that time would have to come out of the precious few evening hours that I don’t devote to work or family. So a Windows 8 box it was on arrival, and a Windows 8 box it has remained.

And what have I discovered? Well, for one thing, I’ve discovered that I find the Windows 8 UI intuitive, and the odd bits of friction in the system seem mostly to do with places where old and new metaphors are forced to coexist. I will still be happier once I’ve figured out how to do more things with the keyboard – I’m faster typing than I am mousing – but I’ve had little trouble figuring out how to do the things that I want to do. I wouldn’t say that I like it more than my setup under OS X, nor more than Ubuntu, but it’s not obvious to me that I like it less either; it’s just a little less familiar.

Now, I’m unlikely to switch from a UNIX-based system as my chief development platform any time in the immediate future, despite having a fairly comfortable setup with Cygwin 64 on the new Windows box. But I like having a Windows box to play with at home. I like to see that desktop user interfaces are being forced to evolve, and I like to see that different operating systems are not all going in exactly the same direction. I also like the idea that I won’t draw a complete blank when asked to do something with a machine running any one of the three major desktop OS flavors du jour, and that I can build and test my software on each of those three systems.

Foucault doodle

September 18 would have been the 194th birthday of Léon Foucault, a French physicist most famous for devising a pendulum that demonstrates the rotation of the Earth. I’ve learned a bit more about this experiment over the past couple years.

One of the fascinating aspects of Foucault’s pendulum is how many ways there are to reason about it. Being who I am, I usually think about this in terms of oscillator coupling. Imagine you are standing in a room looking at a Foucault pendulum moving back and forth. The motion you see could be described in terms of two different “modes”: either you would see the pendulum swing from left to right, or from front to back. Clearly, this description is somewhat arbitrary, since you could move to a different position in the same room and thus change what “left-right” and “front-back” meant to you. That is, the system is invariant under rotation about its axis. Among other things, invariance means that the modes of motion (left-right or front-back) generally have to look the same: same “shape”, same frequency, etc. When a system has two oscillations at exactly the same frequency, small perturbations from the outside world (such as gyroscopic forces due to the rotation of the earth) have the effect of transferring energy back and forth between these modes of oscillation. This appears as the precession of the line along which Foucault’s pendulum swings.
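To make the coupling explicit, the standard small-oscillation model in the rotating frame has the two swing directions $x$ and $y$ coupled only through Coriolis terms. With $\omega$ the pendulum frequency and $\Omega_z = \Omega \sin(\mathrm{latitude})$ the vertical component of the Earth’s rotation rate,

$$\ddot{x} - 2\Omega_z \dot{y} + \omega^2 x = 0, \qquad \ddot{y} + 2\Omega_z \dot{x} + \omega^2 y = 0.$$

Writing $z = x + iy$ collapses the pair to $\ddot{z} + 2i\Omega_z \dot{z} + \omega^2 z = 0$; to first order in the small ratio $\Omega_z/\omega$, the solutions are ordinary pendulum swings multiplied by $e^{-i\Omega_z t}$, and that slowly rotating factor is exactly the precession of the swing plane at rate $\Omega_z$.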

The exact same mathematics describes many different systems, each with different names attached. The hemispherical rate-integrating gyroscope, as those still working on such devices sometimes note, is essentially a solid-wave version of Foucault’s pendulum with a wine-glass-shaped resonator standing in for the swinging bob. The same thing works when the oscillations involve light rather than elastic waves, and this is the basis of ring laser gyroscopes. Physicists sometimes talk about geometric phase (or Berry phase) to describe this general type of perturbed oscillation.

I find it fascinating when these same concepts keep appearing in different parts of the literature under different guises.

I also learned this summer that I spent many years utterly mangling Foucault’s name.

First day of fall

Here are some lessons I have learned in the first days of this semester, which I will share with you on this first day of fall.

  1. If you must type while a cat is attacking your arm, use vi and not emacs. Chorded key strokes are murder. Also, long sleeves are helpful.

  2. A collection of letters on the Kindle app on my cell phone makes a marvelous companion in the event that I show up a bit early for a colloquium. Seneca’s letters are more interesting and better written than anything that shows up in my email, and each one takes but a few minutes to read – about the same amount of time it takes to debug the typical A/V problem, or possibly less.

  3. Human beings are awesome throwers. Also, I am in awe of Indian bakers.

  4. I am glad my daughter is getting old enough to appreciate Pixar movies. We watched Ratatouille as a family this weekend, and G’s commentary added a certain savor to the movie. Then G fell asleep, and I savored that, too!

  5. Combing curly hair is far easier with a brush than with a comb. Also, I need to buy some new combs, since all my combs are now snaggle-toothed from trying to work them through G’s curls before J asked “why are you doing it that way?”

Messing about in MATLAB

Nick Trefethen came to Cornell for a visit yesterday, where he gave two talks. In one talk, he described some computations based on sampling analytic functions around the unit circle; in the other, he demonstrated the Chebfun system. I’ve seen variants of the latter talk before, but it was worth seeing again. The philosophy of the talk reminds me of a line from the start of The Wind in the Willows, in which the Water Rat tells Mole:

Believe me, my young friend, there is NOTHING – absolutely nothing – half so much worth doing as simply messing about in boats.

Replace “boats” with “MATLAB”, and I think you have the raison d’être for Chebfun. This is not to say that Chebfun couldn’t be used to serious ends, but rather that it’s great fun to mess about with it; and Trefethen conveys that sense of fun masterfully.
