I lost on Jeopardy

I listened to a fair amount of Weird Al Yankovic when I was young. It has been a while, but some things stick with me – most relevant for this post, the song “I Lost on Jeopardy”, which features Weird Al playing Jeopardy (badly) against “a plumber, and an architect, both with a Ph.D.” The song seems like an appropriate response to a recent piece in The Chronicle of Higher Education in a vein that I tend to think of as the “Oh, the humanities!” genre. The premise of “Big Brains, Small Minds” is that the humanities uniquely provide students with patience, wisdom, and an interest in meaningful questions, while “STEM culture” is ruthlessly pragmatic and unreflective, leading to “a search for the ‘right’ answer” and a “disordered mind.”

I think our students should read Plato and other philosophers (I’m more fond of Marcus Aurelius). I also think they should learn enough about mathematics, science, and engineering to appreciate how we ask questions. Mathematicians do not spend their days adding columns of numbers, nor do most scientists and engineers spend their days carefully heating test tubes or building concrete walls. The accusation that we “knew all the answers, but we couldn’t get the questions right” is as ignorant as it is condescending.

What lessons should one take away from such writings? Perhaps the main thing is that we ought to teach the methods of questioning early in the curriculum in any area. What do we assume? Why should we believe our models? What are the limitations of what we know now, and how does that affect the questions we ask? What are the limitations of what we can know, whether in practice or in theory, and how do those limits affect the questions we ask? When is the whole simpler than the sum of the parts, and when is it more complex?

And, from my seat in a computer science department, this essay has helped me put a finger on one of the things that has always bothered me about “computational thinking.” Is computational thinking about “solving problems, designing systems, and understanding human behavior that draws on concepts fundamental to computer science”? Of course it is. But computational thinking is also about the types of questions we ask, how we think about the difficulty of those questions, and so much more. When we look at faculty candidates, we judge their taste based on the questions they pursue. When we mentor students, we emphasize that the right question makes the difference between an exciting result and a boring one – good technique is usually necessary for interesting research, but it is not sufficient. Beauty matters. Simplicity matters. Ethics matter. I enjoy problem-solving, but why should I cede the joy of question framing to the humanities? Questioning should be a joy for all fields. By starting from the answer, we sell ourselves short.


One of the glorious things about summer is that when 7:00 or so rolls around, I feel like I can safely switch to something other than my day job. For example, I can spend some time trying Atom! At least, I can spend a minute or two before it is time to put the kids in the bath.

I would wax poetic about this, I’m sure, but I have only used it for all of about ten minutes. I am sure that tomorrow I will go back to my normal mode of operation (emacs and vim both open, with a fountain pen and a pencil sitting next to a pad of paper by the keyboard). But so far, I like it!

Topics in Julia

I learned about Julia when it was first announced in early 2012. Despite knowing and respecting some of the names behind it, and despite several friends ribbing me about a new numerical language that shared a name with my daughter, I took a wait-and-see attitude at first. But I kept an eye on Julia, and I saw as many positive signs as I saw hyped stories. A community formed, and people other than the language designers started writing packages. It picked up the ability to call Python, and with that the ability to access the marvelous Matplotlib library. And the numerical folks at MIT had started using it for classes. So by Fall 2013, I was ready to give Julia a try myself.

This spring, I taught CS 5220: Applications of Parallel Computers, a master’s-level course on high-performance scientific computing. As in past offerings of the class, I gave a few conventional assignments in which students were to tune a base serial code that I gave them, then parallelize the code using MPI or OpenMP. And as in past offerings, most of the assignments involved standard numerical kernels or simulation methods. But every time I offer this course, I try to give at least one assignment with a less classically numerical feel, and at least one assignment with a different style of parallelism. This semester, I satisfied both these desiderata by having the students write a topic mining algorithm in Julia. I figured this would be a good way to test the claim that it is straightforward to write code with good performance in Julia. In the end, I think I figured correctly.

As with the other assignments, I started off by providing the students with a base code and a data set of Yelp reviews that was kindly provided to me by David Mimno and Moontae Lee. The students had two main tasks:

  1. Replace a nonnegative least squares routine (an active set algorithm) that I provided with a faster, if less accurate, exponentiated gradient solver.
  2. Parallelize the almost-embarrassingly-parallel part of the code.
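For the curious, here is roughly the shape of an exponentiated gradient update for this kind of problem, sketched in Python/NumPy rather than Julia so I can state it compactly and run it. The step size, iteration count, and the choice to keep the iterate on the probability simplex (natural for topic weights) are illustrative choices of mine, not the assignment’s actual parameters:

```python
import numpy as np

def exp_grad_nnls(A, b, eta=0.1, iters=200):
    """Exponentiated-gradient sketch for min ||Ax - b||^2 with x on the simplex.

    Multiplicative updates keep x strictly positive; eta and iters are
    illustrative defaults, not tuned values.
    """
    m, n = A.shape
    x = np.full(n, 1.0 / n)        # start at the center of the simplex
    for _ in range(iters):
        g = A.T @ (A @ x - b)      # gradient of the least-squares objective
        x = x * np.exp(-eta * g)   # multiplicative (exponentiated) step
        x /= x.sum()               # renormalize onto the simplex
    return x
```

Compared with an active set method, each iteration is cheap (one matrix-vector product pair), but as I found out the hard way, the convergence is quite sensitive to the step size.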

My first surprise came when I started implementing the base code. I’d done one previous programming exercise in Julia, when I wrote the HTCondor cluster manager over the winter break; I figured we would need this for the class anyhow, and that was a good enough excuse. But this was the first time I had done any numerical programming in Julia, and I expected to spend a bit of time stumbling around in the dark before I figured out what I was doing. As it happened, I wrote the code almost as I would write Matlab code, and with surprisingly little debugging effort. The greatest difficulty I had was not with the language, but with choosing the parameters in the exponentiated gradient algorithm in order to get acceptable performance. I was sufficiently concerned about the correctness of my own implementation that I decided to write the active set routine, even though that was not part of the original plan. And at the end of writing both routines, cleaning up the parts that I thought might be confusing, and adding some commentary, I already felt fairly comfortable with Julia.

My second surprise came before I posted the assignment: my Julia code processed the Yelp data set in a couple seconds, fast enough that I was concerned that my students would not be able to get much parallel speedup. So I added some pointers to a few larger data sets, as well as some code to process those data sets.

My third surprise came after I posted the assignment: with a little fiddling, my active set solver was running faster than my own exponentiated gradient solver (on which I’d admittedly spent less time and effort), at least when we were trying to learn a small number of topics. This left me doubly concerned, since the assignment indicated that it should be easy for the students to beat my performance with the alternative solver. Fortunately, while a few students did indeed need reassurances, other students did a better job at tuning the exponentiated gradient code than I had, and got the expected performance gains.

When the students really engaged with the assignment, they did about what I thought they might do. Most got the basic algorithm working and managed at least some parallel speedup, and some managed to significantly improve the performance of the base code that I provided. In the process, though, they ran into issues that I probably would not have stumbled across in quite some time. In no particular order, the ones that come to mind are:

Documentation. The biggest complaint among the students was the lack of clear documentation of some important language features. In particular, the students were not that happy with the documentation on how Julia handles parallelism. Several students also looked up the documentation for the wrong version of the language, which didn’t help matters any. Also, there are few enough examples online that many students still felt in the dark after spending some quality time with Google.

Version differences. I installed the most recent pre-release of Julia 0.3 on the cluster. Several students used Julia 0.2 on their laptops. The language is evolving fast enough that there were differences between these releases that caused some woe. Perhaps the most memorable real difference had to do with elementwise subtraction of vectors, which could be done with the - or .- operators. Using plain - for this was deprecated in Julia 0.3, and at least one group ran into a tremendous slowdown on the cluster (compared to their laptop) because of the I/O overhead associated with Julia warning them millions of times that they should now use .- instead of -.

Parallel profiling. Most of the class settled on the same types of spawn-and-fetch strategies for parallelizing the main loop that I asked them to tackle. But none of us were able to get great insight into the parallel overheads by any mechanism other than careful manual instrumentation with timers.

JIT overhead. The first time it invokes a function, Julia uses a Just-In-Time compiler to compile the routine down to machine code. The JIT compilation takes some time, and unless one wants to measure that time, it’s important to run the code twice and time the second trial. I told the students about this, but many of them ignored me. That in itself is no surprise; what is more surprising is how much variance the students discovered in the JIT time. One group in particular was puzzled at an apparent 2x slowdown in their parallel code that turned out to be entirely due to the JIT processing their parallel code more slowly than their serial code.
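The measurement discipline itself is easy to package up. A sketch of the pattern in Python (the idea, not Julia’s actual API; the helper name is mine):

```python
import time

def warm_timing(f, *args, **kwargs):
    """Time a function after a warm-up call.

    The first call may include one-time costs such as JIT compilation or
    caching; timing the second call measures the steady-state cost.
    """
    f(*args, **kwargs)               # warm-up: absorb one-time costs
    t0 = time.perf_counter()
    result = f(*args, **kwargs)      # measured: steady-state call
    elapsed = time.perf_counter() - t0
    return result, elapsed
```

In Julia itself, the same discipline amounts to calling the function once before you start the timer.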

Vectors and array slices. For the most part, Julia’s performance conformed to my mental model. But one student group pointed out something that surprised me. They saw significant speedups by copying a column of a matrix into a temporary vector, operating on that temporary vector in place, and then copying the results back. I always just operated on the column of the matrix in place. Given that Julia is column-oriented, so that the storage layout should be the same in both cases, I’m still a bit confused about why their approach improved performance so much.

OpenMP and OpenBLAS. Somewhat to our surprise, setting the OMP_NUM_THREADS environment variable significantly changed the performance of some students’ serial codes. The really surprising aspect of this is that it didn’t seem to matter whether the variable was set to 1 or to 8 (the number of cores on the cluster nodes). Just setting it to something made a difference. We compiled Julia to use OpenBLAS, which is capable of running in multi-threaded mode for serial codes, though that should be turned off when Julia is running several local worker processes. But I still don’t understand why it would run faster when the number of threads is explicitly set to the value it ought to choose anyhow (either one or the number of cores).

I was happy that most of the students were able to use the profiler and make some sense of its output. Many did not figure out how to fix the bottlenecks that the profiler revealed, but as often as not this was a matter of how they were thinking about the algorithm and not how they were using the Julia language. The student codes were, as always, of varying quality; but they mostly submitted code that was free of major performance gaffes.

Most of the students said nothing about their feelings on Julia in the assignment reports, though I know from conversations in office hours that several were frustrated by running into what they saw as incomplete or undocumented corners of the language. Perhaps I’ll read more about this when the course evaluations come in.


G: “Daddy, I’m doing an experiment!”

Me: “Really? What are you trying to figure out?”

G: “I want to see whether the washcloth floats when I fill it with bubbles!”

Yup, she got it to float. We’re delighted all around. Keep it up, kid.

What advice?

We’re entering the time of the year when I get asked for professorial advice. Of course, I’m still figuring this whole thing out myself, so you should take anything I say with a grain of salt. But for today, my deepest and most heartfelt advice must be: take dessert seriously.

Yeah, nobody ever asks my opinion on dessert. But that’s my line for tonight, and I’m sticking to it.

Coding katas for computation classes

It’s the middle of the fall semester, just before fall break. That means that I have had just enough time to develop a certain amount of exasperation with student coding habits. From the Friday afternoon before a long weekend, I give you this brief, cathartic set of thoughts.

Programming exercises are katas

Code for a class is supposed to make your brain comfortable going through certain motions, so that you’re better equipped to go through those motions when you face a computation in anger. It should not be a long, mindless slog (though it may be long). If it is, you’re doing it wrong.

Start early

You don’t have to finish early; you just have to understand what you’re facing well enough to plan. Do you basically know how to do the problems? Do you need to read more deeply? Will you want help in office hours?

If you’re continually late, try tracking how much time you think the work will take at the outset, and how much time it actually takes. If you’re surprised by the difference, keep recording it until it sinks in.

Don’t hack mindlessly

Before you start programming your solution, read the prompt. Write a tester. Include some randomly generated test cases, but don’t rely on them alone. Document, at least for yourself, what you assume about the input and output parameters. Look for singularities and pathologies; are they ruled out by design? By all means, pull up a MATLAB prompt and play with some ideas early on, but treat those experiments like the early scrawls you make on a pad. They’re stepping stones, not a thoughtful solution.
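To make “write a tester” concrete, here is the shape I have in mind, sketched in Python on a toy maximum-subarray problem of my own choosing (not a class assignment): check a fast implementation against an obviously-correct brute-force reference on random inputs, then add hand-picked edge cases that random generation is unlikely to hit.

```python
import random

def brute_max_subarray(xs):
    # Obviously-correct reference: try every contiguous slice.
    return max(sum(xs[i:j])
               for i in range(len(xs))
               for j in range(i + 1, len(xs) + 1))

def kadane_max_subarray(xs):
    # Candidate implementation under test (Kadane's algorithm).
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def random_tests(trials=200, seed=0):
    rng = random.Random(seed)  # fixed seed, so failures are reproducible
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 12))]
        assert kadane_max_subarray(xs) == brute_max_subarray(xs), xs
    # Random cases alone are not enough: add edge cases by hand.
    assert kadane_max_subarray([-5]) == -5
```

The brute-force reference is too slow to ship, but that is fine; its only job is to be trivially right.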

Get code working early

Start with the simple case: serial, unoptimized, trivial code. Use existing building blocks where possible. Find your mistakes as soon as possible after you’ve made them. Don’t wait until after you’ve written the entire program. That way lies madness, or at least bad karma.

Find a friend

Ideally, find a friend who is going through what you’re going through. Talk, away from the computer.

Ideally, also find a patient friend who has no idea about computing. Find a way to tell that friend why your problem is cool or challenging. If your friend suggests you go get coffee or tea or something instead, accept – if time permits.


My primary computer is a MacBook Pro. It’s a couple of years old now and still running OS X 10.7, but I’m quite happy with it. I’ve been using a Mac laptop as my primary machine for about eight years now; prior to that, I spent many years using Linux as my main OS; and prior to that, I used Windows (and at one time DOS). But I try not to be dogmatic about things, and have been running with at least two different types of operating systems – sometimes on the same hardware, sometimes on different machines – since I was an undergrad.

A few weeks ago, I decided that we needed a family desktop at home, something that my wife and I could both use for record-keeping and word processing, and that I could also use as an additional box for building and testing code. Both because J will be using it and because I already have access to OS X and Linux boxes elsewhere, I went with a modest-but-not-puny refurbished HP with Windows 8 pre-installed. I thought about running Ubuntu and using KVM (or some other virtualization layer) for running Windows, but… meh. I’m sure I would have learned something from setting things up that way, but it would take more time than I really wanted to invest to fiddle it into a shape that I liked. Also, that time would have to come out of the precious few evening hours that I don’t devote to work or family. So a Windows 8 box it was on arrival, and a Windows 8 box it has remained.

And what have I discovered? Well, for one thing, I’ve discovered that I find the Windows 8 UI intuitive, and the odd bits of friction in the system seem mostly to do with places where old and new metaphors are forced to coexist. I will still be happier once I’ve figured out how to do more things with the keyboard – I’m faster typing than I am mousing – but I’ve had little trouble figuring out how to do the things that I want to do. I wouldn’t say that I like it more than my setup under OS X, nor more than Ubuntu, but it’s not obvious to me that I like it less either; it’s just a little less familiar.

Now, I’m unlikely to switch from a UNIX-based system as my chief development platform any time in the immediate future, despite having a fairly comfortable setup with Cygwin 64 on the new Windows box. But I like having a Windows box to play with at home. I like to see that desktop user interfaces are being forced to evolve, and I like to see that different operating systems are not all going in exactly the same direction. I also like the idea that I won’t draw a complete blank when asked to do something with a machine running any one of the three major desktop OS flavors du jour, and that I can build and test my software on each of those three systems.

Foucault doodle

September 18 would have been the 194th birthday of Léon Foucault, a French physicist most famous for devising a pendulum that demonstrates the rotation of the Earth. I’ve learned a bit more about this experiment over the past couple years.

One of the fascinating aspects of Foucault’s pendulum is how many ways there are to reason about it. Being who I am, I usually think about it in terms of oscillator coupling. Imagine you are standing in a room looking at a Foucault pendulum moving back and forth. The motion you see could be described in terms of two different “modes”: either the pendulum swings from left to right, or from front to back. Clearly, this description is somewhat arbitrary, since you could move to a different position in the same room and thus change what “left-right” and “front-back” meant to you. That is, the system is invariant under rotation about its axis. Among other things, this invariance means that the two modes of motion (front-back or left-right) have to look essentially the same: same “shape,” same frequency, and so on. When a system has two oscillations at exactly the same frequency, small perturbations from the outside world (such as gyroscopic forces due to the rotation of the earth) have the effect of transferring energy back and forth between these modes of oscillation. This appears as the precession of the line along which Foucault’s pendulum swings.
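The standard small-oscillation model makes this concrete. Up to sign conventions, in a frame rotating with the Earth the horizontal coordinates of the bob satisfy

```latex
\ddot{x} - 2\Omega \sin\varphi \, \dot{y} + \omega^2 x = 0, \qquad
\ddot{y} + 2\Omega \sin\varphi \, \dot{x} + \omega^2 y = 0,
```

where $\omega$ is the pendulum frequency, $\Omega$ the Earth’s rotation rate, and $\varphi$ the latitude. Writing $z = x + iy$ collapses the pair into $\ddot{z} + 2i\Omega\sin\varphi\,\dot{z} + \omega^2 z = 0$, whose solutions oscillate at (essentially) frequency $\omega$ while the swing plane precesses at the slow rate $\Omega \sin\varphi$ – the energy transfer between the two degenerate modes, seen all at once.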

The exact same mathematics describe many different systems, each with different names attached. The hemispherical rate-integrating gyroscope is essentially a solid-wave version of Foucault’s pendulum built out of a wine glass, as is sometimes noted by those still working on such devices. The same thing works when the oscillations involve light rather than elastic waves, and this is the basis of ring laser gyroscopes. Physicists sometimes talk about geometric phase (or Berry phase) to describe this general type of perturbed oscillation.

I find it fascinating when these same concepts keep appearing in different parts of the literature under different guises.

I also learned this summer that I spent many years utterly mangling Foucault’s name.

First day of fall

Here are some lessons I have learned in the first days of this semester, which I will share with you on this first day of fall.

  1. If you must type while a cat is attacking your arm, use vi and not emacs. Chorded keystrokes are murder. Also, long sleeves are helpful.

  2. A collection of letters on the Kindle app on my cell phone makes a marvelous companion in the event that I show up a bit early for a colloquium. Seneca’s letters are more interesting and better written than anything that shows up in my email, and each one takes but a few minutes to read – about the same amount of time it takes to debug the typical A/V problem, or possibly less.

  3. Human beings are awesome throwers. Also, I am in awe of Indian bakers.

  4. I am glad my daughter is getting old enough to appreciate Pixar movies. We watched Ratatouille as a family this weekend, and G’s commentary added a certain savor to the movie. Then G fell asleep, and I savored that, too!

  5. Combing curly hair is far easier with a brush than with a comb. Also, I need to buy some new combs, since all my combs are now snaggle-toothed from trying to work them through G’s curls before J asked “why are you doing it that way?”

Messing about in MATLAB

Nick Trefethen came to Cornell for a visit yesterday, where he gave two talks. In one talk, he described some computations based on sampling analytic functions around the unit circle; in the other, he demonstrated the Chebfun system. I’ve seen variants of the latter talk before, but it was worth seeing again. The philosophy of the talk reminds me of a line from the start of The Wind in the Willows, in which the Water Rat tells Mole:

Believe me, my young friend, there is NOTHING – absolutely nothing – half so much worth doing as simply messing about in boats.

Replace “boats” with “MATLAB”, and I think you have the raison d’être for Chebfun. This is not to say that Chebfun couldn’t be used to serious ends, but rather that it’s great fun to mess about with it; and Trefethen conveys that sense of fun masterfully.
