OCaml, Python, Scheme, Tuareg Mode, Camldebug, and Carbon XEmacs

10 May 2007

A discussion of expressiveness and efficiency of programming languages used in my previous projects, my switch to OCaml, and programming in OCaml under Mac OS X.

I’ve been learning the OCaml programming language in the past few days. I’ve known about ML since a colleague gave a talk on it in one of his seminars in the early nineties back when I was still teaching. The idea of “type inference” struck me as something extremely clever. Essentially it means you can still perform strict, static type checking without variable definitions, and generate efficient code. It’s one of those magical and spectacular results of non-linear thinking in computer science. I must have played with it a little, probably in the form of SML or Caml Light, and remember the language as a little “rough on the edges” and the implementations quite experimental. Of course after years of evolution, OCaml is now a very nice language to write programs in and the INRIA implementation feels solid and mature.

Let me step back a little and explain what I am trying to do with OCaml. If you have been following the programs I’ve written :-): the Emacs and XEmacs ports were in C and Lisp; MyJazzBand 1 was in C, C++, and Objective C; TOE was in C and Objective C, with SWIG wrappers so it is callable from Scheme and Python, and other languages; and MJB2Lite was in Python. I’m currently starting to work on version 2 of MyJazzBand and I need a programming language that will allow me to express my automatic composition algorithms easily and succinctly. I took some time this winter to evaluate both Scheme and Python for this purpose: I implemented “jazz theory objects” such as notes, intervals, chords, and scales and a few “harmonic analysis algorithms” in them. At some point I must have decided to go with Python and started to write MJB2Lite.

Python is a nice enough language to write in. It’s simple. It’s reasonably expressive. Lots of programming libraries are available. And many people know it; if they don’t they can learn it quickly. Also, it can be embedded which means I can build extensible applications (so that the list of chords and even algorithms can be customized after the application is shipped, e.g.). The biggest problem with Python however is that it is interpreted. As I was writing MJB2Lite I couldn’t help but think that much of my effort was destined to be wasted on code that would not run as fast as it possibly could. I know about works like PyPy, Shed Skin, and Pyrex, but they don’t seem ready for production use. Even then it may not be enough because of how the Python language is designed. I don’t remember exactly what caused me to reexamine OCaml. But when I did, I was reminded that one doesn’t need to sacrifice performance for expressiveness.

Unfortunately the timing of my realization was horrible because it was around the time MJB2Lite (written in Python) was almost ready for release. So I took the time to finish it and finally released it last week. And now I’ve started to reimplement MJB2Lite in OCaml! I have written about a thousand lines of OCaml. I need to write a few thousand more. OCaml is probably slightly more expressive than Python, but Python makes up for it by having more programming libraries in the official distribution. So the two implementations will probably end up with similar number of lines. I must say I don’t recommend such an approach for everyone: throwing out a perfectly good implementation and rewriting it while learning a new language. But so far OCaml has been a lot of fun to learn and worked great. There is still so much more of it to explore too: like Camlp4 and MetaOCaml. In this sense OCaml is a lot like C++, which I'm sure is something both OCaml and C++ people will find strange to hear someone say.

So this is the lesson I have learned from this (and if you get only one thing from this post this is it):

If you’re starting a software project in Python or Scheme, stop: consider OCaml instead.

To give you a taste, here are the OCaml, C++, Python, and Scheme versions of a function in my Band-in-a-Box reader code (Yes, I have 4 different versions; isn’t that crazy?). It reads bytes from an input file and decodes a sort of run-length encoding they represent into a list of chord type-code and duration pairs (the C++ version only stores these in two arrays).

OCaml:

let read_new_format_chord_types ic =
  let rec loop beat result =
    if beat >= 1020 then
      beat, List.rev result
    else
      let chord_type = read_byte ic in
	if chord_type = 0 then
	  let dur = read_byte ic in
	    match result with
		(chord_type1, dur1) :: tl ->
		  loop (beat + dur) ((chord_type1, dur1 + dur) :: tl)
	      | [] ->
		  failwith "First chord type cannot be 0"
	else
	  loop (beat + 1) ((chord_type, 1) :: result)
  in
    loop 0 []

C++:

void read_new_format_chord_types(
  unsigned int types[1020],
  unsigned int beats[1020],
  unsigned int &beat)
{
  beat = 0;
  while (beat < 1020) {
    unsigned char chordType = getByte(data, i++);
    if (chordType == 0) {
      unsigned char duration = getByte(data, i++);
      beat += duration;
    }
    else {
      types[numberOfChords] = chordType;
      beats[numberOfChords] = beat;
      numberOfChords++;
      beat += 1;
    }
  }
}

Python:

def read_new_format_chord_types(f):
    beat = 0
    result = []
    while beat < 1020:
        chord_type = read_char_as_integer(f)
        if chord_type == 0:
            duration = read_char_as_integer(f)
            beat += duration
            chord_type2, duration2 = result[-1]
            result = result[:-1] + [(chord_type2, duration2 + duration)]
        else:
            beat += 1
            result.append((chord_type, 1))
    return (beat, result)

Scheme:

(define (read-new-format-chord-types p)
  (let loop ((beat 0) (result '()))
    (if (>= beat 1020)
        (cons beat (reverse result))
        (let ((chord-type (read-char-as-integer p)))
          (if (= chord-type 0)
              (let ((duration (read-char-as-integer p)))
                (loop (+ beat duration)
                      (cons (cons (car (car result))
                                  (+ (cdr (car result)) duration))
                            (cdr result))))
              (loop (+ beat 1) (cons (cons chord-type 1) result)))))))

Note how nicely OCaml pattern matching avoids car’s and cdr’s and at the same time serves as control structure. Of course I know any serious comparison of these programming languages will require more than just a simple example. But you just have to trust me when I say OCaml feels right and seems to embody a right balance of everything a programming language should have. If this gives you the urge to learn more about OCaml, the rest of this post consists of a few notes that will help you run it on Mac OS X (this is afterall an “OS X Programming Blog”).

After downloading OCaml from INRIA and before installation, apply this patch for dynamic loading to work under OS X (otherwise #load "nums.cma" will not work in toplevel, e.g.) and this patch for gprof to work on Intel Macs.

Read the official reference manual and on-line book. You may get a quick start by reading the Objective Caml tutorials and tips at the OCaml Alliance Network. But you can’t really avoid reading the manual and the book. When you get into trouble, search for answers in the fa.caml USENET newsgroup archives. Look for additional programming libraries you can use at the Caml Hump.

If you’re like me, you’ll use Emacs/XEmacs to edit OCaml code. I use Carbon XEmacs, which I think of as my own private implementation :-). Don't use regular caml-mode that comes with many Emacs/XEmacs distributions (like the XEmacs sumo package). Use the Tuareg mode instead. Follow the instructions in the README file to install it. Tuareg mode uses (the somewhat archaic) sym-lock.el (included in the distribution), which calls a function named list-fonts. This isn't defined in Carbon XEmacs. So define it in your startup file:

(defun list-fonts (pattern)
  (font-list pattern))

You’ll also need to customize the “Camldebug Event” and “Camldebug Underline” faces for face coloring in camldebug mode to work correctly. You can of course also customize a number of faces that begin with the words “Font Lock” or the word “Tuareg” to control syntax coloring. Here’s a screenshot of the camldebug debugger stopped within a function adjust_for_meter in Tuareg mode. A great feature of camldebug is you can step backward as well as forward in time. Great stuff!

camldebug.jpg

Install the otags program to generate “tags” files from OCaml files for Emacs. Tags allow you to jump quickly among definitions of functions and variables in different files of a project. It is also used in text completion of function and variable names.

As an aside, to display source code in OCaml and other languages in a Wordpress blog (like I’ve done above), use the WP-Syntax plugin, which is based on GeSHi: the Generic Syntax Highlighter. It’s a little strange to be colorizing code on the server, but highlight and source-highlight don’t work as well because they generate color information in CSS which needs to be included in the post. I don't know an easy way to do that.

Well, that just about wraps it up. Look for MJB2Lite rewritten in OCaml soon!

Category: Programming