Lazy Sequences In Scheme

2012-05-20 23:24 +0000

Ever since I started working for a Clojure shop a few months ago I came to really like Clojure's built-in lazy sequence support. So much in fact that I found myself missing them when programming Scheme. This led me to create a library for Chicken which resembles Clojure's lazy-seq API. But first a few words about why lazy sequences are cool!

Wholemeal Instead Of Piecemeal

Lazy sequences facilitate a programming style which is sometimes vividly referred to as Wholemeal Programming. This is probably best illustrated by an example. In a mostly imperative language one would usually operate on a list by iterating it and applying the intended manipulations on each element. In JavaScript one could call a function transform on each element and filter it down to results which are greater than 10 like this:

var results = [];

for (var i = 0; i < someList.length; i++) {
    var el = transform(someList[i]);
    if (el > 10) {
       results.push(el);
    }
}

This program is rather complex as it intertwines what it is supposed to do with how that is to be done. In other words: Instead of declaring┬╣ the desired transformation of someList it also explicitly contains incidental minutiae of how to iterate over a list and how to construct the result list. It also has a strong focus on how the individual list items are processed. There is, of course, a nicer way to express this very program even in plain JavaScript:

var result = someList.
    map(transform).
    filter(function(el) { return el > 10; });

This is more succinct because the program now tells us much less about how it works but allows us to focus on what it is doing. Instead of handling each element of the list piecemeal we describe in a wholemeal fashion how we want to transform the list. In Scheme this is pretty much the default way to express such programs.

But it can be said that the above program has the drawback of having to traverse the list twice and building up an intermediary list completely before filtering it down to just elements which are greater than 10. This is very much the same issue the first Scheme implementation in my article on pipelines had. The second implementation fixes this problem but rids us of the nice abstractions provided by map, filter and friends. Now, why I've come to like lazy sequences so much is that they allow wholemeal programming without the memory waste. Since I didn't want to miss out on that when hacking Scheme I have created the lazy-seq egg. With it the pipeline example could be expressed like this:

(use lazy-seq)

(call-with-input-file "access.log"
  (lambda (file)
    (lazy-each
     print
     (lazy-map
      (lambda (x)
        (list-ref x 6))
      (lazy-map
       (lambda (x)
         (string-split x " "))
       (lazy-filter
        (lambda (x)
          (string-contains x "207.46.195.223"))
        (input-port->lazy-seq file read-line)))))))

Note that this program executes in bounded space, the contents of access.log will never completely reside in memory but we can program as if. Behind the scenes each line after the other will flow through the sequence of functions. Incidentally, this kind of code mixes well with another thing I pillaged from Clojure, namely the thrush operator ->> which is available through the clojurian egg:

(use lazy-seq clojurian-syntax)

(call-with-input-file "access.log"
  (lambda (file)
    (->> (input-port->lazy-seq file read-line)
         (lazy-filter
          (lambda (x)
            (string-contains x "207.46.195.223")))
         (lazy-map
          (lambda (x)
            (string-split x " ")))
         (lazy-map
          (lambda (x)
            (list-ref x 6)))
         (lazy-each print))))

Note how the program's literal structure now reflects the sequence of things happening: open the file, read line by line, filter to lines containing "207.46.195.223", split those at space, take the sixth element of the resulting list and finally print each result.

But SRFI 41!

As a Schemer you might ask why I didn't just go ahead and used the existing SRFI 41 library which provides lazy sequences under the name of streams (which is an apt metaphor). There are several reasons:

First of all, I like the way Clojure's lazy-seq form works. Instead of having special stream-lambda, stream-cons and stream-null forms to define streams you can define a lazy sequence in terms of a regular recursive function with its body wrapped in lazy-seq. For example, defining the sequence of multiples of 3 with SRFI 41 looks something like this:

(use srfi-41)

(define multiples-of-three
  (letrec ((next (stream-lambda (f n)
                   (stream-cons n (next f (f n))))))
    (next (lambda (n) (+ n 3)) 3)))

(stream->list (stream-take 10 multiples-of-three))
;; => (3 6 9 12 15 18 21 24 27 30)

This is a bit unwieldy and feels unidiomatic. SRFI 41 provides a high-level procedure called stream-iterate which allows expressing this in a much cleaner way but what I cared about here was the primitive building blocks. Occam's Razor and all that. So here's how that sequence definition looks like using lazy-seq:

(use lazy-seq)

(define multiples-of-three
  (let next ((n 3))
    (lazy-seq
      (cons n (next (+ n 3))))))

(lazy-seq->list (lazy-take 10 multiples-of-three))
;; => (3 6 9 12 15 18 21 24 27 30)

As you can see this is a plain old named let recursion with the body wrapped in lazy-seq to suspend the evaluation. To me this makes the API much more accessible. SRFI 41's insistence on the stream-lambda form means that existing Scheme idioms such as named let can't be used to define streams. To make up for it SRFI 41 even comes with a stream-let to provide the same functionality!

Another reasoned why I didn't just use SRFI 41 was that I wanted to get a better understanding of how lazy sequences work. And what better way is there to learn than to build something yourself?

After I had the basics implemented it turned out there was another good reason: SRFI 41 (at least its Chicken incarnation) is slow. Since the implementation is fairly complex I didn't dig too deep but benchmarks suggest that it generates much more garbage for the GC to collect than lazy-seq does. For a simple demonstration check out this paste of naïve prime number sequence benchmark comparing the two implementations.

There is a problem SRFI 41's predecessor SRFI 40 suffered from which the early stages of lazy-seq's implementation did as well. Search for times3 in the SRFI 41 document for the discussion. Both SRFI 41 and lazy-seq don't have this problem.

Conclusion

Lazy sequences can make code more expressive by reducing the how and focussing on the what even when dealing with potentially large (or even infinite) amounts of data. I encourage everyone to give them a try!


1
This is why this programming style is sometimes also referred to as declarative.