An Introduction To Lispy Pattern Matching
2013-04-19 07:20 +0000
It is evident that Lispers like to keep their data in lists. In fact, they like lists so much that they even keep their code in lists! And sometimes they use lists when they shouldn't, e.g. when they really should be using maps or sets. But lists are just so much more convenient to program with in Lisp, and unsurprisingly so given that the name originally stands for list processing. So who could blame them?
Now, by established tradition Lispers pick apart lists by means of the venerable car and cdr functions, interspersed with the occasional cadr or cadadr. For example, imagine we have a list of number sequences like that:
(define number-seqs '((0 3 4 8) (4 9 0 3) (3 7 3 1) (6 7 9 9) (7 6 6 2)))
And it is our task to find the first number sequence in that list which starts with an odd number because, well, it's very important: The customer has been emailing hourly since earlier this morning, claiming they are losing money every minute they're waiting for this critical information. Of course, all mails are labelled URGENT and contain the letters ASAP in various places. In this stressful situation we forget what the library function to accomplish this task is called so we just roll our own using good old car and cdr, peppered with some recursion:
(define (find-odd-car lst) (and (pair? lst) (if (and (pair? (car lst)) (odd? (caar lst))) (car lst) (find-odd-car (cdr lst)))))
Phew, let's hope it works:
(find-odd-car number-seqs) => (3 7 3 1)
Looks alright. Once again 1950s technology saves the day!
After finally getting our well deserved morning coffee we remember that we could have used find from SRFI 1 which—albeit not from the 1950s—is a good tool for the job:
(find (compose odd? car) number-seqs) => (3 7 3 1)
Still, it's quite common for Lispers to write functions like find-odd-car and it's a good and fun way to go about such problems. However, as you might remember from my previous article on lazy sequences, one might argue that this approach conflates how we walk the list with what we are intending to do. It would be nice to untangle those so that we as programmers can focus on the what while leaving more of the how to the system. In other words: We want our code to be more declarative.
One way to achieve a more declarative style for this kind of task is to use a technique called pattern matching. Some languages such as Haskell provide those out of the box. R⁵RS Scheme comes with a standard pattern matching facility called syntax-rules but this can only be used for syntactic extension. But thankfully, being a Lisp it can easily be extended with a general purpose pattern matching form by means of—well, syntactic extension! In the following we will look at one of those extensions by Alex Shinn which is an implementation of Andrew Wright's pattern matching system. It can be found in Chicken's egg repository under the name matchable and Guile ships it as the (ice-9 match) module.
Let The Match Begin
Predictably the main matching form is called match. It accepts one argument which is the expression to be matched, followed by one or more clauses which consist of a pattern and a body to be evaluated when its pattern matches. In the simplest form it works very much like the standard case form:
(match 2 (1 'one) (2 'two) (3 'three)) => two
match tries to match the given expression against the patterns in the order they are specified. If a pattern matches, the matching process stops and the body of the matching pattern's clause (i.e. its right-hand side) is evaluated, returning its last expression's value. If no pattern matches an error is signalled.
Patterns are much more versatile than matching against simple constant values, though. They can also describe compound data structures such as lists or vectors:
(match '(1 2 3) ((1 2 3) 'yes) (else 'no)) => yes
No surprises there except maybe for the symbol else we use as the second clause's pattern. This is no special syntactic symbol in match but acts just like any other unquoted symbol in a pattern: It will match any value and bind it to a variable in the scope of the clause's body. In the example above else is only used for aesthetic reasons and its binding is never used. Here's an example of actually using a pattern variable:
(match '(1 2 3) ((1 x 3) (+ x x))) => 4
This pattern requires the expression to be a three element list with a first element of 1 and third element of 3. The second element can be any value and is bound to the local variable x. If we wanted to match the literal list (1 x 3) we'd have to quote x in the pattern like that:
(match '(1 x 3) ((1 'x 3) "it's a match!")) => "it's a match!"
Returning to our original problem we could now express find-odd-car like this:
(define (find-odd-car lst) (match lst (() #f) ((head . tail) (if (odd? (car head)) head (find-odd-car tail)))))
Granted, it's a few lines longer and we're still using car but it's already a bit clearer what we are up to as we just describe the structure we want to examine rather than writing down how to pick it apart in detail. Note that we are matching on the sole argument of find-odd-car. Since this is such a common thing, the library provides a match-lambda form which creates a function to do just that. Using it we can write the find based solution to the find-odd-car problem like this:
(find (match-lambda ((head . tail) (odd? head))) number-seqs)
Often lists we want to match on don't have a fixed but a variable amount of values. For this purpose there is the ellipsis operator '...' which matches zero or more values fitting the preceding pattern. Let's extend our pattern (1 x 3) from above so that there may be an arbitrary amount of 1s before the x:
(match '(1 1 1 1 1 2 3) ((1 ... x 3) (+ x x))) => 4
We can also match variables like this which will collect all matches in a list:
(match '(1 2 2 2 2 2 3) ((1 x ... 3) (apply + x))) => 10
This works with compund structures, too:
(match '((1 2) (3 4) (5 6)) (((a b) ...) (list a b))) => ((1 3 5) (2 4 6))
Or And Not
match provides a set of boolean pattern operators which allow us to match the inverse of a pattern (not), specify alternatives (or), or match the same expression against multiple patterns (and). The most straight forward one of those is or:
(match 2 ((or 1 3 5) 'odd) ((or 0 2 4) 'even)) => even
This should immediately make sense.¹ The astute reader might notice that this pattern is ambiguous given the rule about unquoted symbols I stated above. This is because match uses unquoted symbols for its own pattern operators, thus effectively reserving them.² See the reference for a complete list of those reserved symbols.
The basic use of the and operator should be quite obvious, too:
(match '(1 2) ((and (a 2) (1 b)) (list a b))) => (1 2)
A non-obvious but very practical use for and is to do a complex match on a compound structure (as we did earlier) but also bind the whole structure to a variable at the same time:
(match '((1 1 1) (2 2)) ((and ((1 ...) (2 ...)) the-list) the-list)) => ((1 1 1) (2 2))
This works because the symbol the-list always matches, of course.
Finally, the not operator can be used to invert patterns which can be convenient at times although I would generally recommend to express patterns in positive terms, i.e. which structure they describe rather than which they don't. Nevertheless it's a good tool to have when you need a shortcut:
(match 2 ((not 1) 'yep) (else 'nope)) => yep
Note that not doesn't bind variables which appear in its argument patterns. This is not too surprising: Binding to something that isn't there doesn't make much sense after all.
Hooking Into The Matching Process
Whenever you reach the limits of what can be expressed with the builtin match operators there are (at least) two ways to hook into the matching process. One of them is the predicate match operator ? which allows embedding an arbitrary predicate function in a pattern. The pattern then only matches when that predicate returns #t:
(match 0 ((? positive?) 'pos) ((? zero?) 'zero) ((? even?) 'even)) => zero
Conveniently, the ? operator is an implicit and operator, so instead of:
(and (? foo?) x)
we can instead write:
(? foo? x)
This provides sufficient flexibility for most cases. For those rare cases where even predicate patterns don't cut it, there's another hook: You can jump back into the matching process from inside a match clause's body. This can be done by means of the (=> identifier) form which goes between a clause's pattern and its body. It binds identifier (which is a symbol of your choice) to a a function of zero arguments which you can call for continuing the matching at the next clause:
(match '(1 2) ((a b) (=> continue) (if (< a 2) (continue) 'first)) ((a b) 'second)) => second
This use of match is starting to become obscure: Even though the two clauses' patterns are exactly the same, the first one doesn't match. Readers of this code will have to carefully examine the whole clause in order to understand this.³
Putting It All Together
Using a few of the tricks we've learned we can now rewrite find-odd-car like this:
(define find-odd-car (match-lambda (() #f) (((and ((? odd?) _ ...) head) tail ...) head) ((head tail ...) (find-odd-car tail))))
The special pattern variable _ can be used for matches we are not interested in, and it will not get bound in the clause's body. While this version of find-odd-car uses the capabilities of match to their fullest extent I think it is not exactly the clearest. As with many things, purity does not necessarily lead to the best solutions!
I hope this article serves well as an introduction to match. It is one of my favorite Scheme forms and it is very popular in general, too: According to the current Salmonella reverse dependency ranking, the matchable egg is the most depended on egg in all of Chicken!
Finally, I can only recommend you to check out Andrew Wright's paper to learn about what is possible with match beyond what I presented here.
- Note how this match use is now fully equivalent to case, at least semantically. case can be optimized much easier by a Scheme system though because it only allows dispatching on constant values while match allows arbitrary patterns as alternatives. A match implementation however could be implemented so that it expands into an equivalent case form if possible.
- The same is true for '...', of course, which is just a regular symbol, as well.
- This example could be expressed using a predicate pattern, too, which would make it much clearer.