Ruby and The Principle of Unwelcome Surprise

2011-10-05 15:09 +0000

As I have mentioned before, I was once a Rubyist and I still work with Ruby on a nearly daily basis. But ever since I switched to Lisp as my main programming language for new code, I keep noticing things about Ruby that I didn't notice before. Needless to say, most of them have to do with syntax. Ruby is well known for its readability, or rather its naturalness, and to some extent I agree. It's certainly less verbose than many other languages and offers great flexibility of expression. However, as I have come to experience more and more, this flexibility comes at the price of increased ambiguity from the programmer's perspective, which can lead to very subtle bugs. Another thing that strikes me every so often is the standard library's inconsistency, which can cause profound confusion.

I have collected a few of the cases that have bitten me and will summarize them here for posterity. There are almost certainly many more lurking; so much for the Principle of Least Surprise! Realizing this potential source of bugs really makes me appreciate a regular syntax, even if the resulting code doesn't always read like natural language. And really, most Ruby code doesn't either. Which is no problem: it's program code, after all! Okay, now let's move on to the exhibition.

Function Calls? Local Variables? I Just Work Here.

Let's say you have a method like this:

 def foo
   100
 end

Nothing special there. The problem is that in Ruby you can (and usually do) leave off the parentheses enclosing the arguments of a method call, so foo is equivalent to foo(). This obviously reduces line noise, but alas, such calls are indistinguishable from local variable references. That's usually not a big deal because local variable names take precedence over method names once they are defined. However, there are edge cases. Consider this initialization of the local variable foo:

 foo = foo || 200

One might suspect that this results in a local variable foo bound to 100, i.e. the return value of the method foo. This is not the case, though: the local variable foo is registered and initialized with nil before the right-hand side of the assignment is evaluated, so the foo on the right refers to the still-nil local variable and 200 becomes foo's eventual value. I ran into this specific case when dealing with Rails' FormBuilder, which defines a method named object returning the object a FormBuilder instance deals with (if any). In my own method I naively used a local variable named object, as there was no obviously better name for such a generic thing, and initialized it in the way described above. After a lot of head scratching I ended up renaming it to obj or something.
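The fix is simply to disambiguate with explicit parentheses. A minimal sketch of both behaviors, reusing the foo method from above:

```ruby
def foo
  100
end

# The bare foo on the right is already shadowed by the freshly
# declared local variable, which is nil at that point.
foo = foo || 200
foo  # => 200

# Explicit parentheses force a method call and give the intended value.
bar = foo() || 200
bar  # => 100
```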

Local Variables Strike Yet Again

A variation of that theme can be witnessed in this piece:

 def foo
   "the method foo"
 end

 case true
 when false
   foo = "the variable foo"
 when true
   [foo, foo()]
 end

You might expect this case statement to return ["the method foo", "the method foo"], but it actually returns [nil, "the method foo"]. Even though the false branch is never reached, the local variable foo is initialized with nil, so inside the case statement the bare name foo always refers to the local variable unless you explicitly request a method call by supplying an empty argument list. This is similar to what is sometimes referred to as variable hoisting in JavaScript, a mostly unknown pitfall there as well. In fact, the binding of foo leaks out of the case statement into the code following it, so from then on foo always refers to the local variable, whose value is nil.
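The leak is easy to verify; assuming the same foo method, the binding created inside the case survives past its end:

```ruby
def foo
  "the method foo"
end

result =
  case true
  when false
    foo = "the variable foo"  # never executed, but foo is now a local variable
  when true
    [foo, foo()]
  end

result  # => [nil, "the method foo"]
foo     # => nil; the binding has leaked out of the case statement
```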

This is also similar to how block arguments capture variables. Consider this code:

foo = 123

lambda do |foo|
  puts foo
end.call(999)

puts foo

You might expect this to print 999 and then 123, but what really happens (in Ruby 1.8) is that foo refers to the same variable both outside and inside the block. Thus 999 is printed twice, as foo has been set to the block argument's value. This can lead to very tricky interactions when you have nested higher-order function calls with one-off short block argument names. In Ruby 1.9 this behavior was changed: block parameters are now always local to the block, so the outer variable is unaffected, which is arguably what one would expect. And of course the change might break old code that perhaps unknowingly depends on the old behavior in other subtle ways!
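On Ruby 1.9 and later the block parameter is local to the block, which is easy to check:

```ruby
foo = 123

# Since Ruby 1.9, |foo| introduces a fresh variable that shadows the
# outer foo instead of reusing it (as Ruby 1.8 did).
doubled = lambda do |foo|
  foo * 2
end.call(999)

doubled  # => 1998
foo      # => 123 on Ruby 1.9+; would be 999 on 1.8
```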

DSL Is for Dynamically Scoped Language

Although Ruby lacks full-blown syntactic extension mechanisms, it provides a plethora of hooks to achieve comparable results through meta-programming and the use of proxy objects. Thus Ruby is renowned for being well suited to building DSLs. Now, a common way to build a DSL in Ruby is to pass an object to a block and call methods on it. For example, Builder provides a DSL for creating hierarchical data structures like XML documents in this way:

 Builder::XmlMarkup.new.album do |a|
   a.title "Dreaded Brown Recluse"
   a.artist "Howe Gelb"
 end
 # => "<album><title>Dreaded Brown Recluse</title><artist>Howe Gelb</artist></album>"

Disregarding the somewhat doubtful practice of representing data as code that itself cannot be treated like data, it's a fairly solid approach to creating DSLs. There is another way to do it that reads even more naturally. For example, the Machinist library for creating fixture records uses this DSL for defining record blueprints (taken from the documentation):

Post.blueprint do
  author
  title  { "Post #{sn}" }
  body   { "Lorem ipsum..." }
end

How does that work if author, title, body, and sn are not defined in the current lexical scope, you might ask. What happens behind the scenes is that Machinist evaluates the block through instance_eval, effectively turning the lexical closure into a dynamically scoped block without the caller knowing. Worse, it changes the implicit self reference to whatever object the block is instance_eval'ed in. This greatly reduces the composability of such DSLs. This may not be a big deal for libraries like Machinist, as they are very specific to their domain and usually don't need to be used in different ways. But a very recent case of instance_eval abuse is the new Rails 3 routing DSL. Back in 2008 Ola Bini blogged about instance_eval and effectively recommended staying away from it unless you know what you are doing. As chance would have it, he even mentions the Rails routing DSL as a positive example:
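The mechanism itself fits in a few lines. The following is a hypothetical toy blueprint DSL, not Machinist's actual implementation, but it shows how instance_eval reroutes method lookup (and self) to the receiver:

```ruby
class BlueprintDsl
  def initialize
    @fields = {}
  end

  def run(&block)
    instance_eval(&block)  # inside the block, self is now this object
    @fields
  end

  # Bare names in the block arrive here instead of the caller's scope.
  def method_missing(name, *args, &block)
    @fields[name] = block ? block.call : args.first
  end

  def respond_to_missing?(*)
    true
  end
end

fields = BlueprintDsl.new.run do
  title "Post 1"              # resolved against the DSL object
  body  { "Lorem ipsum..." }  # the block's return value is stored
end
fields  # => { title: "Post 1", body: "Lorem ipsum..." }
```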

In almost all cases, if a block needs to handle method calls on a specific object, it should send that object in to the block as an argument. Take a look at routes in Rails - they could have been defined with instance_eval, but they’re not. There is no reason to use instance_eval for this case. Rails send in a route instead.

It looked something like this:

 ActionController::Routing::Routes.draw do |map|
   map.resources :posts, :collection => { :archived => :get }
 end

Three years later in Rails 3 the same routing definition looks like this:

 Foo::Application.routes.draw do
   resources :posts do
     collection do
       get :archived
     end
   end
 end

Again data is being represented as code but this time it has unfortunate consequences. With the old API you could easily pass on default arguments for resources by merging them into the options hash passed to it. Rails even has a convenience method for this very purpose called with_options that would allow for something like this:

 ActionController::Routing::Routes.draw do |map|
   map.with_options :member => { :confirm_delete => :get } do |m|
     m.resources :posts, :collection => { :archived => :get }
     m.resources :users
   end
 end

Now both resource routes get an additional member route, confirm_delete. It's not optimal, but at least it's easy to compose. Achieving the same effect with the new DSL is more involved:

 Foo::Application.routes.draw do
   default_resource_routes = lambda do
     member do
       get :confirm_delete
     end
   end

   resources :posts do
     collection do
       get :archived
     end

     instance_eval(&default_resource_routes)
   end

   resources :users, &default_resource_routes
 end

There's just no elegant way to compose lambdas like that in Ruby (partly due to the special-casing of a single block argument), particularly in the presence of instance_eval. The instance_eval on our side wouldn't strictly be necessary, but we do it just in case: were default_resource_routes defined outside the routing definition block, it would otherwise not be able to call the routing helpers. Contrast this with the old API, which is not only more concise but also much easier to compose by merely passing around data structures or wrapping the map proxy object.

So what's the point? Isn't this just bad library design? I'm not sure it's just that when even venerable frameworks like Rails are affected by such blunders. I think it's symptomatic of a language that itself contains many 80%-of-the-way constructs and provides the means for doing the same in your own code. A language's mindset seeps into your own, for better or for worse.

False Memoization

A common idiom for memoizing return values is using the ||= operator like this:

 def foo
   @foo ||= some_expensive_calculation
 end

This expands into:

 def foo
   @foo || @foo = some_expensive_calculation
 end

A pretty neat technique with very concise syntax. Alas, all is not well: if false or nil are also valid return values of some_expensive_calculation, it won't work as expected, i.e. some_expensive_calculation will be executed on the next invocation as well. This is because ||= doesn't check whether a variable is bound (which is what the memoization probably intends) but whether its value is nil or false. It works at all only because instance variables (like @foo in the example) can be referenced without being bound to a value beforehand, evaluating to nil in that case. Yet another 80% situation I have found myself in more than once. There is another ||= edge case related to hashes with default values that was observed by David A. Black in 2008. It is mainly caused by the false intuition that
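A sketch of the failure and a defined?-based fix (class and method names are made up for illustration):

```ruby
class Broken
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def result
    @result ||= compute  # recomputes whenever compute returned nil or false
  end

  private

  def compute
    @calls += 1
    nil  # nil is a perfectly valid result here
  end
end

class Fixed
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def result
    # defined? checks whether @result has ever been assigned,
    # regardless of its value.
    return @result if defined?(@result)
    @result = compute
  end

  private

  def compute
    @calls += 1
    nil
  end
end

broken = Broken.new
3.times { broken.result }
broken.calls  # => 3, the memoization never kicks in

fixed = Fixed.new
3.times { fixed.result }
fixed.calls   # => 1
```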

 @foo ||= bar

actually expands to

 @foo = @foo || bar

which, to the surprise of many, is not the case.

The Library / The Library / It's A Place Where Books Are Free

Ruby's core library is pretty big, and its standard library is even bigger (heck, up until 1.9 it even contained a SOAP implementation), not to speak of the vast number of extensions available as Rubygems. If you want your code to work on many platforms and not depend on many external libraries, in order to reduce the likelihood of breakage, you usually want to use the libraries shipped with Ruby, at least for basic things like file handling or networking. Unfortunately many of those libraries are of dubious quality. Some commenters even go so far as to call the standard library a ghetto.

Consider file handling. There is the File class, which provides file IO and some pathname functions. But some operations are curiously missing. FileUtils provides more, including some operations already provided by File. And then there is Pathname, which is kind of a superset of both File and FileUtils but exposes the same functionality in an object-oriented way. This wouldn't be too terrible if it weren't for the fact that those three implementations apparently don't share much code internally, leading to inconsistencies like this:

File.join('/foo/bar', '/baz')
# => "/foo/bar/baz"

require 'pathname'
Pathname.new('/foo/bar/').join('/baz').to_s
# => "/baz"

A similar state of affairs can be observed in the area of thread synchronization. There are (at least) Mutex, Mutex_m, ConditionVariable, Monitor, MonitorMixin, MonitorMixin::ConditionVariable, Sync, and Sync_m. It's nice to have options to choose from, but again not much code is shared, and each module probably comes with its own share of gotchas.

There Are Worse Things In This World

I know, and some might accuse me of nit-picking here. As is common knowledge these days, worse is better. But if you have the chance, try to learn some programming languages that have a deeper notion of consistency and elegance. It will be a refreshing experience for your special-case-burdened brain, I promise!