af83

Ruby 2.0: Enumerator::Lazy

With a first preview released during RubyConf 2012, Ruby 2.0 is getting closer.

After Module#prepend and Module#refine, let's talk about Enumerator::Lazy.

Chainable iteration

Given this nonsensical example:

hashes = (1..10000).select(&:even?).map(&:hash).map(&:to_s)

This code does the following:

  • keeps only even numbers
  • fetches their internal hash
  • converts those hashes to Strings

But it also creates an intermediate array at each step, and iterates over the data several times.

That can be done in one iteration:

hashes = (1..10000).inject([]) do |accumulator, number|
  if number.even?
    accumulator << number.hash.to_s
  else
    accumulator
  end
end

Let's face the truth: the code is less readable.

Enumerator::Lazy

Here comes the laziness.

hashes = (1..10000).lazy.select(&:even?).map(&:hash).map(&:to_s).to_a

When to_a is called, the code is evaluated. Internally, Ruby builds a specific block: no intermediate arrays are created and only one iteration occurs.
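To make the "only one iteration" point concrete, here is a minimal sketch that logs the order in which the blocks run. The log arrays and the small 1..4 range are only there for illustration. In the eager chain, select finishes before map starts; in the lazy chain, each element travels through the whole chain before the next one is read.

```ruby
# Record the order in which the blocks are called.
eager_log = []
eager_result = (1..4)
  .select { |n| eager_log << "select #{n}"; n.even? }
  .map    { |n| eager_log << "map #{n}"; n * 10 }
# eager_log => ["select 1", "select 2", "select 3", "select 4", "map 2", "map 4"]

lazy_log = []
lazy_result = (1..4).lazy
  .select { |n| lazy_log << "select #{n}"; n.even? }
  .map    { |n| lazy_log << "map #{n}"; n * 10 }
  .to_a
# lazy_log => ["select 1", "select 2", "map 2", "select 3", "select 4", "map 4"]

# Both chains produce the same array: [20, 40]
```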

Without the final to_a call (also available under the name force), the chain is not evaluated and returns an Enumerator::Lazy object:

#<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator::Lazy: 1..10000>:select>:map>:map>
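A short sketch of that deferred evaluation, assuming a hypothetical calls counter added only to observe when the select block runs: building the chain runs nothing, and forcing it with first(3) reads only as many elements as needed.

```ruby
calls = 0 # counts how many times the select block runs

chain = (1..10000).lazy
  .select { |n| calls += 1; n.even? }
  .map    { |n| n * 2 }

chain_class  = chain.class # => Enumerator::Lazy
calls_before = calls       # => 0, building the chain ran no block

first_three = chain.first(3) # => [4, 8, 12]
calls_after = calls          # => 6, only 1..6 were examined
```

Note that first(3) returns a plain Array, whereas take(3) on a lazy enumerator stays lazy until to_a or force is called.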

Benchmarking

Since there is only one iteration, it should be faster. Well, let's run some benchmarks first.

require 'benchmark'

[10000, 100000, 1000000, 10000000].each do |size|
  Benchmark.bm do |b|
    b.report("chainable #{size}") do
      hashes = (1..size).select(&:even?).map(&:hash).map(&:to_s)
    end
  end

  Benchmark.bm do |b|
    b.report("one iteration #{size}") do
      hashes = (1..size).inject([]) do |accumulator, number|
        if number.even?
          accumulator << number.hash.to_s
        else
          accumulator
        end
      end
    end
  end

  Benchmark.bm do |b|
    b.report("chainable lazy #{size}") do
      hashes = (1..size).lazy.select(&:even?).map(&:hash).map(&:to_s).to_a
    end
  end
end

So, Enumerator::Lazy seems to be the slowest in every case. According to a bug report, the cost of creating the blocks needed for laziness outweighs the gain. That being said, these benchmarks need to be rerun once Ruby 2.0 is released.

As pointed out in this bug report, there are some cases where Enumerator::Lazy is definitely the best choice you can make. For example, you can use it when extracting elements from a huge, or even infinite, enumerator.

require 'prime'

Prime.select {|x| x % 4 == 3 }.take(10)

This code is meant to select the primes congruent to 3 modulo 4, then take the first ten. However, select is eager: it tries to iterate over every prime number before returning, so it runs indefinitely and take is never reached.

a = []
Prime.each do |x|
  next if x % 4 != 3
  a << x
  break if a.size == 10
end

This code does the job, but it is not easy to read.

With lazy, we can do:

Prime.lazy.select {|x| x % 4 == 3 }.take(10).to_a

This returns almost immediately and remains readable.
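As a runnable sketch of the same idea, the snippet below uses first(10), which forces the lazy chain and returns an Array directly, and applies the same pattern to an endless range. The even_squares example is not from the bug report; it is only an illustration.

```ruby
require 'prime'

# first(10) forces the chain and stops as soon as ten matches are found.
primes = Prime.lazy.select { |x| x % 4 == 3 }.first(10)
# => [3, 7, 11, 19, 23, 31, 43, 47, 59, 67]

# The same pattern on an infinite range: squares of 1, 2, 3, ...
# filtered down to the even ones, stopping after five results.
even_squares = (1..Float::INFINITY).lazy
  .map { |n| n * n }
  .select(&:even?)
  .first(5)
# => [4, 16, 36, 64, 100]
```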

Coming next in this series: Named arguments.
