That’s definitely one way of doing it. It’s also the most idiomatic Ruby I could think of myself. The problem is that with 100 million elements, this ends up creating 100 million temporary arrays (the #zip) and then summing each of them. On my MacBook Pro it takes about a minute to run.
Anybody else have a more optimized way of doing this?
As far as I can tell your version is different from zipping. Your results will have the length of the longer input, while the zipping variants will have the length of the shorter input.
edit
I just remembered zipWith from Haskell, and it turns out Enumerable#zip can indeed take a block, though it works differently from what I am used to from Haskell…
In Ruby the block passed to Enumerable#zip is executed for its side effects only: none of the inputs is changed, and no accumulator is returned (the method returns nil when given a block), so we have to build the result array ourselves inside the block:
a = [1, 2, 3, 4, 5]
b = [5, 4, 3, 2, 1]
r = []
a.zip(b) { |x, y| r << x + y }
r #=> [6, 6, 6, 6, 6]
At least in theory, this should not keep more than four large buffers allocated at once (two for the inputs, one for the current result, and one for the new backing store when the result outgrows its capacity), though due to GC the older chunks may survive for a while.
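If the result length is known up front, you can go one step further and preallocate the result array, so its backing store never has to be regrown at all. A small sketch of that idea:

```ruby
a = [1, 2, 3, 4, 5]
b = [5, 4, 3, 2, 1]

# Preallocate the result so its backing store is never regrown;
# zip with a block yields each pair without materializing an array of pairs.
r = Array.new(a.length)
i = 0
a.zip(b) do |x, y|
  r[i] = x + y
  i += 1
end
p r  # => [6, 6, 6, 6, 6]
```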
But GC plays tricks on us in any of the aforementioned solutions to the problem.
True. It depends on what a person wants; my version loses no data, which seems more applicable in an additive sense. To match the shorter-list behavior of the zipping variants, you only need to switch the OR to an AND.
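To make the OR/AND switch concrete, here is a sketch of both loop variants over inputs of different lengths:

```ruby
a = [1, 2, 3, 4, 5]
b = [5, 4]

# Longer-input version: keep going while EITHER array has elements left (the OR);
# missing elements are treated as 0, so no data is lost.
r = []
i = 0
while i < a.length || i < b.length
  r << (a[i] || 0) + (b[i] || 0)
  i += 1
end
p r   # => [6, 6, 3, 4, 5]

# Shorter-input (zip-like) version: switch the OR to an AND.
r2 = []
i = 0
while i < a.length && i < b.length
  r2 << a[i] + b[i]
  i += 1
end
p r2  # => [6, 6]
```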
I don’t quite remember exactly how Integer is implemented in Ruby. If the items being iterated over are truly (small) integers, I believe CRuby encodes them as immediate values: the number is tag-encoded directly into the pointer-sized VALUE word, so using them never allocates more memory than a pointer.
This is vaguely my memory of it and I could be wrong. But from what I understand, pointer-sized machine words are cheap to pass around in C, so the language is designed to take advantage of that.
If this is correct… the only space in memory being taken up is the array of pointer-sized slots. That costs the same as storing raw integers would, but is cheap to work with. Shortening the array through pop lets the garbage collector reclaim the space for those slots. Although current Ruby GC implementations don’t return pages of memory to the OS well, and the heap becomes fragmented, so you don’t truly recover all the memory you should (Ruby can reuse it, but it isn’t freed for other applications to use).
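One way to observe the immediate-value behavior from Ruby itself (this assumes CRuby; other implementations may differ): a small integer is the "same object" everywhere it appears, while a very large integer is a heap object allocated per instance.

```ruby
# Small Integers in CRuby are immediate values: the number lives in the
# tagged VALUE word itself, so no heap object is allocated for them.
p 42.equal?(42)                  # => true  (identical object every time)
p 42.object_id == 42.object_id   # => true

# Very large integers are real heap objects, allocated per instance:
p((2**100).equal?(2**100))       # => false
```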
I still like to develop in terms of limited memory available. It’s true that my way completely modifies the original source, but that is my intent. Running apps on servers in VMs can often require stricter memory allotments.
You might want to preload the two arrays in a setup/before section (initializing the memory for the initial arrays shouldn’t be part of this benchmark). And, more importantly, explicitly call GC.start before/after the benchmarks, because the GC may kick in during later tests (the data from a previous benchmark being GC’d in the next one).
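A hypothetical sketch of that pattern: the arrays are built once up front (so setup isn’t measured), and GC.start runs before each report so one report’s garbage isn’t collected while the next one runs. The sizes here are illustrative, not the ones from the thread.

```ruby
require "benchmark"

# Preload the inputs outside the measured blocks.
data1 = Array.new(1_000) { 2 }
data2 = Array.new(1_000) { 3 }

Benchmark.bm(5) do |x|
  GC.start  # collect leftovers before measuring
  x.report("zip") { data1.zip(data2).map { |a, b| a + b } }
  GC.start  # so the zip report's garbage doesn't skew the next one
  x.report("idx") { data1.each_index.map { |i| data1[i] + data2[i] } }
end
```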
I’ve changed the benchmark to have all the array initialization done before benchmarks and I dropped the length of the arrays down from 100 million to 10 million since my computer can’t handle it (I run out of RAM).
require "benchmark"
n = 10_000_000
puts "ADD"
puts "n = #{n}"
ARY1 = [2] * n
ARY2 = [3] * n
ARY1B = [2] * n
ARY3 = Array.new(n)
ARY3B = []
require "matrix"
M1 = Matrix[[2] * n]
M2 = Matrix[[3] * n]
M12D = Matrix[[[2] * n]]
M22D = Matrix[[[3] * n]]
require "numo/narray"
M1N1 = Numo::Int64.new(n).fill(2)
M2N1 = Numo::Int64.new(n).fill(3)
M1N2 = Numo::Int64.new(n, 1).fill(2)
M2N2 = Numo::Int64.new(n, 1).fill(3)
Benchmark.bmbm(20) do |measure|
  measure.report("zip") do
    ARY1.zip(ARY2).map { |i| i.reduce(:+) }[0..10].to_s
  end
  measure.report("zip_lazy") do
    ARY1.lazy.zip(ARY2).map { |i| i.reduce(:+) }.to_a[0..10].to_s
  end
  measure.report("zip_sum") do
    ARY1.zip(ARY2).map(&:sum)[0..10].to_s
  end
  measure.report("zip2") do
    ARY1.zip(ARY2).map { |a, b| a + b }[0..10].to_s
  end
  measure.report("idx") do
    ARY1.each_with_index do |elem, idx|
      ARY3[idx] = elem + ARY2[idx]
    end
    ARY3[0..10].to_s
  end
  measure.report("idx<<") do
    ARY1.each_with_index do |elem, idx|
      ARY3B << elem + ARY2[idx]
    end
    ARY3B[0..10].to_s
  end
  measure.report("map!") do
    ARY1B.map!.with_index do |elem, idx|
      elem + ARY2[idx]
    end
    ARY1B[0..10].to_s
  end
  measure.report("matrix") do
    (M1 + M2).to_a[0..10].to_s
  end
  measure.report("matrix 2d") do
    (M12D + M22D).to_a[0][0..10].to_s
  end
  measure.report("numo/narray") do
    (M1N1 + M2N1).to_a[0..10].to_s
  end
  measure.report("numo/narray 2d") do
    (M1N2 + M2N2).to_a[0..10].to_s
  end
end
Now I’ve found that when doing the same work in Rust I don’t have any issues with memory at 100 million integers per array.
In Rust doing 3 different kinds of tests 10 million times I get the following times.
running 3 tests
test add_push ... bench: (0.072995736) sec/iter (+/- 0.004712853)
test add_replace ... bench: (0.036814299) sec/iter (+/- 0.000794285)
test add_zip ... bench: (0.063002991) sec/iter (+/- 0.002482924)
test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 0 filtered out
I’ve translated the times from nanoseconds to seconds. I’ve also pre-initialized the arrays of data outside the benchmark, just as I did in the Ruby code above. I had to cheat on add_replace, since Rust’s current benchmark system has a hard-coded 300 repetitions and no begin/setup feature like Minitest has… I wanted to measure the time cost of only the action, so the resulting data is off by about that much, but that is irrelevant.
I was testing including the array initialization because I am interested in the overall user performance experience, but I will make benchmarks with only the inner operations as well. Thanks for your input @danielpclark.
Bohrium isn’t really meant for raw speed but for ease of use: the hope is that it is as fast as (or faster than) an array-programming application written in Fortran, while being easier to use (because of Ruby). It also includes a Python front end aimed at scientific users.