CLI performance
Zipped test file: `4MB_35KR.dbf.zip`
We were surprised to find that the CLI tool that ships with the gem is significantly slower than JS equivalents:
![DBF_4MB](https://user-images.githubusercontent.com/6070939/179334780-abcde626-abc1-4ba8-b37a-d6ce04db0c57.png)
Is there some more efficient way to generate a CSV from a DBF file?
I don't find this surprising, since JavaScript is generally much faster than Ruby. You may get somewhat better performance by loading the whole file into memory to reduce file IO:
```ruby
require 'dbf'
require 'stringio'

# Read the whole file into memory so records aren't fetched via disk IO.
table = DBF::Table.new(StringIO.new(File.read('4MB_35KR.dbf')))
table.to_csv
```
That helped a bit. Interestingly, the CSV formatting turned out to be the bigger cost: I saw roughly a 20% local improvement by switching from `table.to_csv` to:

```ruby
table.each do |record|
  puts record.attributes
end
```
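For anyone reproducing the comparison, a rough benchmark along these lines should work (a sketch using the same test file, with `puts` omitted so terminal output doesn't dominate the timing; numbers will vary by machine):

```ruby
require 'benchmark'
require 'dbf'
require 'stringio'

data = File.read('4MB_35KR.dbf')

Benchmark.bm(18) do |x|
  # CSV generation via the gem's built-in method.
  x.report('to_csv') { DBF::Table.new(StringIO.new(data)).to_csv }

  # Plain iteration over records, skipping CSV formatting entirely.
  x.report('each + attributes') do
    DBF::Table.new(StringIO.new(data)).each { |record| record.attributes }
  end
end
```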
That explains enough to be able to close the issue.
One design question: `to_csv` works as follows:

```ruby
each { |record| csv << record.to_a }
```
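For context, a method with that shape typically wraps the loop in the standard library's CSV generator; a rough sketch of the idea (my reconstruction, not the gem's exact source):

```ruby
require 'csv'

def to_csv
  CSV.generate do |csv|
    csv << @columns.map(&:name)          # header row
    each { |record| csv << record.to_a } # one row per record
  end
end
```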
Reordering the code in `record.rb` for readability:

```ruby
def attribute_map # :nodoc:
  @columns.map { |column| [column.name, init_attribute(column)] }
end

def attributes
  @attributes ||= Hash[attribute_map]
end

def to_a
  @columns.map { |column| attributes[column.name] }
end
```
A hash is constructed from the key/value pairs; the hash is memoized, but the intermediate pairs are discarded, and `to_a` then looks every value up through the hash. Would it be more efficient in general to memoize the values array (in column order) and zip the columns and values only when the hash is needed?
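A minimal sketch of that suggestion (a hypothetical rewrite, not the gem's actual code):

```ruby
def to_a
  # Memoize the values array in column order.
  @record_values ||= @columns.map { |column| init_attribute(column) }
end

def attributes
  # Build the hash only on demand by zipping names with the cached values.
  @attributes ||= @columns.map(&:name).zip(to_a).to_h
end
```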
Good advice, thank you. This optimization does result in a performance improvement of approximately 10% in my initial testing.
Released in version 4.2.2