
CLI performance

Open SheetJSDev opened this issue 2 years ago • 2 comments

Zipped test file 4MB_35KR.dbf.zip

We were surprised to find that the CLI tool that ships with the gem is significantly slower than JS equivalents:

[benchmark screenshot: DBF_4MB]

Is there some more efficient way to generate a CSV from a DBF file?

SheetJSDev avatar Jul 16 '22 02:07 SheetJSDev

I don't find this surprising, since JS is generally much faster than Ruby. You may get somewhat better performance by loading the whole file into memory to reduce file IO:

    require 'stringio'

    table = DBF::Table.new(StringIO.new(File.read('4MB_35KR.dbf')))
    table.to_csv

infused avatar Jul 18 '22 19:07 infused

That helped a bit. Interestingly, the CSV formatting turned out to be the bigger bottleneck: switching from table.to_csv to the following gave roughly a 20% improvement locally:

    table.each do |record|
      puts record.attributes
    end
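Note that puts record.attributes prints each record as a Ruby hash rather than CSV, so part of the speedup comes from skipping CSV quoting entirely. A minimal stdlib sketch (with made-up row values standing in for record.to_a) of what that quoting does and why a naive join is not a safe substitute:

```ruby
require 'csv'

# Hypothetical row values, standing in for record.to_a
row = ['Acme, Inc.', 'He said "hi"', 42]

# Naive joining is fast but produces invalid CSV when a field
# contains a comma or quote character:
naive = row.join(',')

# CSV.generate_line applies proper quoting (fields containing the
# separator or quotes are wrapped, inner quotes are doubled) and
# appends a newline:
proper = CSV.generate_line(row)

puts naive   # Acme, Inc.,He said "hi",42
puts proper  # "Acme, Inc.","He said ""hi""",42
```

So the 20% saving is real, but the output is only equivalent to CSV when no field contains a delimiter or quote.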

That explains enough for me to close the issue.


One design question: to_csv works as follows:

      each { |record| csv << record.to_a }

The relevant code in record.rb, reordered here for readability:

    def attribute_map # :nodoc:
      @columns.map { |column| [column.name, init_attribute(column)] }
    end

    def attributes
      @attributes ||= Hash[attribute_map]
    end

    def to_a
      @columns.map { |column| attributes[column.name] }
    end

A hash is constructed from the K/V pairs. The hash is memoized, but the ordered values are not, so to_a has to look each value back up in the hash. Would it be more efficient/performant in general to store the values array (in column order) and zip the columns and values only when the hash is needed?
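One way to sketch the suggestion, independent of the dbf gem (Column and the raw value parsing are stand-ins here, not the gem's actual internals):

```ruby
# Minimal stand-in for a DBF column definition
Column = Struct.new(:name)

class Record
  def initialize(columns, raw_values)
    @columns = columns
    @values  = raw_values          # stored once, in column order
  end

  # to_a no longer needs the hash at all
  def to_a
    @values
  end

  # the hash is built lazily, only when attributes is first called
  def attributes
    @attributes ||= @columns.map(&:name).zip(@values).to_h
  end
end

cols = [Column.new('id'), Column.new('name')]
rec  = Record.new(cols, [1, 'Alice'])
rec.to_a        # => [1, "Alice"]  (no hash allocated)
rec.attributes  # => {"id"=>1, "name"=>"Alice"}
```

Under this design, the to_csv path (which only calls to_a) never pays for hash construction at all.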

SheetJSDev avatar Aug 13 '22 08:08 SheetJSDev

Good advice, thank you. This optimization does result in a performance improvement of approximately 10% in my initial testing.

infused avatar Aug 15 '22 23:08 infused

Released in version 4.2.2

infused avatar Aug 15 '22 23:08 infused