elixlsx icon indicating copy to clipboard operation
elixlsx copied to clipboard

What kind of performance is to be expected?

Open zachdaniel opened this issue 8 years ago • 7 comments

Right now we have a 2.5mb spreadsheet being generated, and the call to write_to_memory/2 takes about 25 seconds. Is this more than normal, or should we be investigating a cause?

Regardless it would be a very high priority for us to make this faster.

zachdaniel avatar Mar 06 '17 06:03 zachdaniel

I checked, and its not the call to :zip.create/3 causing the slowness its the call to Elixlsx.Writer.create_files/2

zachdaniel avatar Mar 06 '17 14:03 zachdaniel

Thanks for doing the benchmarking! I have not written elixlsx with performance in mind, but I agree that 25secs for ~2.5MB seems a bit slow.

Could you share any specifics on what makes the spreadsheet so large? Is it a lot of sheets, large sheets (and in which dimension)?

Could you share a code snippet that reproduces this behaviour?

Thanks!

xou avatar Mar 13 '17 10:03 xou

Elixlsx currently relies on a lot of string interpolation and string concatenation. I believe that replacing this with iolists will increase performance significantly. I updated the make_sheet/2 to use iolists and it ran 16% faster (as measured with Benchee), even though all the other functions that feed into make_sheet/2 were unchanged. The speed should increase with each function that's changed. The update was simply to output a list of the various strings, instead of concatenating them.

This is a good blog post I found on the topic: https://www.bignerdranch.com/blog/elixir-and-io-lists-part-1-building-output-efficiently/

ryanhart2 avatar Feb 24 '19 01:02 ryanhart2

@ryanhart2 any chances to get a PR with this improvement?

andreapavoni avatar Mar 04 '19 12:03 andreapavoni

I am having an issue with a spreadsheet that is only 8MB total, but consumes a whooping 2GB RAM while writing it.

If the IO List is a solution, I would be more than happy to finance a freelancer to do so.

@zachdaniel @ryanhart2

roehst avatar Mar 12 '24 17:03 roehst

I am investigating the cost (in # of reductions) for very simple spreadsheets with N rows.

The number of reductions/row seems to go up with the number of rows - which looks like quadratic to me. This makes large spreadsheets take much longer.

Is this relatable to the use of IO lists?

My benchmark is naive, so I welcome feedback.

defmodule Example do
  alias Elixlsx.Workbook
  alias Elixlsx.Sheet

  def generate_data(n) do
    for _ <- 1..n do
      %{
        user_id: 200,
        shares: 200,
        strike_price: 200
      }
    end
  end

  def run(n) do
    task =
      Task.async(fn ->
        data = generate_data(n)

        reductions_1 = :erlang.process_info(self)[:reductions]

        workbook =
          %Workbook{}
          |> append_data_sheet(data)

        _ = Elixlsx.write_to_memory(workbook, "foo.xlsx")

        reductions = :erlang.process_info(self)[:reductions] -reductions_1

        reductions
      end)

    reductions = Task.await(task)

    # reds / n
    IO.puts("Reductions per row: #{reductions / n}")
  end

  def append_data_sheet(workbook, data) do
    sheet = %Sheet{name: "Data"}

    sheet =
      Enum.reduce(Enum.with_index(data), sheet, fn {row, index}, sheet ->
        sheet =
          sheet
          |> Sheet.set_at(index + 1, 1, row.user_id)
          |> Sheet.set_at(index + 1, 2, row.shares)
          |> Sheet.set_at(index + 1, 3, row.strike_price)

        Enum.reduce(1..10, sheet, fn j, sheet ->
          Sheet.set_at(sheet, index + 1, 3 + j, 5)
        end)
      end)

    Workbook.append_sheet(workbook, sheet)
  end
end

roehst avatar Mar 12 '24 18:03 roehst

Did anyone compared Elixlsx with Exceed? https://github.com/synchronal/exceed

user20230119 avatar May 20 '24 09:05 user20230119