docjure icon indicating copy to clipboard operation
docjure copied to clipboard

Add support for streaming API.

Open bagl opened this issue 8 years ago • 10 comments

Issue #46.

One function for dumping data into XLSX file using streaming and one macro for doing the same, but explicitly by row using append-row! function (define in the scope of the macro body).

I do not know if append-row! scoped to macro body is a good idea...

bagl avatar Apr 18 '16 21:04 bagl

Alternative withou using macro (just pass a function in that takes function to append row as its argument)

(defn with-streaming-workbook-fn!
  [file-name sheet-name body-fn]
  (let [wb (SXSSFWorkbook. 1)
        ^Sheet sh (add-sheet! wb sheet-name)
        row-num (atom 0)
        append-row!
        (fn [row-data]
          (let [r (.createRow sh @row-num)]
            (swap! row-num inc)
            (doseq [[j value] (map-indexed #(list %1 %2) row-data)]
              (set-cell! (.createCell r j) value))))]
    (try
      (body-fn append-row!)
      (save-workbook-into-file! file-name wb)
      (finally
        (.dispose wb)))))

There are no name collisions then...

bagl avatar Apr 19 '16 04:04 bagl

Thanks for contributing. It solves the immediate problem for writing data.

However, it does not play well with the sheet styling code. I think the styling code is at fault as it is imperative in nature and not so elegant.

I wonder if you have thought about this and we could find a design that would enable the library to work (to the extent possible) with both in-memory and streaming spreadsheets.

There is an old idea for a V2 design with a data structure that can be piped (thrushed) between the Docjure functions instead of passing the underlying POI objects. This would enable us to use it as a builder (lazily, even) which would be good for streaming and also allow to make the styling and formatting code more elegant as it could have a notion of context ("make the last row bold", "add another sheet at the end").

It would probably involve multi-methods so that the functions can take advantage of the context. For example we can track the selected sheet so that rows added will be added to the last one selected.

With multiple sheets in play it might even be able to jump between them in data processing code as we would build up the structures lazily and still be able to write them sequentially if needed the streaming API.

Please share your thoughts.

mjul avatar Jun 08 '16 09:06 mjul

Thanks for this PR, I just tested it by placing the new functions in an isolated namespace and got my fairly small workbooks generated using vanilla lein (-Xmx1g). Workbooks in question are 18MB (~53000 rows, many columns) and 5.9MB (47000 rows, still many columns).

kennethkalmer avatar Mar 06 '17 14:03 kennethkalmer

@mjul Is there something preventing this merge? Metabase is depending on docjure to export their tables to excel. And this functionality is currently experiencing serieus memory issues (https://github.com/metabase/metabase/issues/5187).

ghost avatar Aug 30 '17 12:08 ghost

I have been on vacation and busy with other things. I will have a look ASAP. Cheers, Martin

mjul avatar Aug 30 '17 14:08 mjul

OK, I looked at it again. My main concern is that it is not really integrated with the other library functions (it allows you to write a single-sheet workbook). However, If you add some test cases to show that it supports the regular cell data types I will add it as is to the version 1 branch, then we can work out a better API for V2 that allows it to play better with the rest of the library.

mjul avatar Sep 14 '17 17:09 mjul

@bagl and @milanvdmria, is there any way to move forward, as this issue still blocks Metabase download feature ? Please help, thank you.

thucnc avatar Sep 25 '17 15:09 thucnc

@thucnc To be honest, we do not have any in-company knowledge on Clojure and do not have the bandwidth to work on this currently.

ghost avatar Sep 25 '17 15:09 ghost