
Rewrite ODS support based on loxun XMLWriter module

Open bdauvergne opened this issue 9 years ago • 6 comments

It uses constant memory and is much faster than the odf and odf3 packages, because the document is not built in memory prior to serialization. OpenDocument is a simple format that should not need many thousands of lines of code and gigabytes of memory to export a simple table of tens of thousands of lines.

A temporary file is needed because zipfile does not support streaming directly into it. If that's a problem, I can do it in memory with BytesIO, at the cost of slightly higher memory consumption.
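A minimal sketch of the temporary-file approach described above, using only the standard library. This is not the code from this PR: the function name `write_rows_to_zip` and the `<table>/<row>/<cell>` element names are hypothetical, and the output is a plain zipped XML file, not a valid ODS document (no mimetype entry, manifest, or OpenDocument namespaces).

```python
import os
import tempfile
import zipfile
from xml.sax.saxutils import escape


def write_rows_to_zip(rows, dest_path, member_name="content.xml"):
    """Write rows to a temporary XML file one at a time, then zip that file.

    Hypothetical helper: only the current row is held in memory while the
    XML is produced, so `rows` can be a generator of any length.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        xml_path = os.path.join(tmp_dir, member_name)
        with open(xml_path, "w", encoding="utf-8") as xml_file:
            xml_file.write('<?xml version="1.0" encoding="UTF-8"?>\n<table>')
            for row in rows:
                xml_file.write("<row>")
                for cell in row:
                    # escape() protects &, < and > in cell text
                    xml_file.write("<cell>%s</cell>" % escape(str(cell)))
                xml_file.write("</row>")
            xml_file.write("</table>")
        # The finished file is added to the archive from disk
        with zipfile.ZipFile(dest_path, "w", zipfile.ZIP_DEFLATED) as archive:
            archive.write(xml_path, member_name)
```

Called with a generator, for example `write_rows_to_zip(((i, "x") for i in range(100000)), "out.zip")`, the rows are never collected into a list, which is the property the description above relies on.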

With the current implementation it's nearly impossible to export a 100,000-line table to ODS in a memory-constrained environment (a VM with 1 GB of memory).

bdauvergne avatar Jul 02 '16 08:07 bdauvergne

@bdauvergne, just out of curiosity, could I find the ods writer lib (Copyright (C) 2005-2016 Entr'ouvert) on pypi or github?

chfw avatar Aug 03 '18 22:08 chfw

This code is new; I wrote it on my employer's (Entr'ouvert) time. It's freely inspired by this package (http://git.entrouvert.org/wcs.git/tree/wcs/qommon/ods.py), also from Entr'ouvert, which uses ElementTree and so does not have bounded memory consumption. For that you need a streaming, XmlWriter-like API.
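To illustrate what a streaming, XmlWriter-like API looks like, here is a rough sketch that uses the standard library's `xml.sax.saxutils.XMLGenerator` as a stand-in. It is neither loxun's API nor the code in this PR, and the function name `stream_rows` and the element names are made up for the example. Each element is written to the output as soon as it is opened or closed, so no document tree is kept in memory, unlike an ElementTree-based writer.

```python
import io
from xml.sax.saxutils import XMLGenerator


def stream_rows(rows, out):
    """Emit rows as XML directly to `out`, one event at a time.

    Hypothetical helper: nothing is buffered beyond the current call,
    so memory use does not grow with the number of rows.
    """
    writer = XMLGenerator(out, encoding="utf-8", short_empty_elements=False)
    writer.startDocument()
    writer.startElement("table", {})
    for row in rows:
        writer.startElement("row", {})
        for cell in row:
            writer.startElement("cell", {})
            writer.characters(str(cell))  # text is escaped by the generator
            writer.endElement("cell")
        writer.endElement("row")
    writer.endElement("table")
    writer.endDocument()


buffer = io.StringIO()
stream_rows([(1, "a&b")], buffer)
```

In real use `out` would be an open file rather than a `StringIO`; the in-memory buffer is only there to keep the example self-contained.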

bdauvergne avatar Aug 04 '18 07:08 bdauvergne

Thanks for your reply.

I planned to copy your code to produce a specialised ods writer for pyexcel, as pyexcel-odsw. As you mentioned in this PR, odfpy and ezodf do not use constant memory when writing an ods file. I hope you will be OK with my copying it.

For your information, messy-tables had a better-performing ods reader, and it inspired pyexcel-odsr. So your code is the missing piece of the puzzle that completes the ods story: performant writer + performant reader.

chfw avatar Aug 04 '18 08:08 chfw

No problem, just keep the copyright.

bdauvergne avatar Aug 04 '18 09:08 bdauvergne

@bdauvergne Why not add loxun to requirements.txt, as it is available on pypi at https://pypi.org/project/loxun/, instead of copy-pasting the whole file into the tablib project?

frallain avatar Aug 29 '18 11:08 frallain

Just thought it was the tablib way: it contains (contained?) so many external dependencies. I did not know they were all not packaged on pypi.

bdauvergne avatar Sep 05 '18 09:09 bdauvergne