fix(odt): fix disk-space leak in partition_odt()
Remedy a disk-space leak in which partition_odt() leaves an on-disk copy of each .odt file passed as a file-like object.
partition_odt() writes each source document provided as a file-like object to a temporary file. That file is never deleted, so disk consumption grows without bound as more documents are processed.
The convert_and_partition_docx() function that performs the ODT->DOCX conversion uses pandoc, a command-line program. Because pandoc runs in a separate process with its own memory space, the source file cannot be passed to it as an in-memory object; it must be on the filesystem. So when the ODT source document is passed as a file-like object, it is first written to disk where pandoc can read it, and it is not deleted afterward.
Fix this by writing the temporary source ODT file to a TemporaryDirectory and using that same directory for the conversion-target DOCX file. The directory, along with both files, is removed automatically when partition_odt() completes.
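The temporary-directory approach can be sketched as a small context manager. This is a minimal illustration, not the code in this commit: the odt_file_as_docx_path() and _pandoc_convert() names are hypothetical, the pandoc invocation assumes pandoc is on PATH, and the conversion step is injectable so the pattern can be exercised without pandoc installed.

```python
import contextlib
import os
import shutil
import subprocess
import tempfile
from typing import IO, Callable, Iterator


def _pandoc_convert(source_path: str, target_path: str) -> None:
    # Assumption: pandoc is on PATH and infers ODT -> DOCX from the file extensions.
    subprocess.run(["pandoc", source_path, "-o", target_path], check=True)


@contextlib.contextmanager
def odt_file_as_docx_path(
    file: IO[bytes],
    convert: Callable[[str, str], None] = _pandoc_convert,
) -> Iterator[str]:
    """Yield the path of a DOCX converted from the file-like ODT `file`.

    The source .odt and the target .docx are both written inside one
    TemporaryDirectory, so both are deleted when the `with` block exits,
    whether it exits normally or by exception.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        source_path = os.path.join(tmp_dir, "source.odt")
        target_path = os.path.join(tmp_dir, "target.docx")
        # Assumption: `file` is seekable; rewind so the whole document is copied.
        file.seek(0)
        # pandoc runs in another process, so the bytes must be on disk.
        with open(source_path, "wb") as f:
            shutil.copyfileobj(file, f)
        convert(source_path, target_path)
        yield target_path
```

A caller would invoke partition_docx(filename=docx_path) inside the `with odt_file_as_docx_path(file) as docx_path:` block, so partitioning finishes before the directory and both files are removed.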
While we're in there, improve the factoring of partition_odt().
- Extract convert_and_partition_docx() from partition.docx (where it is used only by partition_odt()) to _convert_odt_to_docx() in partition.odt, where it is used. Decouple file conversion from the partition_docx() call on the converted file, since calling partition_docx() is partition_odt()'s natural responsibility.
- Improve docstrings, typing, and comments.
- All tests pass both before and after.
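The split of responsibilities in the first bullet can be outlined as follows. This is a hypothetical sketch, not the actual code: the _convert_odt_to_docx() signature (source path plus target directory, returning the DOCX path), the pandoc invocation, and the partition_odt_sketch() name are assumptions, and partition_docx is passed in as a parameter so the outline runs without the library installed.

```python
import os
import subprocess
import tempfile
from typing import Any, Callable, List


def _convert_odt_to_docx(source_path: str, target_dir: str) -> str:
    """Convert the ODT at `source_path` to a DOCX in `target_dir`; return its path.

    Hypothetical signature. This helper only converts; it does not partition.
    """
    stem = os.path.splitext(os.path.basename(source_path))[0]
    target_path = os.path.join(target_dir, stem + ".docx")
    # Assumption: pandoc is on PATH.
    subprocess.run(["pandoc", source_path, "-o", target_path], check=True)
    return target_path


def partition_odt_sketch(
    filename: str,
    partition_docx: Callable[..., List[Any]],
    convert: Callable[[str, str], str] = _convert_odt_to_docx,
) -> List[Any]:
    """Hypothetical outline: convert first, then partition the converted file."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        docx_path = convert(filename, tmp_dir)
        # Partitioning is this function's job, not the converter's.
        return partition_docx(filename=docx_path)
```

Keeping the converter free of any partitioning logic means it can be tested on its own, and the caller controls where the DOCX lands and when it is cleaned up.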