ocaml-ooxml
Pure Zip
The README says the following under a TODO:
Make a pure-OCaml ZIP library. Then we won't need any system dependencies, and it should work with js_of_ocaml too.
I've been quite happy with this library for decompression (https://github.com/mirage/decompress), but I haven't yet found anything that handles accessing files inside ZIP archives.
Maybe @dinosaure has some more pointers.
I think the ZIP format isn't particularly complicated, so it shouldn't be too hard to implement: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
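To give a feel for how simple the fixed-size parts of the format are, here is a minimal sketch that decodes the End of Central Directory (EOCD) record, the 22-byte trailer that tells a reader where the central directory lives. It uses only the OCaml standard library (the little-endian `Bytes` accessors need OCaml 4.08+). The `eocd` record and `parse_eocd` are hypothetical names, and Zip64 and multi-disk archives are ignored.

```ocaml
(* Sketch: decode the End of Central Directory record from APPNOTE.TXT.
   Field offsets are relative to the record start; integers are little-endian.
   Zip64 and multi-disk archives are not handled. *)
type eocd = {
  entries : int;      (* total number of central directory entries *)
  cd_size : int;      (* size of the central directory in bytes *)
  cd_offset : int;    (* offset of the central directory from file start *)
  comment_len : int;  (* length of the trailing archive comment *)
}

let eocd_signature = 0x06054b50l

(* Read an unsigned 32-bit field into a native int.
   Assumes a 64-bit OCaml, where int is wide enough. *)
let u32 b off = Int32.to_int (Bytes.get_int32_le b off) land 0xFFFF_FFFF

let parse_eocd (b : bytes) (off : int) : eocd option =
  if off < 0 || off + 22 > Bytes.length b then None
  else if Bytes.get_int32_le b off <> eocd_signature then None
  else
    Some {
      entries = Bytes.get_uint16_le b (off + 10);
      cd_size = u32 b (off + 12);
      cd_offset = u32 b (off + 16);
      comment_len = Bytes.get_uint16_le b (off + 20);
    }
```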
The hard part is probably figuring out how to structure the module so you:
- Don't have to hold the entire file in memory
- Don't have to read the entire file if you only want one item in it
- Can wrap the functions with Async/Lwt
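One way to meet the first two points is to start from the end of the archive: the EOCD record is 22 bytes plus a comment of at most 65535 bytes, so a reader only has to fetch that tail to locate the central directory, and then fetch just the entries it needs. The sketch below assumes a hypothetical `source` record with a `read_at` callback; nothing here is an existing library API. Because all I/O goes through `read_at`, an Lwt or Async version could swap in a promise-returning callback while keeping the parsing logic unchanged.

```ocaml
(* Hypothetical random-access source: [read_at ~off ~len] returns [len]
   bytes starting at absolute offset [off]. *)
type source = {
  length : int;
  read_at : off:int -> len:int -> bytes;
}

(* The EOCD is 22 bytes plus a comment of at most 0xFFFF bytes, so it must
   begin within this many bytes of the end of the file. *)
let max_eocd_search = 22 + 0xFFFF

(* Return the absolute offset of the EOCD record, reading only the tail. *)
let find_eocd (src : source) : int option =
  let tail_len = min src.length max_eocd_search in
  let tail_start = src.length - tail_len in
  let tail = src.read_at ~off:tail_start ~len:tail_len in
  (* Scan backwards; accept a signature only if its comment length
     reaches exactly to the end of the file. *)
  let rec scan i =
    if i < 0 then None
    else if Bytes.get_int32_le tail i = 0x06054b50l
         && i + 22 + Bytes.get_uint16_le tail (i + 20) = tail_len
    then Some (tail_start + i)
    else scan (i - 1)
  in
  scan (tail_len - 22)

(* In-memory source, handy for testing. *)
let source_of_string s = {
  length = String.length s;
  read_at = (fun ~off ~len -> Bytes.of_string (String.sub s off len));
}
```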
@brendanlong decompress can do streaming reads, and ZIP files are, AFAIK, laid out contiguously, so streaming reads from a file should not be an issue. To support arbitrary lookups, you could have a "paging" callback that feeds you more data from specific ranges of the file when needed.
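The "paging" callback idea can be sketched as a small layer between the ZIP reader and the underlying storage: the reader asks for arbitrary byte ranges, and the pager fetches whole fixed-size pages on demand through a user-supplied function. The names (`pager`, `fetch_page`, `read_range`) and the 4096-byte page size are assumptions for illustration. For brevity the cache never evicts pages; a version meant to keep memory bounded would add an eviction policy such as LRU.

```ocaml
(* Sketch of a paging layer: ranges are served from fixed-size pages that
   are fetched lazily via a user-supplied callback (hypothetical names). *)
let page_size = 4096

type pager = {
  fetch_page : int -> bytes;       (* page index -> up to page_size bytes *)
  cache : (int, bytes) Hashtbl.t;  (* pages fetched so far (no eviction) *)
}

let make_pager fetch_page = { fetch_page; cache = Hashtbl.create 16 }

let get_page p idx =
  match Hashtbl.find_opt p.cache idx with
  | Some page -> page
  | None ->
      let page = p.fetch_page idx in
      Hashtbl.add p.cache idx page;
      page

(* Copy [len] bytes starting at absolute offset [off], touching only the
   pages that overlap the requested range. *)
let read_range p ~off ~len =
  let out = Bytes.create len in
  let rec copy pos =
    if pos < len then begin
      let abs = off + pos in
      let page = get_page p (abs / page_size) in
      let in_page = abs mod page_size in
      let n = min (len - pos) (Bytes.length page - in_page) in
      if n <= 0 then invalid_arg "read_range: past end of data";
      Bytes.blit page in_page out pos n;
      copy (pos + n)
    end
  in
  copy 0;
  out
```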
From what I've read, the deflate blocks in the ZIP format are close to RFC 1951 (although the spec doesn't explicitly mention that standard), and therefore close to what decompress implements. From what I already know, GZIP is also a layer on top of RFC 1951. So decompress can be used to handle the deflate blocks in this case (I believe).
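To connect this to actual bytes: each ZIP entry starts with a 30-byte local file header whose compression-method field is 0 for stored data and 8 for deflate, and for method 8 the data that follows the file name and extra field is, as far as I know, a raw deflate stream with no zlib or gzip wrapper, which is what a raw inflater would consume. A gzip member, by contrast, begins with the magic bytes `0x1f 0x8b` and adds its own header around the deflate stream. The sketch below (hypothetical names, standard library only, 64-bit OCaml assumed) finds where an entry's compressed data starts. Note that the sizes in the local header can be zero when general-purpose flag bit 3 is set, in which case the central directory or a trailing data descriptor has the real values.

```ocaml
(* Sketch: inspect a ZIP local file header to find where an entry's
   compressed data begins and which compression method it uses. *)
type compression = Stored | Deflate | Other of int

type local_header = {
  compression : compression;
  compressed_size : int;  (* may be 0 if flag bit 3 (data descriptor) is set *)
  data_offset : int;      (* relative to the start of the header *)
}

let parse_local_header (b : bytes) : local_header option =
  if Bytes.length b < 30 || Bytes.get_int32_le b 0 <> 0x04034b50l then None
  else
    let compression =
      match Bytes.get_uint16_le b 8 with
      | 0 -> Stored
      | 8 -> Deflate  (* raw deflate stream follows the header *)
      | n -> Other n
    in
    let name_len = Bytes.get_uint16_le b 26 in
    let extra_len = Bytes.get_uint16_le b 28 in
    Some {
      compression;
      (* unsigned 32-bit read; assumes a 64-bit OCaml *)
      compressed_size = Int32.to_int (Bytes.get_int32_le b 18) land 0xFFFF_FFFF;
      data_offset = 30 + name_len + extra_len;
    }

(* A gzip member (RFC 1952) starts with the magic bytes 0x1f 0x8b. *)
let is_gzip b =
  Bytes.length b >= 2 && Bytes.get b 0 = '\x1f' && Bytes.get b 1 = '\x8b'
```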
We already plan to handle GZIP first and then ZIP in decompress.
As @cfcs said, decompress provides a non-blocking API which can easily be wrapped with Lwt/Async. As for memory, decompress works on two bounded buffers (allocated by the user) and a 32 KB window. So it could be an interesting solution in your case.
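The reason such an API is easy to wrap is that the decoder never performs I/O itself: it returns a signal saying whether it needs more input or has filled the caller's output buffer, and the caller decides how to satisfy that. The sketch below illustrates only that control flow and is NOT decompress's actual API; all names are hypothetical. To stay short, the "decoder" just copies bytes (like ZIP's stored method), whereas a real inflater would do its work in the same place and also maintain a 32 KB history window.

```ocaml
(* Hypothetical caller-driven decoder; not decompress's real interface.
   The decoder only copies bytes, standing in for a real inflater. *)
type signal = Await | Flush | End

type decoder = {
  mutable src : bytes;      (* current input chunk, owned by the caller *)
  mutable src_pos : int;
  mutable src_len : int;
  dst : bytes;              (* bounded output buffer, allocated by the caller *)
  mutable dst_pos : int;
  mutable remaining : int;  (* bytes still expected for this entry *)
}

let decoder ~dst ~size =
  { src = Bytes.empty; src_pos = 0; src_len = 0; dst; dst_pos = 0; remaining = size }

(* Hand the decoder a new input chunk of [len] bytes. *)
let refill d buf len = d.src <- buf; d.src_pos <- 0; d.src_len <- len

(* Tell the decoder the caller has consumed the output buffer. *)
let flushed d = d.dst_pos <- 0

let rec decode d =
  if d.remaining = 0 then End
  else if d.dst_pos = Bytes.length d.dst then Flush
  else if d.src_pos = d.src_len then Await
  else begin
    let n = min d.remaining
        (min (d.src_len - d.src_pos) (Bytes.length d.dst - d.dst_pos)) in
    Bytes.blit d.src d.src_pos d.dst d.dst_pos n;
    d.src_pos <- d.src_pos + n;
    d.dst_pos <- d.dst_pos + n;
    d.remaining <- d.remaining - n;
    decode d
  end

(* Blocking driver. An Lwt/Async driver would keep the same loop and only
   make [read] and [write] return promises. *)
let run d ~(read : bytes -> int) ~(write : bytes -> int -> unit) ~chunk =
  let inbuf = Bytes.create chunk in
  let rec loop () =
    match decode d with
    | Await ->
        let n = read inbuf in
        if n = 0 then failwith "unexpected end of input";
        refill d inbuf n;
        loop ()
    | Flush -> write d.dst d.dst_pos; flushed d; loop ()
    | End -> write d.dst d.dst_pos; flushed d
  in
  loop ()
```

With a 5-byte output buffer and 4-byte input chunks, the decoder alternates between `Await` and `Flush` while memory stays fixed at those two buffers.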
I don't have time to focus on this problem yet, but I can put it on my TODO list 👍!