gopar
Memory usage very high
Hello!
First off, I'm so glad this library exists. I've been looking around for a pure-go par2 implementation for ages and recently came across this project. Nice work!
In my local testing, it appears that this library loads all file data into memory for processing. While that may be fine for a smaller dataset, my use case is in the tens of gigabytes, so that obviously won't work.
I know this is likely still under active development, but would you consider having the API be stream-based? In a perfect world, I'm imagining everything being based on the io.Reader and io.Writer interfaces. That way things can be processed in chunks, and a nice advantage is that the source and destination streams aren't limited to being on-disk.
I actually would love to be able to hook this up to a virtual filesystem via the new io/fs package coming in Go 1.16.
I'll try to take a stab at this, but I won't have much time to work on it until March or later. I mostly wanted to get this open to see if it was already planned work or not?
Thanks again!
Yeap, it's a known problem, I just implemented the most basic thing that could have worked! 😅
Definitely would like everything to be stream-based -- io/fs might be a good fit, too. There might be some refactoring needed as some parts might do multiple passes over the data (but I'm not sure, I'd have to check). But yeah, this would be a pretty nice win.
Ah ok no worries :)
Based on what I could see, it would require a bit of refactoring to be stream based. But not impossible by any means.