parquet-go
parquet-go copied to clipboard
Handle io.EOF errors returned by ReadAt opening files
Hey,
we're opening parquet files with an object storage client and not directly via os.File. This currently fails.
The underlying ReaderAt interface is allowed to return io.EOF. Currently, opening a file fails for us because reading the footer the object storage client reads until the end of the file returning io.EOF yet at the same time the data was written to the buffer just fine.
Now we can fix this in our FrostDB code, however, we think that generally speaking FrostDB' object storage client is compliant with the ReaderAt interface.
What is this project's stance on handling the io.EOF here?
This makes sense to me for reading the footer where you would likely get valid data and an EOF, but do we need this for the column and offset indexes?
Maybe we can use a helper function to handle unexpected conditions?
func readFullAt(r io.ReaderAt, b []byte, off int64) (int, error) {
n, err := r.ReadAt(b, off)
if n == len(b) {
err = nil
} else {
switch err {
case nil:
err = io.ErrNoProgress
case io.EOF:
err = io.ErrUnexpectedEOF
}
}
return n, err
}
- we don't make assumptions about the underlying io.ReaderAt implementation
- we silence errors if we read all the bytes we were looking for as they don't impact correctness
I'd be happy with such a helper function.
Apologies to make more work for you, but we've decided to move development on this project to a new organization at https://github.com/parquet-go/parquet-go to ensure its long term success. We appreciate your contribution and would appreciate if you could reopen this PR there if it is still relevant.