SnappyOutputStream fails when there is no disk space available
We use SnappyHadoopCompatibleOutputStream paired with Akka Streams to ingest data, and we run into issues when the stream is flushed while there isn't enough disk space. Akka Streams' implementation of write essentially boils down to this (excuse the Scala code):
val os = ... // Some implementation of OutputStream
try {
  os.write(("a" * 100000000).getBytes)
} catch {
  case e: Exception =>
    println("got exception:" + e)
    os.close()
}
If we use val os = new FileOutputStream("/path/to/directory/where/no/space"), the exception is caught and the program exits gracefully. However, if we substitute val os = new SnappyHadoopCompatibleOutputStream(new FileOutputStream("/path/to/directory/where/no/space")), a second exception is thrown while closing the output stream, because in SnappyOutputStream the close() method also calls flush(), and nothing catches a failure there. I'm proposing we add a catch block to close() to handle that, so the exception doesn't bubble up out of our control.
Note that this happens with all subclasses of SnappyOutputStream, which includes SnappyFramedOutputStream.
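To make the failure mode concrete without depending on snappy-java or a real full disk, here is a minimal sketch using only java.io. FullDiskSink and FlushOnCloseStream are hypothetical stand-ins, not snappy-java classes; the only behavior they are assumed to share with SnappyOutputStream is that close() calls flush() before closing the underlying stream.

```scala
import java.io.{ByteArrayOutputStream, IOException, OutputStream}

object CloseFlushDemo {
  // Hypothetical sink that simulates a full disk: every write fails.
  class FullDiskSink extends OutputStream {
    var closed = false
    override def write(b: Int): Unit =
      throw new IOException("No space left on device")
    override def close(): Unit = { closed = true }
  }

  // Hypothetical stand-in for a buffering stream whose close() flushes first.
  class FlushOnCloseStream(out: OutputStream, blockSize: Int) extends OutputStream {
    private val buffer = new ByteArrayOutputStream()

    override def write(b: Int): Unit = {
      buffer.write(b)
      if (buffer.size() >= blockSize) flush() // a full block is pushed to the sink
    }

    override def flush(): Unit = {
      buffer.writeTo(out) // throws when the sink is full; the buffer keeps its data
      buffer.reset()
      out.flush()
    }

    override def close(): Unit = {
      flush() // no try/finally: a failure here skips out.close()
      out.close()
    }
  }

  // Mirrors the reported pattern: close() is called from inside the catch block.
  // Returns (write failed, close failed, sink was closed).
  def closeInsideCatch(): (Boolean, Boolean, Boolean) = {
    val sink = new FullDiskSink
    val os = new FlushOnCloseStream(sink, blockSize = 16)
    var writeFailed = false
    var closeFailed = false
    try {
      os.write(("a" * 64).getBytes)
    } catch {
      case _: IOException =>
        writeFailed = true
        // The inner try exists only to observe the second exception.
        try os.close() catch { case _: IOException => closeFailed = true }
    }
    (writeFailed, closeFailed, sink.closed)
  }
}
```

In this sketch the write fails, the close() inside the catch block fails again because the buffered bytes are still pending, and the underlying sink is never closed.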
Consuming the IOException from the flush() call inside close() might not be a good idea, because SnappyOutputStream is currently designed to guarantee that the entire compressed data set is written out before the output resources are closed.
For example,
- os.write(...) succeeds (because of internal buffering), but os.close() might fail because the disk is full. If we consume the exception inside close(), users will fail to notice the compressed data corruption.
So at least, in your example, we need to:
- Call os.flush() inside the try-catch block.
- If a write method (e.g., os.write(...), os.flush()) throws an exception, set the state of the SnappyOutputStream to failed, and avoid calling flush() inside the close() method.
So I guess wrapping flush() in try-finally is OK to ensure that the subsequent internal out.close() is always called, but we still need to throw the exception.
Either way, it will not behave the same as FileOutputStream.
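The two ideas above (a failed state that makes close() skip flush(), and try-finally so the internal out.close() always runs while the exception still propagates) could look roughly like the following. This is a sketch under assumptions, not the snappy-java implementation: StatefulStream, FullDiskSink, and the failed and closed flags are hypothetical names introduced for illustration.

```scala
import java.io.{ByteArrayOutputStream, IOException, OutputStream}

object SafeCloseSketch {
  // Hypothetical sink that simulates a full disk: every write fails.
  class FullDiskSink extends OutputStream {
    var closed = false
    override def write(b: Int): Unit =
      throw new IOException("No space left on device")
    override def close(): Unit = { closed = true }
  }

  // Hypothetical buffering stream that remembers whether a flush has failed.
  class StatefulStream(out: OutputStream, blockSize: Int) extends OutputStream {
    private val buffer = new ByteArrayOutputStream()
    private var failed = false
    private var closed = false

    override def write(b: Int): Unit = {
      buffer.write(b)
      if (buffer.size() >= blockSize) flush()
    }

    override def flush(): Unit =
      try {
        buffer.writeTo(out)
        buffer.reset()
        out.flush()
      } catch {
        case e: IOException =>
          failed = true // remember the failure, then rethrow it
          throw e
      }

    override def close(): Unit =
      if (!closed) {
        closed = true
        try {
          if (!failed) flush() // skip flush() if an earlier write already failed
        } finally {
          out.close() // always release the underlying resource
        }
      }
  }

  // The write fails first, then close() is called from the catch block.
  // Returns (close threw, sink was closed).
  def closeAfterFailedWrite(): (Boolean, Boolean) = {
    val sink = new FullDiskSink
    val os = new StatefulStream(sink, blockSize = 16)
    var closeThrew = false
    try {
      os.write(("a" * 64).getBytes)
    } catch {
      case _: IOException =>
        try os.close() catch { case _: IOException => closeThrew = true }
    }
    (closeThrew, sink.closed)
  }

  // The write is only buffered, so the failure first surfaces in close().
  // Returns (close threw, sink was closed).
  def closeAfterBufferedWrite(): (Boolean, Boolean) = {
    val sink = new FullDiskSink
    val os = new StatefulStream(sink, blockSize = 16)
    os.write("abcd".getBytes) // below blockSize, nothing reaches the sink yet
    var closeThrew = false
    try os.close() catch { case _: IOException => closeThrew = true }
    (closeThrew, sink.closed)
  }
}
```

With this shape, a close() after an already reported write failure does not throw a second time, while a failure that first appears during close() is still reported to the caller. In both cases the underlying stream is closed.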
Sorry for the delay in responding; I had to drop this for other issues. You're right in that FileOutputStream will "write until full" and then exit gracefully. Even though the IOException is caught, the output stream would have written incomplete/corrupt data. The only difference between FOS and SnappyOutputStream is that SnappyOutputStream lets the exception bubble up, whereas FileOutputStream does not notify the caller.
I guess what we could do is one of two things:
- Wrap the code block from my example in another try-catch at a higher level, so we are notified that the output stream failed while writing to the file, and run a custom recovery process (remove the partially written file, and pause writing until some watermark is reached).
- Track state of flush/close within SnappyOutputStream, as you mentioned.
Option 2 seems like the better design long-term, but I would imagine that the output stream needs to provide some sort of state function to handle that (onWriteFailed or something), which may need some careful thinking. I agree with you that the better solution for now is to just implement option 1.
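For reference, option 1 might be sketched as below. The names writeOrCleanUp and open are hypothetical, and the recovery step is reduced to deleting the partial file; pausing until a watermark is reached is left out because it depends on the surrounding application.

```scala
import java.io.{IOException, OutputStream}
import java.nio.file.{Files, Path}

object IngestWithCleanup {
  // Hypothetical helper: writes data to target through a stream created by `open`.
  // Returns true only if both the write and the close succeeded; on any
  // IOException the partially written file is removed and false is returned.
  def writeOrCleanUp(target: Path, open: Path => OutputStream, data: Array[Byte]): Boolean =
    try {
      val os = open(target)
      try os.write(data)
      finally os.close() // may throw as well, e.g. when close() flushes buffered data
      true
    } catch {
      case e: IOException =>
        println("write failed, removing partial file: " + e)
        Files.deleteIfExists(target)
        false
    }
}
```

A caller would pass something like p => new FileOutputStream(p.toFile), or a compressing stream wrapped around it, as open. The outer catch sees a failure from either write() or close(), so a second exception thrown while closing no longer escapes.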
Regardless, thanks for your help! Feel free to close this issue if you see fit.