Should we be deleting partially written files if writing fails?
Raised by: https://github.com/theislab/anndata/issues/498#issuecomment-770156015
Currently, when an error occurs while writing an AnnData object to disk, we just bail. Should we be cleaning up after ourselves better?
Can anything be recovered from a partially written file (not necessarily via scanpy)? If so, maybe we shouldn't delete it, since a user might want to recover parts of it.
We could make it an option:
By default clean up, and if the user would rather have a partially written file because of a deadline, let them have the option.
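A minimal sketch of what such an option could look like, using only the standard library. The names `write_with_cleanup`, `write_func`, and `keep_partial` are hypothetical and not part of anndata's API; `write_func` stands in for whatever actually writes the object to `path`.

```python
from pathlib import Path


def write_with_cleanup(path, write_func, keep_partial=False):
    """Run write_func(path); on failure, delete the partial file unless asked not to.

    All names here are hypothetical, for illustration only.
    """
    path = Path(path)
    try:
        write_func(path)
    except BaseException:
        # Default behaviour: clean up after ourselves.
        # With keep_partial=True the partially written file is left in place.
        if not keep_partial and path.exists():
            path.unlink()
        raise  # always re-raise so the caller still sees the original error
```

Note that if `path` already existed before the write, this sketch does not bring the old file back; it only removes whatever the failed write left behind.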
> Can anything be recovered from a partially written file
Probably depends on how writing failed, but I'd expect so in most cases. The issue here is that it could have failed while writing any element inside the object, so I don't want to make promises about what can be recovered if writing fails.
> would rather have a partially written file because of a deadline
I don't find this a compelling use case. Especially since we don't make promises about what order things are written in, I wouldn't want anyone to rely on this. We can promise the original object is not modified, and we try to tell people which element caused the issue. I think a better "quick fix" here is to delete that element and write again.
> The issue here is that it could have failed while writing any element inside the object, so I don't want to make promises about what can be recovered if writing fails.
I assume HDF5 doesn’t have transactions or anything similar …
> I think a better "quick fix" here is to delete that element and write again.
Yeah. Maybe we should also offer the old trick “write to a tempdir and move to/replace the target file on success”. That should be useful for sufficiently small data, as it makes writes atomic and unable to destroy files.
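For reference, a rough sketch of that trick using only the standard library. `atomic_write` and `write_func` are hypothetical names, not anndata API. This variant creates the temporary file next to the target rather than in a separate tempdir, on the assumption that both should live on the same filesystem so that `os.replace` is a plain rename.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(target, write_func):
    """Write via write_func(tmp_path), then move the result onto target on success.

    Hypothetical helper for illustration; write_func does the actual writing.
    """
    target = Path(target)
    # Temporary file in the target's directory, so the final rename
    # does not have to cross filesystems.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    os.close(fd)  # write_func opens the path itself
    tmp_path = Path(tmp_name)
    try:
        write_func(tmp_path)
        # Only reached if writing succeeded; replaces any existing target.
        os.replace(tmp_path, target)
    except BaseException:
        # Writing failed: drop the temporary file, leave the target untouched.
        tmp_path.unlink(missing_ok=True)
        raise
```

The cost is that the old and new files both exist on disk until the move, which is why this fits sufficiently small data best.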