Should we be deleting partially written files if writing fails?

Open ivirshup opened this issue 4 years ago • 4 comments

Raised by: https://github.com/theislab/anndata/issues/498#issuecomment-770156015

Currently, when an error occurs while writing an AnnData object to disk, we just bail. Should we be cleaning up after ourselves better?

ivirshup, Jan 30 '21 07:01

Can anything be recovered from a partially written file (not necessarily via scanpy)? If yes, maybe we shouldn't delete considering a scenario where the user might want to recover some bits from it.

gokceneraslan, Jan 30 '21 15:01

We could make it an option:

By default clean up, and if the user would rather have a partially written file because of a deadline, let them have the option.

flying-sheep, Jan 31 '21 11:01
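A minimal sketch of the "clean up by default, opt out if you want the partial file" idea. The helper `write_with_cleanup` and the flag `keep_partial` are hypothetical names for illustration and are not part of anndata. `write_fn` stands for any callable that writes to the given path.

```python
from pathlib import Path


def write_with_cleanup(path, write_fn, keep_partial=False):
    """Call write_fn(path); if it raises, remove whatever was written.

    Hypothetical helper, not an anndata API. Note that if `path` already
    existed and write_fn started overwriting it, the old content is gone
    either way; this only decides whether the partial file is left behind.
    """
    path = Path(path)
    try:
        write_fn(path)
    except BaseException:
        if not keep_partial:
            # missing_ok requires Python 3.8+
            path.unlink(missing_ok=True)
        raise  # always re-raise so the caller sees the original error
```

With the default, a failed write leaves no file behind; passing `keep_partial=True` keeps whatever made it to disk, with no guarantee about which elements it contains.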

> Can anything be recovered from a partially written file

Probably depends on how writing failed, but I'd expect so in most cases. The issue here is that it could have failed while writing any element inside the object, so I don't want to make promises about what can be recovered if writing fails.

> would rather have a partially written file because of a deadline

I'm not sure this is a compelling use case. Especially since we don't make promises about what order things are written in, I wouldn't want anyone to rely on this. We can promise the original object is not modified, and we try to tell people which element caused the issue. I think a better "quick fix" here is to delete that element and write again.

ivirshup, Feb 01 '21 03:02

> The issue here is that it could have failed while writing any element inside the object, so I don't want to make promises about what can be recovered if writing fails.

I assume HDF5 doesn’t have transactions or so …

> I think a better "quick fix" here is to delete that element and write again.

Yeah. Maybe we should also offer the old trick “write to a tempdir and move to/replace the target file on success”. That should be useful for sufficiently small data, as it makes writes atomic and unable to destroy files.

flying-sheep, Feb 01 '21 09:02
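The "write to a tempdir and move on success" trick could look roughly like the sketch below. `atomic_write` is a hypothetical helper, not an anndata function, and `write_fn` is assumed to be any callable that takes a path and writes the file there. The temporary directory is created next to the target so that `os.replace` stays on one filesystem, which is what makes the final step an atomic rename on POSIX systems.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(target, write_fn):
    """Write via a temporary directory, then move the result into place.

    Hypothetical helper, not an anndata API. If write_fn raises, the
    temporary directory (and the partial file in it) is removed and any
    existing file at `target` is left untouched.
    """
    target = Path(target)
    # Same parent directory as the target, so the rename never crosses
    # a filesystem boundary.
    with tempfile.TemporaryDirectory(dir=target.parent, prefix=".tmp-write-") as tmpdir:
        tmp_path = Path(tmpdir) / target.name
        write_fn(tmp_path)
        # Only reached when write_fn succeeded; replaces an existing target.
        os.replace(tmp_path, target)
```

Assuming an `AnnData` object `adata`, a call might look like `atomic_write("out.h5ad", adata.write_h5ad)`. The cost is that the old and new files briefly exist side by side, which is why this suits sufficiently small data.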