
[BUG] Kill MDANSE job when 'No space left on device' errors occur.

Open • ChiCheng45 opened this issue 1 year ago • 1 comment

Description of the error: When an MDANSE job runs on a disk with no space left, the job keeps running. Errors are printed to the console, but the job is not stopped.

Suggested fix: When the disk runs out of space, MDANSE jobs should fail immediately with a disk space error.
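One way to get part of the way to this behaviour is to check free space before (or periodically during) a job. The following is only a sketch under stated assumptions: `ensure_free_space` and `required_bytes` are hypothetical names, not part of MDANSE, and the caller has to estimate the required size itself. It uses only the standard library.

```python
import errno
import os
import shutil


def ensure_free_space(path, required_bytes):
    """Raise OSError(ENOSPC) if the filesystem holding `path` is too full.

    Hypothetical helper, not an MDANSE function. The parent directory of
    `path` is assumed to exist already.
    """
    directory = os.path.dirname(os.path.abspath(path))
    free = shutil.disk_usage(directory).free
    if free < required_bytes:
        # Same errno the OS reports for "No space left on device".
        raise OSError(
            errno.ENOSPC,
            f"No space left on device: need {required_bytes} bytes, "
            f"{free} available",
            path,
        )
    return free
```

A job could call this before opening the output file and again every so often while writing. It cannot catch every case, since another process may fill the disk between checks, so it complements rather than replaces proper error handling on the write itself.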

ChiCheng45 • Nov 14 '24 16:11

This looks like strange behaviour in h5py: it doesn't seem to raise an exception when a write fails because the disk is full. It happens when writing to a dataset that has the chunks property set. The relevant code, which is used for trajectory conversion, is here:

https://github.com/ISISNeutronMuon/MDANSE/blob/26487f13c2ca0c1308041081e670067d943c1fbe/MDANSE/Src/MDANSE/MolecularDynamics/Trajectory.py#L749-L820

When I remove the chunks setting from that code and run a trajectory conversion on a disk with insufficient space, h5py raises an error and the job fails.

This is kinda strange. I tried to create a minimal reproducible example, but there h5py raises the OSError whether or not chunking is set. So the problem seems to be specific to something going on in MDANSE.
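For the cases where the OSError does get raised, the job still has to recognise it as a disk-full condition in order to stop with a clear message. A rough sketch of such a check follows. `is_disk_full_error` is a hypothetical helper, not part of MDANSE or h5py. It is an assumption that the exception may arrive without `errno` set, which is why the code also falls back to matching the message text from the issue title.

```python
import errno


def is_disk_full_error(exc):
    """Return True if `exc` looks like a 'No space left on device' failure.

    Hypothetical helper. Checks errno first, then falls back to the
    message text in case the library did not populate errno.
    """
    if not isinstance(exc, OSError):
        return False
    if exc.errno == errno.ENOSPC:
        return True
    return "no space left on device" in str(exc).lower()
```

A job loop could wrap each write in `try/except OSError`, call this helper on the caught exception, and abort the whole job rather than logging and carrying on. This does not help in the silent case described above, where no exception is raised at all.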

ChiCheng45 • May 09 '25 15:05