mpi4py returns a BlockingIOError/OSError: Unable to create file

Open jpbreuer opened this issue 2 years ago • 2 comments

  • BXA version: 4.0.5
  • UltraNest version: 3.4.4
  • Python version: 3.9
  • Xspec or Sherpa and version: Xspec 12.11.1
  • Operating System: Debian GNU/Linux 11 (bullseye)

Description

While attempting to parallelize BXA with MPI, the HDF5 file (written through h5py) is created but locked. After following the recommendation in a previous (closed) BXA issue thread and reinstalling all dependencies, the problem persists, but with a new error.

I read many forum posts about these errors, and they recommend reinstalling the dependencies. It seems as though the HDF5 file is corrupted while being created.

What I Did

Old error:

Traceback (most recent call last):
  File "/home/jpbreuer/Scripts/bxa_test.py", line 373, in <module>
    results = solver.run(resume=True)
  File "/home/jpbreuer/.local/lib/python3.9/site-packages/bxa/xspec/solver.py", line 188, in run
    self.results = solve(
  File "/home/jpbreuer/.local/lib/python3.9/site-packages/ultranest/solvecompat.py", line 55, in pymultinest_solve_compat
    sampler = ReactiveNestedSampler(
  File "/home/jpbreuer/.local/lib/python3.9/site-packages/ultranest/integrator.py", line 1077, in __init__
    self.pointstore = HDF5PointStore(storage_filename, storage_num_cols, mode='a' if resume else 'w')
  File "/home/jpbreuer/.local/lib/python3.9/site-packages/ultranest/store.py", line 187, in __init__
    self.fileobj = h5py.File(filepath, **h5_file_args)
  File "/home/jpbreuer/.local/lib/python3.9/site-packages/h5py/_hl/files.py", line 507, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "/home/jpbreuer/.local/lib/python3.9/site-packages/h5py/_hl/files.py", line 232, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 106, in h5py.h5f.open
BlockingIOError: [Errno 11] Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

Updated error:

Traceback (most recent call last):
  File "/home/jpbreuer/Scripts/bxa_test.py", line 128, in <module>
    results = solver.run(resume=True)
  File "/usr/local/lib/python3.9/dist-packages/bxa/xspec/solver.py", line 188, in run
    self.results = solve(
  File "/usr/local/lib/python3.9/dist-packages/ultranest/solvecompat.py", line 55, in pymultinest_solve_compat
    sampler = ReactiveNestedSampler(
  File "/usr/local/lib/python3.9/dist-packages/ultranest/integrator.py", line 1077, in __init__
    self.pointstore = HDF5PointStore(storage_filename, storage_num_cols, mode='a' if resume else 'w')
  File "/usr/local/lib/python3.9/dist-packages/ultranest/store.py", line 187, in __init__
    self.fileobj = h5py.File(filepath, **h5_file_args)
  File "/usr/lib/python3/dist-packages/h5py/_debian_h5py_serial/_hl/files.py", line 387, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/usr/lib/python3/dist-packages/h5py/_debian_h5py_serial/_hl/files.py", line 187, in make_fid
    fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
  File "h5py/_debian_h5py_serial/_objects.pyx", line 54, in h5py._debian_h5py_serial._objects.with_phil.wrapper
  File "h5py/_debian_h5py_serial/_objects.pyx", line 55, in h5py._debian_h5py_serial._objects.with_phil.wrapper
  File "h5py/_debian_h5py_serial/h5f.pyx", line 108, in h5py._debian_h5py_serial.h5f.create
OSError: Unable to create file (unable to open file: name = 'bxatest/results/points.hdf5', errno = 17, error message = 'File exists', flags = 15, o_flags = c2)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/h5py/_debian_h5py_serial/_hl/files.py", line 185, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_debian_h5py_serial/_objects.pyx", line 54, in h5py._debian_h5py_serial._objects.with_phil.wrapper
  File "h5py/_debian_h5py_serial/_objects.pyx", line 55, in h5py._debian_h5py_serial._objects.with_phil.wrapper
  File "h5py/_debian_h5py_serial/h5f.pyx", line 88, in h5py._debian_h5py_serial.h5f.open
OSError: Unable to open file (truncated file: eof = 96, sblock->base_addr = 0, stored_eof = 2048)

jpbreuer, May 02 '22 20:05

Double-check that you can import mpi4py in your python/sherpa script.

https://johannesbuchner.github.io/UltraNest/debugging.html#Parallelisation-issues
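A minimal sketch of such a check, assuming it is launched the same way as the fitting script (for example under mpiexec). The helper name `mpi_status` is hypothetical and not part of BXA or UltraNest. If the import fails, the launched processes probably cannot coordinate and would each try to open the same points.hdf5, which would be consistent with the locking error above.

```python
def mpi_status():
    """Return (rank, size) if mpi4py can be imported, otherwise None.

    Hypothetical helper for illustration; not part of BXA or UltraNest.
    """
    try:
        # Importing MPI also initialises the MPI runtime.
        from mpi4py import MPI
    except ImportError:
        return None
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()


if __name__ == "__main__":
    status = mpi_status()
    if status is None:
        print("mpi4py could not be imported in this interpreter")
    else:
        print("MPI rank %d of %d" % status)
```

When run under mpiexec with several processes, each process should report a different rank and the same total size. If every process reports a size of 1, or the import fails, the MPI setup seen by this Python interpreter is the first thing to investigate.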

JohannesBuchner, Nov 09 '22 19:11

and delete bxatest/results/points.hdf5

JohannesBuchner, Nov 09 '22 19:11
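The deletion suggested above can be sketched as follows. This assumes no fitting process is still running and that losing the stored points is acceptable, since a later run with resume=True would then start from scratch. The helper name `remove_stale_pointstore` is hypothetical.

```python
import os


def remove_stale_pointstore(path):
    """Delete the file at `path` if present; return True if it was removed.

    Hypothetical helper for illustration; not part of BXA or UltraNest.
    """
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Nothing to clean up.
        return False
```

For the run in this issue, the call would be `remove_stale_pointstore("bxatest/results/points.hdf5")`, executed once before launching the MPI job rather than from every rank.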