pymc-bart

Broken pipe failures during sampling on macOS

Open fonnesbeck opened this issue 1 year ago • 2 comments

Describe the bug

When sampling BART models on macOS, I frequently (but not always) get BrokenPipeError exceptions toward the end of sampling runs, presumably caused by multiprocessing.

PMB (pymc-bart) version: 0.5.7
PyMC version: 5.10.3
Python version: 3.10

Additional context

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/cfonnesbeck/mambaforge/envs/pie/lib/python3.10/site-packages/pymc/sampling/parallel.py", line 122, in run
    self._start_loop()
  File "/Users/cfonnesbeck/mambaforge/envs/pie/lib/python3.10/site-packages/pymc/sampling/parallel.py", line 174, in _start_loop
    point, stats = self._step_method.step(self._point)
  File "/Users/cfonnesbeck/mambaforge/envs/pie/lib/python3.10/site-packages/pymc/step_methods/compound.py", line 231, in step
    point, sts = method.step(point)
  File "/Users/cfonnesbeck/mambaforge/envs/pie/lib/python3.10/site-packages/pymc/step_methods/arraystep.py", line 100, in step
    apoint, stats = self.astep(q)
  File "/Users/cfonnesbeck/mambaforge/envs/pie/lib/python3.10/site-packages/pymc_bart/pgbart.py", line 293, in astep
    self.bart.all_trees.append(self.all_trees)
  File "<string>", line 2, in append
  File "/Users/cfonnesbeck/mambaforge/envs/pie/lib/python3.10/multiprocessing/managers.py", line 817, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/Users/cfonnesbeck/mambaforge/envs/pie/lib/python3.10/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/Users/cfonnesbeck/mambaforge/envs/pie/lib/python3.10/multiprocessing/connection.py", line 410, in _send_bytes
    self._send(buf)
  File "/Users/cfonnesbeck/mambaforge/envs/pie/lib/python3.10/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
"""

The above exception was the direct cause of the following exception:

BrokenPipeError                           Traceback (most recent call last)
File ~/mambaforge/envs/pie/lib/python3.10/site-packages/pymc/sampling/parallel.py:122, in run()
    121     self._point = self._make_numpy_refs()
--> 122     self._start_loop()
    123 except KeyboardInterrupt:

File ~/mambaforge/envs/pie/lib/python3.10/site-packages/pymc/sampling/parallel.py:174, in _start_loop()
    173 try:
--> 174     point, stats = self._step_method.step(self._point)
    175 except SamplingError as e:

File ~/mambaforge/envs/pie/lib/python3.10/site-packages/pymc/step_methods/compound.py:231, in step()
    230 for method in self.methods:
--> 231     point, sts = method.step(point)
    232     stats.extend(sts)

File ~/mambaforge/envs/pie/lib/python3.10/site-packages/pymc/step_methods/arraystep.py:100, in step()
     98 q = DictToArrayBijection.map(var_dict)
--> 100 apoint, stats = self.astep(q)
    102 if not isinstance(apoint, RaveledVars):
    103     # We assume that the mapping has stayed the same

File ~/mambaforge/envs/pie/lib/python3.10/site-packages/pymc_bart/pgbart.py:293, in astep()
    292 if not self.tune:
--> 293     self.bart.all_trees.append(self.all_trees)
    295 stats = {"variable_inclusion": variable_inclusion, "tune": self.tune}

File <string>:2, in append()

File ~/mambaforge/envs/pie/lib/python3.10/multiprocessing/managers.py:817, in _callmethod()
    815     conn = self._tls.connection
--> 817 conn.send((self._id, methodname, args, kwds))
    818 kind, result = conn.recv()

File ~/mambaforge/envs/pie/lib/python3.10/multiprocessing/connection.py:211, in send()
    210 self._check_writable()
--> 211 self._send_bytes(_ForkingPickler.dumps(obj))

File ~/mambaforge/envs/pie/lib/python3.10/multiprocessing/connection.py:410, in _send_bytes()
    409     self._send(header)
--> 410     self._send(buf)
    411 else:
    412     # Issue #20540: concatenate before sending, to avoid delays due
    413     # to Nagle's algorithm on a TCP socket.
    414     # Also note we want to avoid sending a 0-length buffer separately,
    415     # to avoid "broken pipe" errors if the other end closed the pipe.

File ~/mambaforge/envs/pie/lib/python3.10/multiprocessing/connection.py:373, in _send()
    372 while True:
--> 373     n = write(self._handle, buf)
    374     remaining -= n

BrokenPipeError: [Errno 32] Broken pipe
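A minimal single-process sketch, using only the standard library and not pymc-bart code, of what the innermost frames of the traceback show: `Connection.send` pickles the object and writes it to the pipe, and that write fails with `BrokenPipeError` (errno 32, `EPIPE`) once nothing is reading the other end. Closing the read end by hand stands in for the peer process having gone away; that this is what happened during sampling is an assumption. POSIX behavior is assumed (Windows pipes raise different errors).

```python
from multiprocessing import Pipe

# One-way pipe: `reader` can only recv(), `writer` can only send().
reader, writer = Pipe(duplex=False)

# Stand-in (assumption) for the process on the other end exiting:
# close the read end so nothing is left to receive.
reader.close()

caught = None
try:
    # Same call path as the bottom of the traceback:
    # Connection.send -> _send_bytes -> _send -> write.
    writer.send(["trees", "from", "one", "draw"])
except BrokenPipeError as exc:
    caught = exc
    print(type(exc).__name__, exc.errno)  # errno is 32 (EPIPE) on Linux and macOS
```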

fonnesbeck, Feb 13 '24 19:02

Note that this also occurs when running a single chain, which is odd, since no multiprocessing should be involved. It appears that CompoundStep uses multiprocessing even when there is only one chain.
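In the traceback, the failing call `self.bart.all_trees.append(self.all_trees)` goes through `File "<string>", line 2, in append` and then `_callmethod` in `multiprocessing/managers.py`, which is how manager proxy methods are dispatched. The toy model below is a simplified sketch, not the real `multiprocessing.managers` implementation: the names `serve` and `ToyListProxy` are hypothetical, and a thread plays the role of the manager's server process. It illustrates why such an `append` can fail regardless of how many chains are sampled: each call is a `send`/`recv` round trip, and once the serving side closes its end, the next `send` raises `BrokenPipeError`.

```python
import threading
from multiprocessing import Pipe


def serve(conn, store):
    """Hypothetical server loop: apply each requested method to `store`."""
    while True:
        request = conn.recv()
        if request is None:      # shutdown sentinel
            conn.close()         # closing this end breaks the client's pipe
            return
        methodname, args = request
        result = getattr(store, methodname)(*args)
        conn.send(("#RETURN", result))


class ToyListProxy:
    """Hypothetical client: every method call is a send/recv round trip,
    loosely mirroring conn.send(...) / conn.recv() in managers._callmethod."""

    def __init__(self, conn):
        self._conn = conn

    def _callmethod(self, methodname, args=()):
        self._conn.send((methodname, args))  # BrokenPipeError if server end closed
        kind, result = self._conn.recv()
        return result

    def append(self, item):
        return self._callmethod("append", (item,))

    def __len__(self):
        return self._callmethod("__len__")


client_end, server_end = Pipe()  # duplex Connection pair
store = []
server = threading.Thread(target=serve, args=(server_end, store), daemon=True)
server.start()

proxy = ToyListProxy(client_end)
proxy.append("trees from draw 1")
print(len(proxy))  # 1

# Simulate the serving side going away (in the real case, a manager process).
client_end.send(None)
server.join()

caught = None
try:
    proxy.append("trees from draw 2")
except BrokenPipeError as exc:
    caught = exc
    print("BrokenPipeError:", exc)
```

If this model applies, the number of chains would not matter; what matters is whether the process serving `all_trees` is still alive when a draw is appended, which in `astep` only happens after tuning (`if not self.tune:`).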

fonnesbeck, Feb 22 '24 17:02

This also occurs with Python 3.11.

fonnesbeck, Feb 22 '24 17:02

stale

fonnesbeck, Sep 25 '24 21:09