
BlockingIO Error

Open andrewhill157 opened this issue 1 year ago • 8 comments

Describe the bug

I apologize in advance that this might be a bit annoying to reproduce.

When loading an app like the example given below, some fraction of the time it will fail to run and yield the following type of error (after which point the user has to refresh the app to get things going again):

Exception in callback <bound method Distributor._on_change of <marimo._utils.distributor.Distributor object at 0x7fbf4d755c60>>
handle: <Handle Distributor._on_change>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/ceph/home/andrew/temp/absel/env/lib/python3.10/site-packages/marimo/_utils/distributor.py", line 54, in _on_change
    response = self.input_connection.recv()
  File "/ceph/home/andrew/miniconda3/envs/test_env/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/ceph/home/andrew/miniconda3/envs/test_env/lib/python3.10/multiprocessing/connection.py", line 426, in _recv_bytes
    return self._recv(size)
  File "/ceph/home/andrew/miniconda3/envs/test_env/lib/python3.10/multiprocessing/connection.py", line 384, in _recv
    chunk = read(handle, remaining)
BlockingIOError: [Errno 11] Resource temporarily unavailable
Exception in callback <bound method Distributor._on_change of <marimo._utils.distributor.Distributor object at 0x7fbf4d755c60>>
handle: <Handle Distributor._on_change>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/ceph/home/andrew/temp/absel/env/lib/python3.10/site-packages/marimo/_utils/distributor.py", line 54, in _on_change
    response = self.input_connection.recv()
  File "/ceph/home/andrew/miniconda3/envs/test_env/lib/python3.10/multiprocessing/connection.py", line 256, in recv
    return _ForkingPickler.loads(buf.getbuffer())
_pickle.UnpicklingError: invalid load key, '\x07'.
Exception in callback <bound method Distributor._on_change of <marimo._utils.distributor.Distributor object at 0x7fbf4d755c60>>
handle: <Handle Distributor._on_change>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/ceph/home/andrew/temp/absel/env/lib/python3.10/site-packages/marimo/_utils/distributor.py", line 54, in _on_change
    response = self.input_connection.recv()
  File "/ceph/home/andrew/miniconda3/envs/test_env/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/ceph/home/andrew/miniconda3/envs/test_env/lib/python3.10/multiprocessing/connection.py", line 426, in _recv_bytes
    return self._recv(size)
  File "/ceph/home/andrew/miniconda3/envs/test_env/lib/python3.10/multiprocessing/connection.py", line 384, in _recv
    chunk = read(handle, remaining)
BlockingIOError: [Errno 11] Resource temporarily unavailable

but other times it loads totally fine.

I initially noticed the error above on a Linux server I'm using to deploy the app, but I've also observed something similar locally on my Mac, though seemingly much less frequently (I had to sit there refreshing for a good while, whereas it is much easier to catch on the server):

Exception in callback <bound method Distributor._on_change of <marimo._utils.distributor.Distributor object at 0x15d9f67a0>>
handle: <Handle Distributor._on_change>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/Users/andrewhill/Desktop/absel/env/lib/python3.10/site-packages/marimo/_utils/distributor.py", line 54, in _on_change
    response = self.input_connection.recv()
  File "/Users/andrewhill/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/Users/andrewhill/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/connection.py", line 426, in _recv_bytes
    return self._recv(size)
  File "/Users/andrewhill/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/connection.py", line 384, in _recv
    chunk = read(handle, remaining)
BlockingIOError: [Errno 35] Resource temporarily unavailable
Exception in callback <bound method Distributor._on_change of <marimo._utils.distributor.Distributor object at 0x15d9f67a0>>
handle: <Handle Distributor._on_change>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/Users/andrewhill/Desktop/absel/env/lib/python3.10/site-packages/marimo/_utils/distributor.py", line 54, in _on_change
    response = self.input_connection.recv()
  File "/Users/andrewhill/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/Users/andrewhill/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/connection.py", line 426, in _recv_bytes
    return self._recv(size)
  File "/Users/andrewhill/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/connection.py", line 384, in _recv
    chunk = read(handle, remaining)
BlockingIOError: [Errno 35] Resource temporarily unavailable

Environment

Linux

{
  "marimo": "0.6.11",
  "OS": "Linux",
  "OS Version": "6.1.80-1.el9.elrepo.x86_64",
  "Processor": "x86_64",
  "Python Version": "3.10.4",
  "Binaries": {
    "Browser": "--",
    "Node": "--"
  },
  "Requirements": {
    "click": "8.1.7",
    "importlib-resources": "missing",
    "jedi": "0.19.1",
    "markdown": "3.6",
    "pymdown-extensions": "10.8.1",
    "pygments": "2.18.0",
    "tomlkit": "0.12.5",
    "uvicorn": "0.30.1",
    "starlette": "0.37.2",
    "websocket": "missing",
    "typing-extensions": "4.12.1",
    "black": "24.4.2"
  }
}

Mac

{
  "marimo": "0.6.11",
  "OS": "Darwin",
  "OS Version": "23.3.0",
  "Processor": "arm",
  "Python Version": "3.10.4",
  "Binaries": {
    "Browser": "--",
    "Node": "v21.7.1"
  },
  "Requirements": {
    "click": "8.1.3",
    "importlib-resources": "missing",
    "jedi": "0.19.1",
    "markdown": "3.6",
    "pymdown-extensions": "10.8.1",
    "pygments": "2.17.2",
    "tomlkit": "0.12.4",
    "uvicorn": "0.29.0",
    "starlette": "0.37.2",
    "websocket": "missing",
    "typing-extensions": "4.11.0",
    "black": "24.4.2"
  }
}

Code to reproduce

import marimo

__generated_with = "0.6.11"
app = marimo.App(width="full")


@app.cell
def __():
    import pandas as pd
    import numpy as np
    import marimo as mo
    import polars as pl
    import altair as alt
    alt.data_transformers.enable("marimo_csv")

    def create_random_dataset(N):
        x = np.arange(1, N+1)
        y = np.random.rand(N)
        labels = map(str, y)

        return pl.DataFrame({'x': x, 'y': y, 'label': labels})

    # Define the number of points
    N = 200000  # You can change this value to any desired number of points


    def make_plot(n):
        diagonal = (
            alt.Chart(pd.DataFrame({"x": [1, 10000], "y": [1, 10000]}))
            .mark_line(color="red", strokeDash=[10, 10])
            .encode(x="x", y="y")
        )


        dataset = create_random_dataset(n)
        return diagonal + (
            alt.Chart(dataset.to_pandas())
            .mark_point()
            .encode(
                x=alt.X(
                    f"x:Q",
                    title=f"log10(UMI + 1) for",
                    scale=alt.Scale(type="log", base=10),
                ),
                y=alt.Y(
                    f"y:Q",
                    title=f"log10(UMI + 1) for",
                    scale=alt.Scale(type="log", base=10),
                ),
                tooltip=["x", "y", "label"],
            )
        )

    # Display the plots
    interface_elements = [mo.md("#My App"), mo.accordion({"Help": "Help text"})]
    interface_elements.append(mo.hstack([make_plot(N) for _ in range(0, 3)]))
    interface_elements.append(mo.hstack([make_plot(N) for _ in range(0, 3)]))
    mo.vstack(interface_elements)

    return (
        N,
        alt,
        create_random_dataset,
        interface_elements,
        make_plot,
        mo,
        np,
        pd,
        pl,
    )


if __name__ == "__main__":
    app.run()

andrewhill157 avatar Jun 06 '24 23:06 andrewhill157

Note that increasing the number of points also seems to increase the chances that this will happen, which is why I've set it to a high value here.

andrewhill157 avatar Jun 06 '24 23:06 andrewhill157

Also the times I've caught this have all been when running the app with marimo run and refreshing, etc.

andrewhill157 avatar Jun 07 '24 00:06 andrewhill157

I have seen this as well when loading data into tables in a Kubernetes app deployment. No good insights to offer (yet), but I suspect it's related to full buffers and EAGAIN/EWOULDBLOCK signals when trying to write data to a socket for the "frontend". As a workaround/debugging step, I "customized" our table view to forcibly paginate, which I presume limits how much data is sent, and the issue 100% disappeared.

If this is hinting at what the actual cause is, it's not really in marimo but perhaps there are socket usage settings that can avoid it?
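For readers unfamiliar with the errno values in the tracebacks above: Errno 11 on Linux and Errno 35 on macOS are both EAGAIN, which a non-blocking read raises when no data is available yet. Below is a minimal sketch of that behavior, using a local socket pair rather than marimo's actual connection. The helper name read_nonblocking is made up for illustration.

```python
import errno
import socket


def read_nonblocking(sock, nbytes):
    """Return the bytes read, or None if the read would have blocked."""
    try:
        return sock.recv(nbytes)
    except BlockingIOError as exc:
        # EAGAIN is 11 on Linux and 35 on macOS; same condition, different number.
        assert exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK)
        return None


# A connected pair of local sockets stands in for the kernel/server link.
writer, reader = socket.socketpair()
reader.setblocking(False)

first = read_nonblocking(reader, 16)   # nothing has been written yet
writer.sendall(b"ping")
second = read_nonblocking(reader, 16)  # data is now waiting

writer.close()
reader.close()
```

The first read returns None because the non-blocking socket has nothing to deliver; the second succeeds once data has been written.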

ross-at-finix avatar Jun 28 '24 13:06 ross-at-finix

I'm running into the same problem.

CedrusZhao avatar Jul 17 '24 13:07 CedrusZhao

Thanks everyone for documenting this issue.

PR #1822 is an attempt to mitigate this issue by just retrying the socket recv() after a short wait.
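The retry idea can be sketched roughly as follows. This is not the code from the PR: recv_with_retry, its parameters, and the FlakyConnection stand-in are hypothetical names used only to illustrate retrying a recv() that raises BlockingIOError.

```python
import errno
import time


def recv_with_retry(conn, retries=5, delay=0.01):
    """Call conn.recv(), retrying on BlockingIOError after a short wait.

    Hypothetical helper: `retries` is the total number of attempts.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return conn.recv()
        except BlockingIOError:
            if attempt == retries - 1:
                raise  # out of attempts, surface the original error
            time.sleep(delay)


class FlakyConnection:
    """Test stand-in that fails a fixed number of times before succeeding."""

    def __init__(self, failures, payload):
        self.failures = failures
        self.payload = payload

    def recv(self):
        if self.failures > 0:
            self.failures -= 1
            raise BlockingIOError(errno.EAGAIN, "Resource temporarily unavailable")
        return self.payload


result = recv_with_retry(FlakyConnection(failures=2, payload={"op": "ping"}), delay=0.0)
```

A retry like this only helps when the failed read consumed nothing from the stream; whether that always holds for a multiprocessing connection is not something this sketch establishes.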

Re @ross-at-finix's hypothesis: we do use a TCP socket to facilitate communication between the kernel process and the server. Right now that socket is created and managed by a multiprocessing.connection.Listener object, which as far as I can tell doesn't allow for increasing the socket buffer size. Perhaps we should just deal with a socket directly.
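If the socket were managed directly, larger buffers could be requested with the standard SO_RCVBUF and SO_SNDBUF options. A minimal sketch, assuming a plain TCP socket; the function name and the 1 MiB default are illustrative only, and the operating system may clamp or adjust the requested size.

```python
import socket


def open_socket_with_buffers(bufsize=1 << 20):
    """Create a TCP socket and request larger send/receive buffers.

    Returns the socket plus the sizes the OS actually granted, which
    can differ from the requested value.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return sock, granted_rcv, granted_snd


sock, granted_rcv, granted_snd = open_socket_with_buffers()
print(f"receive buffer: {granted_rcv} bytes, send buffer: {granted_snd} bytes")
sock.close()
```

Larger buffers would make a full buffer less likely but would not remove the underlying possibility of a would-block condition.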

More context: In marimo run mode, the kernel(s) and server are actually in the same process, so in theory we could just use a simpler communication method in run mode to get around this problem. The socket would still be needed for edit mode, during which the kernel is run in a separate process (so that its execution can be easily interrupted).

akshayka avatar Jul 18 '24 04:07 akshayka

Thanks @akshayka ; I've done a very surface level check with our app by building that PR and using it in our QA deployment and the standard table elements are working great.

ross-at-finix avatar Jul 18 '24 13:07 ross-at-finix

> Thanks @akshayka ; I've done a very surface level check with our app by building that PR and using it in our QA deployment and the standard table elements are working great.

That's great, thank you for checking! Version 0.7.8 includes the fix, and should be available on PyPI soon.

akshayka avatar Jul 18 '24 16:07 akshayka

After upgrading to version 0.7.8, I am facing the same issue but with a different error:

Exception in callback <bound method Distributor._on_change of <marimo._utils.distributor.Distributor object at 0x7f7728327090>>
handle: <Handle Distributor._on_change>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/root/.cache/pypoetry/virtualenvs/justask-4h5SbXZT-py3.11/lib/python3.11/site-packages/marimo/_utils/distributor.py", line 56, in _on_change
    response = self.input_connection.recv()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/multiprocessing/connection.py", line 250, in recv
    return _ForkingPickler.loads(buf.getbuffer())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.UnpicklingError: invalid load key, '\x07'.
[W 240719 11:19:59 distributor:59] BlockingIOError in distributor receive: [Errno 11] Resource temporarily unavailable

Sonali-bapte avatar Jul 19 '24 11:07 Sonali-bapte

I have the same error as @Sonali-bapte with version 0.8.12, uvicorn version 0.30.6

TedSinger avatar Sep 11 '24 20:09 TedSinger

Just to update, I also pretty regularly see the error that @Sonali-bapte mentioned.

andrewhill157 avatar Sep 12 '24 19:09 andrewhill157

@andrewhill157 @TedSinger @Sonali-bapte can you share more context of when you see this error?

@andrewhill157 does your original reproduction still surface this issue?

akshayka avatar Sep 20 '24 16:09 akshayka

The example I provided above doesn't seem to reproduce the issue reliably for me at this point (current marimo release) in either my Linux or Mac environments. I have a different app that is more intensive but tricky to share, and it regularly produces the same error message that @Sonali-bapte mentioned above (this was the app that led me to try to make the simpler example in the first place). It looks like you have a PR in progress, but let me know if boiling it down to something simpler and usable on your end would still be helpful.

andrewhill157 avatar Sep 23 '24 18:09 andrewhill157

Thanks for writing back @andrewhill157.

I've merged a fix, and can release it later today. It solves the issue by using an in-memory queue instead of a socket, which is not needed for run mode. So there's no pickling or TCP connection involved.
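To illustrate why an in-memory queue sidesteps both errors, here is a minimal sketch using the standard library's queue.Queue and a thread. It is not marimo's implementation; run_kernel, drain, and the message shape are hypothetical.

```python
import queue
import threading


def run_kernel(outbox, messages):
    """Stand-in for a kernel producing messages in the same process."""
    for message in messages:
        outbox.put(message)
    outbox.put(None)  # sentinel: no more messages


def drain(outbox):
    """Stand-in for the server side consuming messages until the sentinel."""
    received = []
    while True:
        message = outbox.get()  # blocks until an item is available
        if message is None:
            break
        received.append(message)
    return received


payload = {"op": "cell-output", "data": "x" * 1_000_000}
outbox = queue.Queue()

producer = threading.Thread(target=run_kernel, args=(outbox, [payload]))
producer.start()
received = drain(outbox)
producer.join()

# The consumer gets the very same object: nothing was serialized or sent
# over a socket, so there is no partial read and nothing to unpickle.
print(received[0] is payload)
```

Because the queue hands over object references, message size does not affect delivery the way it can with a fixed-size socket buffer.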

akshayka avatar Sep 23 '24 18:09 akshayka

Version 0.8.19 is available on PyPI and contains the fix.

akshayka avatar Sep 23 '24 20:09 akshayka

Thank you, much appreciated! Seems much more reliable for the app I mentioned so far

andrewhill157 avatar Sep 23 '24 22:09 andrewhill157

> Thank you, much appreciated! Seems much more reliable for the app I mentioned so far

That's great! Thanks for letting me know.

akshayka avatar Sep 23 '24 23:09 akshayka