increase pipe size to maybe improve large output handling
lifted from https://github.com/coreweave/tensorizer/blob/main/tensorizer/_wide_pipes.py
Models that return a large batch of vectors fail with errors, e.g. https://replicate.com/p/ksmnnytbfsx7zbbfdeuyhkndva:
{"logger": "cog.server.runner", "timestamp": "2023-09-23T06:42:37.904471Z", "exception": "Traceback (most recent call last):\n File \"/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/cog/server/runner.py\", line 113, in handle_error\n raise error\n File \"/root/.pyenv/versions/3.11.5/lib/python3.11/multiprocessing/pool.py\", line 125, in worker\n result = (True, func(*args, **kwds))\n ^^^^^^^^^^^^^^^^^^^\n File \"/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/cog/server/runner.py\", line 334, in predict\n return _predict(\n ^^^^^^^^^\n File \"/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/cog/server/runner.py\", line 370, in _predict\n for event in worker.predict(input_dict, poll=0.1):\n File \"/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/cog/server/worker.py\", line 118, in _wait\n ev = self._events.recv()\n ^^^^^^^^^^^^^^^^^^^\n File \"/root/.pyenv/versions/3.11.5/lib/python3.11/multiprocessing/connection.py\", line 250, in recv\n return _ForkingPickler.loads(buf.getbuffer())\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n_pickle.UnpicklingError: invalid load key, '\\x00'.", "severity": "ERROR", "message": "caught exception while running prediction"}
This seems to be a null byte where pickle is expecting an opcode (a "load key"). I don't know that this error is specifically related to pipe size, but we've had problems with large outputs before, and we may as well try this.
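As a self-contained illustration of why the message looks like this (my own sketch, not cog code): if a pickled payload only partly fills a zero-initialized buffer, the unpickler eventually hits a 0x00 byte where it expects an opcode and raises exactly this error.

```python
# Hypothetical repro of the error message, not taken from cog: pickle a payload,
# copy only part of it into a zero-filled buffer of the same size (simulating an
# incomplete read), and try to unpickle the result.
import pickle

payload = pickle.dumps(list(range(1000)))
buffer = bytearray(len(payload))        # pre-sized, zero-initialized buffer
half = len(payload) // 2
buffer[:half] = payload[:half]          # only half the data arrived

try:
    pickle.loads(bytes(buffer))
except pickle.UnpicklingError as exc:
    print(exc)  # invalid load key, '\x00'.
```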
I'm uncertain whether I've put the fcntl call in the right place, though; a rough sketch of the idea is below.
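For reference, a minimal sketch of the pipe-widening technique borrowed from tensorizer's _wide_pipes.py, assuming Linux and Python 3.10+ (which exposes fcntl.F_SETPIPE_SZ). The 1 MiB target is an illustrative value, not the one used in this PR.

```python
# Sketch only: grow a pipe's kernel buffer with F_SETPIPE_SZ and report the size
# actually granted. Requests above /proc/sys/fs/pipe-max-size need
# CAP_SYS_RESOURCE, so fall back to whatever the kernel allows.
import fcntl
import os

def widen_pipe(fd: int, target: int = 1 << 20) -> int:
    try:
        fcntl.fcntl(fd, fcntl.F_SETPIPE_SZ, target)
    except OSError:
        pass  # keep the existing size if the request is refused
    return fcntl.fcntl(fd, fcntl.F_GETPIPE_SZ)

if __name__ == "__main__":
    r, w = os.pipe()
    print("default:", fcntl.fcntl(w, fcntl.F_GETPIPE_SZ))  # typically 65536
    print("widened:", widen_pipe(w))
```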
Do we have a test case which can reproduce this problem or a description of what the model is trying to do?
I'm a little reluctant to just start bumping limits here until I understand the use case. The pipe is primarily a mechanism for control and metadata to flow between the parent process and the subprocess, and shoving megabytes of data through it doesn't sound like a good idea.
We can reproduce Andreas' error consistently: https://replicate.com/p/ppvxda3b4qjzkwlrokyemimf2y. There's no "Tweak it" button, but copying the same prompt into the model reproduces the error. The "load key" message only appears in the pod logs; it isn't surfaced to the user.
There's a customer (see Slack) who ran into "Invalid Load Key: \x00" as a visible error message.
We can tell people to do vector embeddings and the like only through files, but this may also affect e.g. people who want to stream audio without having to download each chunk (same customer, again).
Understood, but what is the object size at which this becomes a problem? If we're trying to put 64 KB blobs of data down the pipe I don't have any objections, but if we're throwing multi-megabyte objects down it I think we need to take a step back and think about how this should work, because I don't think it should necessarily use this pipe to do that.
And again, I think we should have a test case that demonstrates the problem here.
Yeah, it seems like it's on the order of 64–128 KB or so; more details in Slack. I'm not sure pipes are so bad for streaming audio or things like that.
This PR tries to deal with the wrapped streams, which aren't actually the problem except for extremely long logs. The real problem is the multiprocessing.Pipe, which is backed by a socketpair by default, so F_SETPIPE_SZ won't work on it.
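To make the socket-vs-pipe distinction concrete, here's a small check of my own (not cog code, assuming Linux and Python 3.10+): the default duplex multiprocessing.Pipe() is a socketpair on Unix, so the pipe-sizing fcntls fail on it, while Pipe(duplex=False) is a real os.pipe() and can be queried or resized.

```python
# Illustration only: F_GETPIPE_SZ works on a real pipe but not on the socketpair
# behind a duplex multiprocessing.Pipe().
import fcntl
import multiprocessing

duplex_a, duplex_b = multiprocessing.Pipe()                # socketpair on Unix
simplex_r, simplex_w = multiprocessing.Pipe(duplex=False)  # os.pipe()

for name, conn in [("duplex", duplex_a), ("simplex", simplex_w)]:
    try:
        print(name, "pipe buffer:", fcntl.fcntl(conn.fileno(), fcntl.F_GETPIPE_SZ))
    except OSError as exc:
        print(name, "is not a pipe:", exc)
```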
I think what you're saying is that this PR is targeting the wrong part of the code entirely. I'm going to close it, but feel free to reopen if I've misunderstood.
Cf. this happens often enough to be an issue: https://github.com/replicate/replicate-web/commit/7cdfbea2c483342a31df44bd578ac65e1d573449
Same problem here, any news on that?
When are we expecting a resolution on this? @nickstenning
I think this ought to be increasing the correct thing now
Still seeing the error, or is this not released yet?
It is not released yet, but I think this PR should do the correct thing.
This is an important fix for my company. We are using Replicate to train SDXL models in specific image styles, and then deploy those LoRA weights to production instances of SDXL we are using on GCP. Please prioritize the fix. Thanks.
It sounds like #1758 may have pinpointed the cause of the null byte pickling error you observed. For now, I'm going to close this PR and move forward with that first. If that doesn't resolve the problem we can reopen and take another look at this approach.
I don't believe that's the only cause. It will help with some other problems, but large outputs should still cause incomplete reads.
This is still happening with Whisper.