increase pipe size to maybe improve large output handling
lifted from https://github.com/coreweave/tensorizer/blob/main/tensorizer/_wide_pipes.py
Models that return a large batch of vectors fail with errors, e.g. https://replicate.com/p/ksmnnytbfsx7zbbfdeuyhkndva:
{"logger": "cog.server.runner", "timestamp": "2023-09-23T06:42:37.904471Z", "exception": "Traceback (most recent call last):\n File \"/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/cog/server/runner.py\", line 113, in handle_error\n raise error\n File \"/root/.pyenv/versions/3.11.5/lib/python3.11/multiprocessing/pool.py\", line 125, in worker\n result = (True, func(*args, **kwds))\n ^^^^^^^^^^^^^^^^^^^\n File \"/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/cog/server/runner.py\", line 334, in predict\n return _predict(\n ^^^^^^^^^\n File \"/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/cog/server/runner.py\", line 370, in _predict\n for event in worker.predict(input_dict, poll=0.1):\n File \"/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/cog/server/worker.py\", line 118, in _wait\n ev = self._events.recv()\n ^^^^^^^^^^^^^^^^^^^\n File \"/root/.pyenv/versions/3.11.5/lib/python3.11/multiprocessing/connection.py\", line 250, in recv\n return _ForkingPickler.loads(buf.getbuffer())\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n_pickle.UnpicklingError: invalid load key, '\\x00'.", "severity": "ERROR", "message": "caught exception while running prediction"}
This seems to be a null byte where pickle is expecting an opcode (a "load key"). I don't know that this error is specifically related to pipe size, but we've had problems with large outputs before, and we may as well try this.
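As a self-contained illustration of why the message looks like this (my own sketch, not cog code): if a pickled payload only partly fills a zero-initialized buffer, the unpickler eventually hits a 0x00 byte where it expects an opcode and raises exactly this error.

```python
# Hypothetical repro of the error message, not taken from cog: pickle a payload,
# copy only part of it into a zero-filled buffer of the same size (simulating an
# incomplete read), and try to unpickle the result.
import pickle

payload = pickle.dumps(list(range(1000)))
buffer = bytearray(len(payload))        # pre-sized, zero-initialized buffer
half = len(payload) // 2
buffer[:half] = payload[:half]          # only half the data arrived

try:
    pickle.loads(bytes(buffer))
except pickle.UnpicklingError as exc:
    print(exc)  # invalid load key, '\x00'.
```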
I'm uncertain whether I've put the fcntl call in the right place, though; a rough sketch of the idea is below.
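For reference, a minimal sketch of the pipe-widening technique borrowed from tensorizer's _wide_pipes.py, assuming Linux and Python 3.10+ (which exposes fcntl.F_SETPIPE_SZ). The 1 MiB target is an illustrative value, not the one used in this PR.

```python
# Sketch only: grow a pipe's kernel buffer with F_SETPIPE_SZ and report the size
# actually granted. Requests above /proc/sys/fs/pipe-max-size need
# CAP_SYS_RESOURCE, so fall back to whatever the kernel allows.
import fcntl
import os

def widen_pipe(fd: int, target: int = 1 << 20) -> int:
    try:
        fcntl.fcntl(fd, fcntl.F_SETPIPE_SZ, target)
    except OSError:
        pass  # keep the existing size if the request is refused
    return fcntl.fcntl(fd, fcntl.F_GETPIPE_SZ)

if __name__ == "__main__":
    r, w = os.pipe()
    print("default:", fcntl.fcntl(w, fcntl.F_GETPIPE_SZ))  # typically 65536
    print("widened:", widen_pipe(w))
```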
Do we have a test case which can reproduce this problem or a description of what the model is trying to do?
I'm a little reluctant to just start bumping limits here until I understand the use case. The pipe is primarily a mechanism for control and metadata to flow between the parent process and the subprocess, and shoving megabytes of data through it doesn't sound like a good idea.
We can reproduce Andreas' error consistently: https://replicate.com/p/ppvxda3b4qjzkwlrokyemimf2y. There's no "Tweak it" button, but copying the same prompt into the model reproduces the error. The "load key" message only appears in the pod logs; it isn't surfaced to the user.
There's a customer (see Slack) who ran into "Invalid Load Key: \x00" as a visible error message.
We can tell people to do vector embeddings and the like only through files, but this may also affect e.g. people who want to stream audio without having to download each chunk (same customer, again).
Understood, but what is the object size at which this becomes a problem? If we're trying to put 64 KB blobs of data down the pipe I don't have any objections, but if we're throwing multi-megabyte objects down it I think we need to take a step back and think about how this should work, because I don't think it should necessarily use this pipe to do that.
And again, I think we should have a test case that demonstrates the problem here.
Yeah, it seems like it's on the order of 64–128 KB or so; more details in Slack. I'm not sure pipes are so bad for streaming audio or things like that.
This PR tries to deal with the wrapped streams, which aren't actually the problem except for extremely long logs. The real problem is the multiprocessing.Pipe, which is backed by a socketpair by default, so F_SETPIPE_SZ won't work on it.
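To make the socket-vs-pipe distinction concrete, here's a small check of my own (not cog code, assuming Linux and Python 3.10+): the default duplex multiprocessing.Pipe() is a socketpair on Unix, so the pipe-sizing fcntls fail on it, while Pipe(duplex=False) is a real os.pipe() and can be queried or resized.

```python
# Illustration only: F_GETPIPE_SZ works on a real pipe but not on the socketpair
# behind a duplex multiprocessing.Pipe().
import fcntl
import multiprocessing

duplex_a, duplex_b = multiprocessing.Pipe()                # socketpair on Unix
simplex_r, simplex_w = multiprocessing.Pipe(duplex=False)  # os.pipe()

for name, conn in [("duplex", duplex_a), ("simplex", simplex_w)]:
    try:
        print(name, "pipe buffer:", fcntl.fcntl(conn.fileno(), fcntl.F_GETPIPE_SZ))
    except OSError as exc:
        print(name, "is not a pipe:", exc)
```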
I think what you're saying is that this PR is targeting the wrong part of the code entirely. I'm going to close it, but feel free to reopen if I've misunderstood.
Cf. this happens often enough to be an issue: https://github.com/replicate/replicate-web/commit/7cdfbea2c483342a31df44bd578ac65e1d573449
Same problem here, any news on that?
When are we expecting a resolution on this? @nickstenning
I think this ought to be increasing the correct thing now
Still seeing the error, or is this not released yet?
It is not released yet, but I think this PR should do the correct thing.
This is an important fix for my company. We are using Replicate to train SDXL models in specific image styles, and then deploy those LoRA weights to production instances of SDXL we are using on GCP. Please prioritize the fix. Thanks.
It sounds like #1758 may have pinpointed the cause of the null byte pickling error you observed. For now, I'm going to close this PR and move forward with that first. If that doesn't resolve the problem we can reopen and take another look at this approach.
I don't believe that's the only cause. It will help with some other problems, but large outputs should still cause incomplete reads.
This is still happening with Whisper.