`multiprocessing.Queue`: Exceeding a certain amount of bytes in the queue prevents proper exit

Open ebonnal opened this issue 1 year ago • 13 comments

Bug report

Bug description:

This terminates properly:

from multiprocessing import Queue
Queue().put(b"0" * 65514)
print("end")

while this prints 'end' and is then stuck:

from multiprocessing import Queue
Queue().put(b"0" * 65515)
print("end")

CPython versions tested on:

Latest identified version without the issue: 3.9.6
Oldest identified version with the issue: 3.9.18

Operating systems tested on:

Linux, macOS

ebonnal avatar Dec 22 '24 21:12 ebonnal

What is the most recent version with the issue? 65515 works for me in IDLE on Win10 with 3.12.8 and 3.14.0a1.

terryjreedy avatar Dec 22 '24 22:12 terryjreedy

On Ubuntu (22.04) and macOS (15.1.1), I have not found a newer version without the issue. The latest I have tested is 3.12.1 on Ubuntu and 3.14.0a0 on macOS, and both have the issue.

Interesting that you don't have it on Win10! Waiting for more linux/macos users to confirm that I'm not crazy 🙏🏻

ebonnal avatar Dec 22 '24 22:12 ebonnal

Confirmed it on Linux, for both 3.12 and main. FWIW, the script does print out end for me, but it hangs upon interpreter finalization, so it doesn't really show up when using IDLE or the REPL (which I suspect is why @terryjreedy couldn't reproduce).

pstack is showing that we're getting stuck waiting on a semaphore somewhere while joining a thread. I'll investigate.
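A minimal sketch of the behavior described above, assuming the hang comes from the queue's background feeder thread being joined at exit while its data is still sitting unread. The helper name `put_and_drain` is hypothetical; the point is only that reading the item back lets the thread finish, so the join returns:

```python
from multiprocessing import Queue


def put_and_drain(size):
    """Put one payload on a Queue, read it back, then join the feeder thread."""
    q = Queue()
    q.put(b"0" * size)
    # Reading the item drains the underlying pipe, which unblocks the
    # feeder thread even when the payload exceeds the pipe's capacity.
    item = q.get()
    q.close()
    # With the data consumed, joining the feeder thread returns promptly.
    q.join_thread()
    return len(item)
```

If the data is expendable, calling `cancel_join_thread()` on the queue is the documented way to skip the join at exit, at the cost of possibly losing buffered items.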

ZeroIntensity avatar Dec 22 '24 23:12 ZeroIntensity

Makes sense, @ZeroIntensity. For your investigation, note that the issue appeared between 3.9.6 (still OK) and 3.9.18 (has the issue).

ebonnal avatar Dec 22 '24 23:12 ebonnal

3.9 - 3.11 are security-only branches, and this bug wouldn't qualify as a security issue IMO (if you were talking about the labels; for bisecting commits, using the main branch is fine).

picnixz avatar Dec 22 '24 23:12 picnixz

I do think this is possibly a security issue. It looks like this applies to any payload larger than 65514 bytes, so if user input was passed to Queue.put, this is possibly a DoS. I'm speculating, though. I'll submit a patch once I figure it out and we'll go from there.

ZeroIntensity avatar Dec 22 '24 23:12 ZeroIntensity

(It could be a DoS, but you should consider the possible attack vectors first IMO)

picnixz avatar Dec 22 '24 23:12 picnixz

(dfb1b9da8a4becaeaed3d9cffcaac41bcaf746f4 looks in the right period and touches the queue's closing logic.)

ebonnal avatar Dec 22 '24 23:12 ebonnal

Upon further investigation, this looks unrelated to multiprocessing and is just a nasty side-effect of os.pipe. Apparently, the pipes returned by pipe() have an internal buffer limited to 65536 bytes (64 KiB, the default pipe capacity on Linux), so a write past that limit blocks until a reader drains the pipe. With nothing reading, it hangs.

For example:

import os

read, write = os.pipe()
my_str = b"0" * 65536
os.write(write, my_str)  # Passed, but buffer is now full
os.write(write, b"1")  # Stuck!

This happens in C as well, so I doubt there's something we can do about that:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

int main() {
    int fds[2];
    pipe(fds);

    char *str = malloc(65536);
    memset(str, '0', 65536);

    write(fds[1], str, 65536);
    puts("Filled the buffer");
    write(fds[1], "1", 1);
    puts("We'll never get here");

    return 0;
}

This is documented, though. From Wikipedia:

If the buffer is filled, the sending program is stopped (blocked) until at least some data is removed from the buffer by the receiver. In Linux, the size of the buffer is 65,536 bytes (64KiB).
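As a rough way to check that figure on a given machine, here is a sketch that fills a pipe in non-blocking mode and counts how many bytes fit before the kernel refuses more. The function name `measure_pipe_capacity` is made up for illustration, and the sketch assumes a Unix-like system where `os.set_blocking` works on pipe descriptors:

```python
import os


def measure_pipe_capacity():
    """Count how many bytes an empty pipe accepts before a write would block."""
    read_fd, write_fd = os.pipe()
    # In non-blocking mode a full pipe raises BlockingIOError instead of hanging.
    os.set_blocking(write_fd, False)
    total = 0
    try:
        while True:
            # One byte at a time keeps the count exact at the boundary.
            total += os.write(write_fd, b"0")
    except BlockingIOError:
        pass  # The pipe is full: nothing more fits until someone reads.
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total
```

The result is platform-dependent, so it is worth measuring rather than hard-coding 64 KiB.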

I guess there are three options:

  • Just document it in os.pipe and write this off as wontfix.
  • Use some nasty hacks to raise an exception if more than 64KiB are in the pipe.
  • Switch to something more versatile e.g. a socket (unless that has the same issue!)
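On the open question in the third option, sockets also have finite kernel buffers, so a writer with no reader eventually stalls there too. A small sketch to see this, assuming `socket.socketpair()` is available on the platform (the helper name `fill_socketpair` is hypothetical):

```python
import socket


def fill_socketpair(chunk_size=4096):
    """Send into one end of a socket pair until the kernel buffers are full."""
    sender, receiver = socket.socketpair()
    # Non-blocking mode turns "would hang" into BlockingIOError.
    sender.setblocking(False)
    total = 0
    try:
        while True:
            # send() may accept fewer bytes than requested; count what it took.
            total += sender.send(b"0" * chunk_size)
    except BlockingIOError:
        pass  # Buffers are full because nothing is reading from `receiver`.
    finally:
        sender.close()
        receiver.close()
    return total
```

The limit differs from the pipe's and is typically larger, but a blocking send would still hang once it is reached, so switching transports moves the threshold rather than removing it.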

@picnixz, what do you think the way to go would be?

ZeroIntensity avatar Dec 23 '24 00:12 ZeroIntensity

(1) seems the most conservative, the least error-prone, and the easiest option for us; we should document it in os.pipe, but in Queue as well for future users. It's an implementation detail, but it could still be useful. Otherwise, (2) seems the second-best approach since it would help detect issues. I'm not sure which hacks you're thinking of, though (can't you first convert the input into a list and check how many elements it has? Or use islice and check whether you can take more than 64k elements?). I don't know about (3).

An alternative would be to have multiple pipes. If you've filled a pipe, then you create another one (so you yourself have a queue of pipes, though this is only an idea; I'm not sure it would be efficient or that it would even have a use case).

We can ask @gpshead about that (sorry Gregory for all the mentions today, but today seems to be the multiprocessing/threading/pipes issues day!)

picnixz avatar Dec 23 '24 01:12 picnixz

An alternative would be to have multiple pipes.

Yeah, that would be (3) in my comment. The question comes down to what kind of maintenance burden that will have.

ZeroIntensity avatar Dec 23 '24 01:12 ZeroIntensity

Running the file in Command Prompt, I see a hard hang (must close the window) in 3.12 and 3.13 but not in 3.14.0a1 & 3 (I get the prompt back after running).

terryjreedy avatar Dec 23 '24 05:12 terryjreedy

This is a duplicate of #85927.

x42005e1f avatar Dec 16 '25 20:12 x42005e1f