TCP bufferbloat if WebSocket server keeps pushing data quickly to a slow client

Open yli-cpr opened this issue 6 years ago • 8 comments

This could probably be solved by registering a Twisted producer: when Twisted calls pauseProducing, daphne can simply disconnect the client.

Autobahn exposes this interface through its WebSocket protocols.
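
A minimal sketch of that idea, for whoever picks this up. `DisconnectOnPause` and `FakeTransport` are hypothetical names used only for illustration; neither is daphne code. The method names `pauseProducing`, `resumeProducing` and `stopProducing` are those of Twisted's `IPushProducer` interface, and `loseConnection` is the usual transport method. A real version would presumably be declared with `@implementer(IPushProducer)` and registered on the real transport.

```python
class FakeTransport:
    """Stand-in for a Twisted transport (assumption, for illustration only)."""

    def __init__(self):
        self.connected = True

    def loseConnection(self):
        self.connected = False


class DisconnectOnPause:
    """Hypothetical push producer that drops clients that cannot keep up."""

    def __init__(self, transport):
        self.transport = transport
        self.paused = False

    def pauseProducing(self):
        # Twisted calls this once it considers its send buffer full.
        self.paused = True
        self.transport.loseConnection()

    def resumeProducing(self):
        self.paused = False

    def stopProducing(self):
        self.paused = True


transport = FakeTransport()
producer = DisconnectOnPause(transport)
producer.pauseProducing()  # simulate Twisted signalling a full buffer
print(transport.connected)
```

Disconnecting is the bluntest possible reaction; it bounds memory use at the cost of dropping slow but otherwise healthy clients.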

yli-cpr avatar Feb 15 '18 15:02 yli-cpr

Could you flesh this out a bit more with how you detected this? That would be valuable for whoever picks this up, so they can verify a fix.

andrewgodwin avatar Feb 15 '18 17:02 andrewgodwin

I just managed to reproduce this issue.

  1. Let the worker send many messages to the reply channel.
  2. Let the client go into an infinite loop on the first event.
  3. Have nothing else, such as a load balancer, in between.
  4. Set the daphne ping timeout to a bigger value.

I observed that daphne's memory usage kept growing (by megabytes). Reducing the ping timeout might help, but that assumes memory usage won't grow too high before the timeout fires.

yli-cpr avatar Feb 15 '18 20:02 yli-cpr

Also, the ping timeout won't work, because it updates "last_data" even for server-side sends!

I think that's another bug. The server sending data doesn't mean the client/connection is healthy.
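
One way to picture a fix is to keep separate timestamps for inbound and outbound traffic and let only inbound traffic reset the timeout. The sketch below is not daphne's code: `LivenessTracker` and all of its method names are hypothetical, and the injectable `clock` argument exists only to make the example deterministic.

```python
import time


class LivenessTracker:
    """Hypothetical helper: only inbound traffic counts as proof of life."""

    def __init__(self, ping_timeout, clock=time.monotonic):
        self.ping_timeout = ping_timeout
        self.clock = clock
        self.last_received = clock()
        self.last_sent = clock()

    def on_receive(self):
        self.last_received = self.clock()

    def on_send(self):
        # Recorded for reference, but deliberately ignored by is_timed_out().
        self.last_sent = self.clock()

    def is_timed_out(self):
        return self.clock() - self.last_received > self.ping_timeout


now = [0.0]  # fake clock so the example does not depend on real time
tracker = LivenessTracker(ping_timeout=30, clock=lambda: now[0])
now[0] = 20.0
tracker.on_send()  # the server keeps pushing data
now[0] = 40.0
timed_out = tracker.is_timed_out()  # the client has been silent for 40 seconds
```

With this split, a server that keeps sending to a stalled client no longer hides the stall from the timeout check.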

yli-cpr avatar Feb 16 '18 14:02 yli-cpr

I tried installing a push producer and captured the pauseProducing call from Twisted. One trick is that I had to unregister the previous producer (HTTPChannel) first:

# in onConnect:
self.transport.unregisterProducer()
self.registerProducer(PushProducer(self), True)

Ideally, daphne should forward pauseProducing and resumeProducing to the worker.
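
A rough sketch of what forwarding could look like on the asyncio side, assuming the sender can await a gate before each write. `PauseGate`, `wait_until_writable` and the `worker` coroutine are hypothetical names for illustration, not daphne or Channels APIs; only the `pauseProducing`/`resumeProducing` method names come from Twisted.

```python
import asyncio


class PauseGate:
    """Hypothetical bridge from pause/resume calls to an awaiting sender."""

    def __init__(self):
        self._can_send = asyncio.Event()
        self._can_send.set()

    def pauseProducing(self):
        self._can_send.clear()

    def resumeProducing(self):
        self._can_send.set()

    async def wait_until_writable(self):
        await self._can_send.wait()


async def demo():
    gate = PauseGate()
    sent = []

    async def worker():
        for i in range(3):
            # Block here while the transport has asked us to pause.
            await gate.wait_until_writable()
            sent.append(i)
            await asyncio.sleep(0)

    gate.pauseProducing()  # simulate Twisted reporting a full buffer
    task = asyncio.create_task(worker())
    await asyncio.sleep(0.01)
    sent_while_paused = list(sent)
    gate.resumeProducing()  # simulate the buffer draining
    await task
    return sent_while_paused, sent


sent_while_paused, sent_after_resume = asyncio.run(demo())
```

Unlike disconnecting, this slows the producer down to the client's pace, but it only helps if the code generating the data actually waits on the gate.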

yli-cpr avatar Feb 16 '18 16:02 yli-cpr

I've hit what I think is the same (or a similar) problem while testing https://github.com/django/django/pull/16384 when generating lots of data from Django (which we do in one project, generating files on the fly).

Carlton had some code: https://github.com/django/django/pull/16384#issue-1496410480

Having a view such as:

import asyncio

from django.http import StreamingHttpResponse


async def generate():
    gb_to_send = 5
    chunk_size = 5 * 1024 * 1024  # characters per chunk (about 5 MiB)

    total_sent = 0
    count = 0

    # Keep yielding chunks until roughly gb_to_send GiB have been produced.
    while total_sent < gb_to_send * 1024 * 1024 * 1024:
        data = f"{count % 10}" * chunk_size
        total_sent += len(data)
        count += 1
        await asyncio.sleep(0.000001) # change it to make slower / faster
        yield data


async def a_streaming_view(request):
    return StreamingHttpResponse(generate())

Then request it with curl and suspend curl (Control+Z in Linux/Mac shells) or even quit it (Control+C): data keeps being generated, using lots of RAM. I can provide a better example if that would be useful.

cpina avatar Dec 15 '22 00:12 cpina

Hey @cpina - yes please. If you're able to focus in on what's happening here that would be amazing. (Current plan is to swing back here after Django 4.2a1, so any work before then would be extra handy 🎁)

carltongibson avatar Dec 15 '22 05:12 carltongibson

:+1: I will prepare a self-contained example and write up my findings about the daphne/Twisted code, which will hopefully help!

cpina avatar Dec 15 '22 08:12 cpina

Self-contained example to see the memory increase: https://gist.github.com/cpina/fe1e3fa982d09997a5957441b97c5d0c

This is the first time I've dived into daphne and Twisted, so take the following hypothesis with a pinch of salt!

It's possible to see what I think is the size of the data still waiting to be sent to the client in daphne via the (horrible) self.channel.transport._tempDataLen, in daphne/http_protocol.py at line 265, just before http.Request.write(self, message.get("body", b"")).

Also, it seems that Twisted does want to pause the producer: in twisted/internet/abstract.py, the method _maybePauseProducer is executed and, if self._isSendBufferFull() returns True, it calls self.producer.pauseProducing() (twisted/web/http.py, HTTPChannel.pauseProducing), but that cannot stop the producer. At this point I don't know what the "Producer" should be, how it should be stopped, how daphne should set it, or whether this is a red herring or the right path.
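
To make the hypothesis concrete, here is a simplified model of that behaviour. It is not Twisted's actual code: `ModelTransport`, `RecordingProducer` and the `buffer_size` value are assumptions chosen for illustration. The transport asks the producer to pause once pending data exceeds a threshold, but nothing stops the caller from writing more.

```python
class RecordingProducer:
    """Hypothetical producer that notes pause requests but never slows down."""

    def __init__(self):
        self.pause_calls = 0

    def pauseProducing(self):
        self.pause_calls += 1


class ModelTransport:
    """Simplified model of a buffering transport; not Twisted's actual code."""

    def __init__(self, producer, buffer_size):
        self.producer = producer
        self.buffer_size = buffer_size  # assumed high-water mark, in bytes
        self.pending = 0  # bytes accepted but never drained (slow client)
        self.producer_paused = False

    def write(self, data):
        self.pending += len(data)
        if self.pending > self.buffer_size and not self.producer_paused:
            self.producer_paused = True
            self.producer.pauseProducing()


producer = RecordingProducer()
transport = ModelTransport(producer, buffer_size=64 * 1024)
for _ in range(10):
    # The producer ignores the pause request and keeps writing.
    transport.write(b"x" * 16 * 1024)
```

In this model the pause request is issued only once, while `pending` keeps growing with every write, which matches the unbounded memory growth described above.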

Hopefully this helps somehow! I'm happy to test any possible changes or try to fix it (I need to familiarise myself with the related code first).

cpina avatar Dec 15 '22 10:12 cpina