rack-streaming-proxy icon indicating copy to clipboard operation
rack-streaming-proxy copied to clipboard

Chunked encoding gets doubly chunked when using puma corrupting the content

Open pedrocr opened this issue 10 years ago • 14 comments

I am running a Rails app in puma and using rack-streaming-proxy to proxy a MJPEG HTTP stream. The stream is a multipart message in chunked encoding. The proxy seems to break the stream by not proxying it as chunked encoding. All the code is here:

https://github.com/pedrocr/camerasink

Testing with telnet the original stream returns:

HTTP/1.0 200 OK
Server: camerasave
Date: Fri, 18 Apr 2014 11:13:06 GMT
Transfer-Encoding: chunked
Content-Type: multipart/x-mixed-replace;boundary=SurelyJPEGDoesntIncludeThis

51
--SurelyJPEGDoesntIncludeThis
Content-Type: image/jpeg
Content-Length: 9437

(the image contents go here, the headers after "51" are repeated on every new image)

The same stream after being proxied with rack-streaming-proxy and puma returns:

HTTP/1.0 200 OK
server: camerasave
date: Fri, 18 Apr 2014 11:12:16 GMT
content-type: multipart/x-mixed-replace;boundary=SurelyJPEGDoesntIncludeThis
Cache-Control: no-cache
X-Request-Id: c68247f6-e4ef-4507-b34c-c37330062289
X-Runtime: 0.256272
Connection: close

--SurelyJPEGDoesntIncludeThis
Content-Type: image/jpeg
Content-Length: 9427

(the image contents go here and the headers get repeated as well)

The difference seems to be that the proxied request doesn't have chunked encoding. At least in firefox this breaks the MJPEG streaming.

pedrocr avatar Apr 18 '14 11:04 pedrocr

Just looked at the specs in the code and HTTP/1.0 requests will strip the chunked encoding. Don't know why that is but Firefox will be doing HTTP/1.1 requests anyway. I was just testing with HTTP/1.0 since it's easier. Here's the same with HTTP/1.1:

$ telnet localhost 8080
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /camera/stream/test HTTP/1.1
Host: localhost

HTTP/1.1 200 OK
server: camerasave
date: Fri, 18 Apr 2014 11:25:27 GMT
transfer-encoding: chunked
content-type: multipart/x-mixed-replace;boundary=SurelyJPEGDoesntIncludeThis
Cache-Control: no-cache
X-Request-Id: 48dda60f-baae-455f-a277-e34c06c8220a
X-Runtime: 0.124132
Transfer-Encoding: chunked

57
51
--SurelyJPEGDoesntIncludeThis
Content-Type: image/jpeg
Content-Length: 9453



24f5
24ed
JFIFC

The problem seems to be that the proxy is actually keeping the chunked encoding markers from the original stream on the proxied stream. This corrupts the JPEG images.

pedrocr avatar Apr 18 '14 11:04 pedrocr

I was looking through the spec and noticed a comment about it only working on unicorn. This indeed works properly on unicorn. Unfortunately I can hang unicorn very easily running these long-lived clients. I probably need to figure out how to increase the number of workers.

Is this a bug in puma then or is rack-streaming-proxy doing something that makes puma redo the chunking and break the stream?

pedrocr avatar Apr 18 '14 21:04 pedrocr

Created a test case for this here:

https://github.com/pedrocr/camerasink/blob/b01aac64e8aaef96f8527f6e29768cc1f5f6199b/testcases/puma_double_chunking.ru

It's runnable as "rackup puma_double_chunking.ru" and then:

The original server first (a basic webrick server):

$ curl http://localhost:9000
Hello!

The puma+rack-streaming-proxy server, proxying the first server:

$ curl http://localhost:9292
7
Hello!

0

pedrocr avatar Apr 18 '14 22:04 pedrocr

The reason this happens is that puma always uses chunked encoding because it's more efficient, server wise, when the body is being returned in sections via #each.

rack-streaming-proxy assumes that the rack server is a dumb one and just outputs the chunks end-to-end back to the client, but thats not what puma does.

I'd like to get some clarification from someone on the rack team (@rkh, @spastorino, @raggi) on whether Puma is violating the rack spec or if rack-streaming-proxy is. rack-streaming-proxy and puma probably need to figure out a way to communicate who is doing the chunked encoding so the other doesn't.

At any rate, you don't need the chunked proxy with puma because, as you've seen, puma does it by default for you.

evanphx avatar Apr 18 '14 23:04 evanphx

Thanks for looking into it @evanphx. Unfortunately I really do need the proxying to work. The chunking is already done from the source I just need to pass it through. My use case is that I run a C program to do some video capture on a set of surveillance cameras and create a MJPEG monitor output for each camera on http://127.0.0.1:30###/mjpeg. I then want my app to show that as http://myapp/camera/stream/cameraname so that it's all self-contained in a single HTTP server.

I don't think I can disable the chunked encoding in the C program either because since this is an infinite stream there's no way to do a content-length encoding.

pedrocr avatar Apr 18 '14 23:04 pedrocr

Ah! If you're using puma, I'd suggest you use rack.hijack, which puma supports natively. That will allow you to very efficiently stream the output from your program directly to the HTTP client. You can even attach the socket that rack.hijack returns straight up to the stdout of your C program so that you don't have to copy the data yourself.

evanphx avatar Apr 19 '14 00:04 evanphx

@pedrocr chunked transfer is supported from http/1.1. So, your first example on original stream

HTTP/1.0 200 OK
Server: camerasave
Date: Fri, 18 Apr 2014 11:13:06 GMT
Transfer-Encoding: chunked

looks wrong already. If the server can speak HTTP/1.1, but the request is from HTTP/1.0 client, it should retrun response like

HTTP/1.1 200 OK
Server: camerasave
Date: Fri, 18 Apr 2014 11:13:06 GMT
content-length: xxxx

The first line expresses whether the server can speak http/1.1. But, because the request is from http/1.0, it should not return chunked transfer because the client can not understand it.

I will look into the case of http/1.1.

sonots avatar Apr 19 '14 01:04 sonots

Hmm, well, as you guys tell, it looks puma is doing something ... rack-streaming-proxy can't control it because the puma tweaks and returns response to the client finally.

sonots avatar Apr 19 '14 02:04 sonots

@sonots, I was just using HTTP/1.0 requests to make it easier to write it out on the terminal (saves the host line). The clients will be doing HTTP/1.1. Strangely libsoup will do chunked encoding with 1.0. But as you've seen the problem exists with HTTP/1.1 as well. rack-streaming-proxy and puma are not seeing eye to eye on the rack spec.

@evenphx, I'll look into rack.hijack. At worst it should allow me to implement the proxy myself. But it may actually be better than that. Note that the C program doesn't output on stdout it runs a libsoup HTTP server itself. But maybe I can get rid of that and instead use a pipe or a file socket directly. Need to figure out how I'd handle multiple clients like that, but sounds interesting. Thanks.

pedrocr avatar Apr 19 '14 10:04 pedrocr

I've managed to get a basic version working with rack_hijack and this works with puma. Unfortunately this will mean a separate implementation for servers that support rack hijacking and those that dont.

Additionally I've made it work by violating the encapsulation of the Net::HTTP class to grab the socket inside once the request is sent which is probably not ideal lol.

pathsny avatar Mar 22 '15 18:03 pathsny

@evanphx Using middleware for chunking was a bad idea from the outset, it turns out. The problem being that it's consistently poorly defined for a server whether an app has performed chunking already. My recommendation is that you assume that if you are on http 1.1+ and have no content length and no connection:close, and no transfer-encoding:chunked, then you should be safe to apply chunked encoding. If any of these are otherwise, you may break user required semantics. If you see an explicit transfer-encoding:identity (which is deprecated since RFC7230) then you also probably shouldn't re-encode. Some protocols have unfortunate recommendations in this area (glaring at Server Sent Events in particular).

raggi avatar Mar 23 '15 16:03 raggi

As far as this ticket goes, I think the core problem in rack-streaming-proxy is that this class https://github.com/zerowidth/rack-streaming-proxy/blob/master/lib/rack/streaming_proxy/response.rb should be declaring Transfer-Encoding:chunked, given that it's outputting chunked encoding. It is not doing so, and as such servers have no way to know of the encoding. Somewhere else, code is generating a non-canonical "transfer-encoding" header also, that may be missed in other tests. I strongly recommend canonicalizing to avoid such issues, but either way, whatever is missing that is buggy too.

https://github.com/zerowidth/rack-streaming-proxy/blob/master/lib/rack/streaming_proxy/proxy.rb#L87-L91 is wrong.

https://github.com/zerowidth/rack-streaming-proxy/blob/master/lib/rack/streaming_proxy/proxy.rb#L93-L96 is also wrong, iff it still chunks.

https://github.com/zerowidth/rack-streaming-proxy/blob/master/lib/rack/streaming_proxy/response.rb#L76 single line reads are insufficient to read all the headers. You need a parser here, although I'm not entirely sure what @piper is, given that this is a rack proxy, but gets suggests the wrong kind of read.

raggi avatar Mar 23 '15 17:03 raggi

@raggi @piper is an interprocess communicator. The actual response handling code is here https://github.com/zerowidth/rack-streaming-proxy/blob/master/lib/rack/streaming_proxy/session.rb#L85 which just uses Net::HTTP

pathsny avatar Mar 23 '15 17:03 pathsny

@raggi Ok, thanks for the clarification. Puma already disables it's own chunked encoding if any Transfer-Encoding header is present, since that's the only safe assumption.

evanphx avatar Mar 23 '15 17:03 evanphx