gzip encoded http not decoding

Open peterwillis opened this issue 9 years ago • 20 comments

First of all - great tool, thank you for building it.

I am having trouble decoding gzip-encoded HTTP. I built the latest version on Mac OS X, and as far as I can tell zlib was found correctly during configure. When I run it in console mode, should it output the decoded content to the console?

To test it I used this:

https://github.com/ksmith97/GzipSimpleHTTPServer

I created an index.html file with just an html tag with an empty head and body

Ran tcpflow like this:

sudo tcpflow -i lo0 -c -e http

this is the output:

tcpflow: listening on lo0
127.000.000.001.57183-127.000.000.001.08000: GET /index.html HTTP/1.1
Host: localhost:8000
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
DNT: 1
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
If-Modified-Since: Sat, 11 Jun 2016 20:53:30 GMT


127.000.000.001.08000-127.000.000.001.57183: HTTP/1.0 200 OK

127.000.000.001.08000-127.000.000.001.57183: Server: SimpleHTTP/0.6 Python/2.7.11

127.000.000.001.08000-127.000.000.001.57183: Date: Sat, 11 Jun 2016 21:06:29 GMT

127.000.000.001.08000-127.000.000.001.57183: Content-type: text/html

127.000.000.001.08000-127.000.000.001.57183: Content-Encoding: gzip

127.000.000.001.08000-127.000.000.001.57183: Content-Length: 49

127.000.000.001.08000-127.000.000.001.57183: Last-Modified: Sat, 11 Jun 2016 20:53:30 GMT

127.000.000.001.08000-127.000.000.001.57183: 

127.000.000.001.08000-127.000.000.001.57183: ....U}\W....(.......HML...0FR~J%X...J.U...7n.1...
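For comparison, here is a minimal Python sketch of what "decoded" would mean for a response like the one above. It is not how tcpflow does it; the helper name `decode_http_body` and the sample response are made up for illustration, and the sketch assumes the whole response is available as one byte string with a non-chunked body.

```python
import gzip

def decode_http_body(raw_response: bytes) -> bytes:
    """Return the response body, gunzipped if Content-Encoding says gzip."""
    head, _, body = raw_response.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip().lower()
    if headers.get(b"content-encoding") == b"gzip":
        return gzip.decompress(body)
    return body

# Build a response similar in shape to the captured one (hypothetical data).
html = b"<html><head></head><body></body></html>"
payload = gzip.compress(html)
response = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"Content-Encoding: gzip\r\n"
    b"Content-Length: " + str(len(payload)).encode() + b"\r\n"
    b"\r\n" + payload
)
print(decode_http_body(response).decode())
```

If tcpflow were decoding the body in console mode, the last line of its output would presumably look like the printed HTML rather than the dotted binary above.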

peterwillis avatar Jun 11 '16 21:06 peterwillis

I observe the same with 1.4.5. I wonder if this is a bug or we just failed to tell tcpflow to do it...

olivergondza avatar Nov 06 '17 15:11 olivergondza

I'm not sure. There is a regression test that it passes; can you provide me with a set of packets that do not properly gunzip?

simsong avatar Nov 06 '17 17:11 simsong

When I run curl -vL -H 'Accept-Encoding: gzip' http://abclinuxu.cz while sudo tcpflow -s -c -i any host abclinuxu.cz is capturing, this is what tcpflow sniffs:

171.025.221.158.00080-010.040.002.211.33610: HTTP/1.1 200 OK
Server: nginx
Date: Tue, 07 Nov 2017 08:07:00 GMT
Content-Type: text/html;charset=UTF-8
Content-Length: 20382
Connection: keep-alive
Set-Cookie: JSESSIONID=7t9xp6e0a5yragwrqcc4kc4g;Path=/;HttpOnly
Last-Modified: Tue, 07 Nov 2017 08:07:00 GMT
Expires: Fri, 22 Dec 2000 05:00:00 GMT
Cache-Control: no-cache, must-revalidate
Pragma: no-cache
Content-Encoding: gzip

XgT91f@*][iv*~h^_eY$I=&u&H_j{te{kf[3L4GYvO6K]}gj6GY([th_}8x_sJf`my@A4:s$W>zZ>yp\9|]mn*?QmT56F%VdH}|M\ow(/hoq/b^|V".E]TKEoZ%l=]z
!#)|+$)|+PaQA6"^L6ot8?~F,hl@<x-r;:e$Ic xo!:U,y+ K[i0OG#m;[C'Y!HT:A)
...
[Binary garbage continues]

olivergondza avatar Nov 07 '17 08:11 olivergondza

If you can provide me with a packet dump, I will review it.

simsong avatar Nov 07 '17 22:11 simsong

I am not sure what you mean by packet dump. Here is the output captured without any pretty-printing opts - I have verified the content can be read by gzip -d (with gzip: stdin: unexpected end of file, though): https://gist.github.com/olivergondza/aed85ef7e46b86693bdc4bfb82d65386#file-gziped
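The "unexpected end of file" message suggests the captured stream is a valid gzip prefix that was cut short. A minimal Python sketch of decompressing such a truncated stream as far as it goes, using the standard `zlib` module; the function name `gunzip_partial` and the sample data are hypothetical:

```python
import gzip
import zlib

def gunzip_partial(data: bytes) -> tuple[bytes, bool]:
    """Decompress as much of a (possibly truncated) gzip stream as possible.

    Returns the recovered bytes and a flag telling whether the end of the
    gzip stream was actually reached.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    recovered = decomp.decompress(data)
    return recovered, decomp.eof

# Hypothetical data: a gzip stream with its second half missing.
original = b"".join(str(i).encode() + b"\n" for i in range(5000))
full = gzip.compress(original)
truncated = full[: len(full) // 2]

recovered, complete = gunzip_partial(truncated)
print(len(recovered), complete)
```

Unlike a one-shot `gzip.decompress` call, the streaming object returns whatever it could inflate instead of failing on the missing trailer, which is roughly what `gzip -d` does before printing its warning.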

olivergondza avatar Nov 08 '17 04:11 olivergondza

I want you to give me a pcap file.


simsong avatar Nov 08 '17 10:11 simsong

pcap it is: https://gist.github.com/olivergondza/aed85ef7e46b86693bdc4bfb82d65386#file-gzip-pcap

olivergondza avatar Nov 08 '17 20:11 olivergondza

I'm having the same problem. I'm trying to use tcpflow to follow a REST API. Here's a pcap of a very simple request + response. It's only 7 packets: request.zip (https://github.com/simsong/tcpflow/files/1484113/request.zip)

tcpflow -r request.pcap -c -a -g
052.043.158.010.55634-172.031.021.034.00080: POST /PrototypeAppServlet HTTP/1.1
Content-Type: application/json
Content-Length: 172
Host: prototypeapp.jme5spybzc.us-west-2.elasticbeanstalk.com
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.2 (Java/1.8.0_121)
Accept-Encoding: gzip,deflate

{"methodName":"getData","requestInfo":{"customerId":753},"interfaceName":"CustomerRecordingEntry","userRole":"Customer","userId":"[email protected]","userCustomerId":753}

172.031.021.034.00080-052.043.158.010.55634: HTTP/1.1 200 OK
Server: nginx/1.10.2
Date: Fri, 17 Nov 2017 22:57:10 GMT
Content-Type: application/json;charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Access-Control-Allow-Origin: *
Content-Encoding: gzip

5b
VJ-.+NKWVJI,ITQM-.NLOURWp+(**($+YrK)bH
0
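This response combines Transfer-Encoding: chunked with Content-Encoding: gzip, so the body has to be de-chunked before it can be gunzipped (the leading 5b is a hexadecimal chunk size, the trailing 0 the terminating chunk). A minimal Python sketch of that order of operations; `dechunk` is a hypothetical helper, not tcpflow code, and it ignores chunk extensions and trailers:

```python
import gzip

def dechunk(body: bytes) -> bytes:
    """Reassemble an HTTP/1.1 chunked body into the raw payload."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The chunk-size line is hexadecimal, optionally followed by ";ext".
        size = int(body[pos:eol].split(b";", 1)[0], 16)
        if size == 0:
            break  # terminating zero-length chunk
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return bytes(out)

# Hypothetical payload, sent as a single chunk like the response above.
payload = gzip.compress(b'{"status":"ok","customerId":753}')
chunked = f"{len(payload):x}".encode() + b"\r\n" + payload + b"\r\n0\r\n\r\n"

print(gzip.decompress(dechunk(chunked)).decode())
```

Feeding the chunk-size lines to the decompressor along with the data is one plausible way a gzip decode could fail on chunked responses, though the thread does not confirm that this is the cause.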

tapple avatar Nov 17 '17 23:11 tapple

Much better. Thanks. I’ll take a look.

simsong avatar Nov 17 '17 23:11 simsong

I have the same problems too!

takakawa avatar Mar 29 '18 04:03 takakawa

Still unresolved... Same problem here

let4be avatar Oct 27 '18 08:10 let4be

I won’t be able to get to this for a while.


simsong avatar Oct 27 '18 11:10 simsong

Same problem. Here is my log.

wget -q https://www.dropbox.com/s/ma6cscyy5wjcd1n/image2.log

I captured it using Wireshark. Note that the log contains an image; otherwise, decompression works.

frankwxu avatar Nov 14 '20 01:11 frankwxu

I'm in the process of doing a complete rewrite of the be13_api that's used by both tcpflow and bulk_extractor. This is an important issue, but the rewrite is more important. You are welcome to submit a patch, or have one of your students work on it as an exercise. Unfortunately, that's all I can offer at the moment.

simsong avatar Nov 14 '20 01:11 simsong

Meanwhile, is it okay if I download your log and add it to the set of unit-tests?

simsong avatar Nov 14 '20 01:11 simsong

Meanwhile, is it okay if I download your log and add it to the set of unit-tests?

Sure. That is what I created for testing.

The HTML and embedded image are here

wget -q https://www.dropbox.com/s/7pkkduka6014uko/image.html
wget -q https://www.dropbox.com/s/yb4kvvr1w2scikp/building_20201108_221645.jpg

frankwxu avatar Nov 14 '20 01:11 frankwxu

Thanks again. I'll get to this when the be13_api rewrite is finished. I'll be making tcpflow work with the rewrite before bulk_extractor, as it's a simpler program. The whole system is being updated to C++17, there will be code coverage of the unit tests, and the unit tests are using a standard unit test framework. It's a lot of work, but I'm learning a lot more about C++ and how it's changed over the past 20 years.

simsong avatar Nov 14 '20 02:11 simsong

FWIW it seems this only happens w/ -c, the body is correctly decoded when written to an HTTPBODY file, at least for me using 1.5.1.
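One way to check this on your own capture is to look at whether the body files written to disk still start with the gzip magic bytes (1f 8b). A minimal Python sketch, assuming the output files have HTTPBODY somewhere in their names as described above; the function names and the glob pattern are hypothetical, so adjust them to what your tcpflow version actually writes:

```python
from pathlib import Path
import tempfile

GZIP_MAGIC = b"\x1f\x8b"

def looks_gzipped(data: bytes) -> bool:
    """True if the data starts with the two-byte gzip magic number."""
    return data[:2] == GZIP_MAGIC

def find_undecoded_bodies(outdir: str) -> list[str]:
    """List body files in outdir that still look gzip-compressed."""
    undecoded = []
    for path in sorted(Path(outdir).glob("*HTTPBODY*")):
        with path.open("rb") as fh:
            if looks_gzipped(fh.read(2)):
                undecoded.append(path.name)
    return undecoded

# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "flow-a-HTTPBODY-001.html").write_bytes(b"<html></html>")
    Path(tmp, "flow-b-HTTPBODY-001.html").write_bytes(b"\x1f\x8b\x08\x00rest")
    print(find_undecoded_bodies(tmp))
```

An empty list would mean the on-disk bodies were decoded and only the console output is affected; a non-empty one would match the later reports that decoding fails either way.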

ryanschneider avatar Sep 07 '22 16:09 ryanschneider

Not working for me with 1.5.1.

Content-Encoding: gzip and response is displayed not decoded with -c or without. 🤷‍♂️

shufps avatar Nov 05 '22 06:11 shufps

Not working for me with 1.5.1.

Content-Encoding: gzip and response is displayed not decoded with -c or without. 🤷‍♂️

Thanks for the report. Nobody has worked on it, so it is not surprising that it does not work. Do you want to give it a try?

simsong avatar Nov 05 '22 18:11 simsong