Multiple "Skipping impossibly large" errors when working with http_body data

Open alexv-anderson-uw opened this issue 5 years ago • 2 comments

I want to analyze the body of HTTP responses; however, I am seeing errors such as "Skipping impossibly large 26003-byte #1 chunk, at offset 6/21013".

I can reproduce these errors when processing the http_get_reply_iframes.json.bz2 file provided in the samples directory using the following command:

bzcat http_get_reply_iframes.json.bz2 | dap json + select ip data + transform data=base64decode + decode_http_reply data + remove data data.http_raw_body + select ip + json

I am running DAP in Docker and mounting the samples directory. My Dockerfile is a copy of this repo's Dockerfile, except that I removed the MaxMind installation step, which was failing with an error that I think is due to a licensing change.

How should I structure the DAP query to avoid the skipping?

alexv-anderson-uw, Jan 09 '20 22:01

@alexv-anderson-uw - Thanks for the report, sorry for the delay. We'll take a look.

Simple reproducer with output data:

bzcat http_get_reply_iframes.json.bz2 | grep 173.45.72.243 | \
    dap json + select ip data + transform data=base64decode + \
    decode_http_reply data + remove data + json | \
    jq
Skipping impossibly large 26003-byte #1 chunk, at offset 6/21013

If you look at the body in that case (using the following command), you will see that the chunk size is 6593 in hex, which is 26,003 bytes. That is larger than the entire response (length 21013), so the chunk cannot be valid and dap skips it. The record for 173.45.72.243 is still emitted by dap, but the body value won't be populated or processed by later filters.

bzcat http_get_reply_iframes.json.bz2 | grep 173.45.72.243 | \
    dap json + select ip data + transform data=base64decode + \
    remove data.http_raw_body +  json | \
jq
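
The arithmetic behind the error message can be checked with a small shell sketch. The `chunk_fits` function below is a hypothetical helper written for illustration, not part of dap, and it assumes that the offset in the message (6) is where the chunk data would begin, i.e. right after the 4-digit hex size and its CRLF. It converts the hex chunk size to decimal and compares it with the bytes remaining in the response.

```shell
# Hypothetical helper (not part of dap): report whether a chunk of the
# given hex size could fit in the bytes remaining after an offset.
chunk_fits() {
    chunk_hex=$1   # chunk size as written in the body, e.g. 6593
    offset=$2      # assumed offset at which the chunk data would start
    total=$3       # total length of the response body in bytes
    chunk_len=$((0x$chunk_hex))      # hex -> decimal
    remaining=$((total - offset))    # bytes left after the size line
    if [ "$chunk_len" -le "$remaining" ]; then
        echo "ok: $chunk_len-byte chunk fits in $remaining remaining bytes"
    else
        echo "too large: $chunk_len-byte chunk exceeds $remaining remaining bytes"
    fi
}

# Values taken from the error message above.
chunk_fits 6593 6 21013
```

With the values from the message, the declared chunk (26003 bytes) is larger than the 21007 bytes that remain, which is consistent with the response being truncated or the size line being malformed.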

tsellers-r7, Jun 22 '20 16:06

Hi @tsellers-r7, I am seeing the same error. I tried your approach with a sonar.http response file as the input, using the following query:

wget -qO-  https://opendata.rapid7.com/sonar.http/2020-07-27-1595862118-http_get_80.json.gz | zcat | dap json + select host port data + transform data=base64decode + decode_http_reply data + remove data.http_raw_body + json

theblackturtle, Aug 09 '20 02:08