
Add sampling to flow heuristic.

Open • kristiandueholm opened this issue 7 months ago • 1 comment

This pull request fixes mitmproxy dump files (flows) being misdetected as .har files by detect_input_format(). The error I have been seeing is:

TypeError: 'int' object is not subscriptable

The proposed fix should resolve many of the issues where passing -f flow makes the program run properly, for example #213, #171, and likely #214.

Root cause

Enabling debug mode by setting the MITMPROXY2SWAGGER_DEBUG environment variable revealed that the heuristic score computed in detect_input_format() was higher for .har even though the file was a flow dump. The main heuristic for detecting flow files is the presence of non-printable ASCII characters. The underlying issue is that mitmproxy_dump_file_huristic() assumes these will appear in the first 2048 bytes. In my case, those bytes were filled with certificates, which contain only printable characters, so the heuristic missed.
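To illustrate the failure mode, here is a minimal sketch rather than the project's actual code. The helper `prefix_non_printable_count` and the synthetic `dump` bytes are hypothetical. They only mimic a file whose first 2048 bytes are printable certificate text followed by binary data.

```python
import string

# Bytes considered printable ASCII (letters, digits, punctuation, whitespace).
PRINTABLE = frozenset(string.printable.encode("ascii"))


def prefix_non_printable_count(data: bytes, prefix_len: int = 2048) -> int:
    """Count non-printable bytes in only the first prefix_len bytes (hypothetical helper)."""
    return sum(1 for b in data[:prefix_len] if b not in PRINTABLE)


# Synthetic stand-in for a flow dump: printable, PEM-like text first, binary later.
cert_like = b"-----BEGIN CERTIFICATE-----\n" + b"QUJD" * 1024 + b"\n"
binary_tail = bytes(range(32)) * 64
dump = cert_like + binary_tail

# The prefix-only check sees no binary bytes at all.
print(prefix_non_printable_count(dump))  # 0
# Looking at the whole buffer finds them.
print(prefix_non_printable_count(dump, len(dump)) > 0)  # True
```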

Proposed solution

Instead of relying on the first 2048 bytes, sample throughout the file for non-printables.
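A rough sketch of the sampling idea, under stated assumptions: the function name `sample_non_printable_ratio`, the chunk size, and the number of samples are illustrative choices, not necessarily what this PR implements. It reads evenly spaced chunks across the file and reports the fraction of non-printable bytes seen.

```python
import os
import string

# Bytes considered printable ASCII (letters, digits, punctuation, whitespace).
PRINTABLE = frozenset(string.printable.encode("ascii"))


def sample_non_printable_ratio(path: str, chunk_size: int = 2048, num_samples: int = 8) -> float:
    """Return the fraction of non-printable bytes in chunks sampled across the file.

    Hypothetical helper: chunk_size and num_samples are assumed defaults.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0.0

    if size <= chunk_size * num_samples:
        # Small file: a single read from offset 0 covers everything.
        offsets = [0]
        read_len = size
    else:
        # Evenly spaced chunk starts from the beginning to the last full chunk.
        last_start = size - chunk_size
        steps = max(num_samples - 1, 1)
        offsets = [i * last_start // steps for i in range(num_samples)]
        read_len = chunk_size

    seen = 0
    non_printable = 0
    with open(path, "rb") as f:
        for offset in offsets:
            f.seek(offset)
            chunk = f.read(read_len)
            seen += len(chunk)
            non_printable += sum(1 for b in chunk if b not in PRINTABLE)

    return non_printable / seen if seen else 0.0
```

A caller could then treat any ratio above a small threshold as evidence for the flow format. The threshold is a tuning decision that this sketch leaves open.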

kristiandueholm, May 25 '25 17:05

I have the same problem and this PR fixes the issue.

frafra, Aug 01 '25 11:08