Add sampling to flow heuristic.
This pull request fixes mitmproxy dump files (flows) being misdetected as .har files by detect_input_format(). The error I have been seeing is:
TypeError: 'int' object is not subscriptable
The proposed solution should resolve many of the issues where passing -f flow is currently needed to make the program run properly. For example #213, #171, and likely #214.
Root cause
Enabling debug mode by setting the MITMPROXY2SWAGGER_DEBUG environment variable revealed that the heuristic score computed in detect_input_format() was higher for .har even though the file was a flow dump. The main heuristic for detecting flow files is the presence of non-printable (ASCII) characters. The underlying issue is that mitmproxy_dump_file_huristic() assumes these will appear in the first 2048 bytes. In my case those bytes were filled with certificates, which consist purely of printable characters, so the heuristic missed.
Proposed solution
Instead of relying on the first 2048 bytes, sample chunks throughout the file and check those for non-printable characters.
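To illustrate the idea, here is a minimal sketch that contrasts a head-only check with a sampled one. It is not the code in this PR: the function names (non_printable_ratio, head_ratio, sampled_ratio), the sample count of 16, and the use of string.printable as the definition of "printable" are all assumptions made for the example.

```python
import os
import string

# Assumption: "printable" means ASCII letters, digits, punctuation, whitespace.
PRINTABLE = frozenset(string.printable.encode("ascii"))


def non_printable_ratio(chunk: bytes) -> float:
    """Fraction of bytes in chunk that fall outside PRINTABLE."""
    if not chunk:
        return 0.0
    return sum(b not in PRINTABLE for b in chunk) / len(chunk)


def head_ratio(path: str, size: int = 2048) -> float:
    """Look only at the first `size` bytes, as the current heuristic is described."""
    with open(path, "rb") as f:
        return non_printable_ratio(f.read(size))


def sampled_ratio(path: str, samples: int = 16, size: int = 2048) -> float:
    """Look at up to `samples` chunks of `size` bytes spread evenly over the file."""
    total = os.path.getsize(path)
    with open(path, "rb") as f:
        if total <= samples * size:
            # Small file: cheaper to read everything than to seek around.
            return non_printable_ratio(f.read())
        # Distance between chunk starts so the last chunk ends near EOF.
        step = (total - size) // max(samples - 1, 1)
        data = bytearray()
        for i in range(samples):
            f.seek(i * step)
            data += f.read(size)
    return non_printable_ratio(bytes(data))
```

For a file whose first few kilobytes are printable certificate text followed by binary flow data, head_ratio() would report 0.0 while sampled_ratio() would pick up the non-printable bytes further in. Evenly spaced offsets are used here rather than random ones so the result is deterministic for a given file.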
I have the same problem and this PR fixes the issue.