Set up performance testing

Open kissgyorgy opened this issue 2 years ago • 2 comments

We need to measure how fast unblob operates as a whole and which strategies can speed up extraction significantly. An example question we want to answer: which is faster, matching all YARA patterns at once, or iterating over the file multiple times with fewer patterns?

Measure different scenarios:

  • [ ] One big file with a few smaller files inside
  • [ ] Lots of small files, concatenated and nested
  • [ ] Multiple big files, concatenated and nested
  • [ ] Refactor the priority handling: concatenate all YARA rules and handle the match results by priority instead of scanning a file multiple times. Measure the difference on various files.

kissgyorgy · Dec 07 '21 10:12
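
To make the single-pass versus multi-pass question concrete, here is a minimal sketch of how the two strategies could be compared on synthetic inputs. It uses Python's `re` module as a stand-in for YARA so that it runs without extra dependencies; the signature table, the priorities, and every function name are assumptions made for illustration and are not unblob's actual rules or code.

```python
import re
import time

# Hypothetical signature table: name -> (priority, magic bytes).
# Lower number means higher priority. Not unblob's real YARA rules.
SIGNATURES = {
    "zip": (1, b"PK\x03\x04"),
    "gzip": (2, b"\x1f\x8b\x08"),
    "xz": (3, b"\xfd7zXZ\x00"),
}

# One combined pattern with a named group per signature (single-pass strategy).
COMBINED = re.compile(
    b"|".join(
        b"(?P<%s>%s)" % (name.encode(), re.escape(magic))
        for name, (_, magic) in SIGNATURES.items()
    )
)


def make_sample(filler_size, embedded_count):
    """Build a zero-filled blob with signature magics spread evenly inside."""
    magics = [magic for _, magic in SIGNATURES.values()]
    gap = bytes(filler_size // (embedded_count + 1))
    parts = []
    for i in range(embedded_count):
        parts.append(gap)
        parts.append(magics[i % len(magics)])
    parts.append(gap)
    return b"".join(parts)


def scan_multi_pass(data):
    """Scan the data once per signature, in priority order."""
    hits = []
    by_priority = sorted(SIGNATURES.items(), key=lambda item: item[1][0])
    for name, (priority, magic) in by_priority:
        for match in re.finditer(re.escape(magic), data):
            hits.append((priority, match.start(), name))
    return hits


def scan_single_pass(data):
    """Scan the data once with all signatures, then order hits by priority."""
    hits = [
        (SIGNATURES[match.lastgroup][0], match.start(), match.lastgroup)
        for match in COMBINED.finditer(data)
    ]
    return sorted(hits)


def time_it(func, data, repeat=3):
    """Return the best wall time in seconds over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Sizes are arbitrary and kept small; real measurements need real samples.
    scenarios = {
        "one big file, few embedded": (4 * 1024 * 1024, 3),
        "lots of small files": (1024 * 1024, 3000),
    }
    for label, (size, count) in scenarios.items():
        data = make_sample(size, count)
        multi = time_it(scan_multi_pass, data)
        single = time_it(scan_single_pass, data)
        print(f"{label}: multi-pass {multi:.4f}s, single-pass {single:.4f}s")
```

Timings from a regex stand-in say nothing about YARA itself; the sketch only shows the shape of the comparison and that both strategies can return the same hits in priority order.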

We should measure how much wall time each extraction step takes, e.g. YARA matching, chunk calculation, carving, extraction and so on.

vlaci · Dec 07 '21 12:12
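
One way to collect per-step wall time is a small context manager around each step. The following is a sketch under assumptions: the `timed` helper, the `step_times` dictionary, and the placeholder pipeline are all hypothetical and only stand in for the real matching, chunk calculation, and carving code.

```python
import re
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall time per step name, in seconds.
step_times = defaultdict(float)

MAGIC = b"PK\x03\x04"  # placeholder signature, not a real unblob rule


@contextmanager
def timed(step):
    """Add the wall time spent inside the `with` block to step_times[step]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        step_times[step] += time.perf_counter() - start


def run_pipeline(data):
    """Placeholder pipeline: each stage stands in for a real extraction step."""
    with timed("matching"):
        offsets = [m.start() for m in re.finditer(re.escape(MAGIC), data)]
    with timed("chunk calculation"):
        # Assume each chunk runs from one match to the next (or to the end).
        ends = offsets[1:] + [len(data)]
        chunks = list(zip(offsets, ends))
    with timed("carving"):
        carved = [data[start:end] for start, end in chunks]
    return carved


if __name__ == "__main__":
    sample = (bytes(4096) + MAGIC) * 1000
    run_pipeline(sample)
    for step, seconds in step_times.items():
        print(f"{step}: {seconds * 1000:.2f} ms")
```

Because the times accumulate in a dictionary, the same helper can wrap steps that run many times, for example once per chunk, and still report one total per step.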

There is pytest-benchmark, which we can use to write benchmark tests with special markers; these would be ignored by default but could easily be selected to run.

kissgyorgy · Dec 08 '21 10:12