Fix invalid evaluation doctype deduction

Open micmarty-deepsense opened this issue 9 months ago • 0 comments

There was a bug in evaluation.py that caused the extensions of certain files to be detected incorrectly. Evaluation files are expected to have two extensions, e.g. foobar.pdf.json, because they were partitioned first. The code mishandled the case where more than 3 dots are present in the file name.
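
For illustration, here is a minimal sketch of one way to deduce the doctype regardless of how many dots the file name contains. The helper name `deduce_doctype` is hypothetical and is not taken from evaluation.py. It assumes the last extension is the partition output format (e.g. `.json`) and the one before it is the original document type.

```python
from pathlib import Path


def deduce_doctype(filename: str) -> str:
    """Hypothetical helper: return the original doctype of an evaluation file.

    Assumes names follow the '<name>.<doctype>.<output ext>' pattern,
    e.g. 'foobar.pdf.json' -> 'pdf'.
    """
    # Drop the final extension (e.g. '.json'), then read the extension
    # of what remains. Extra dots earlier in the name are ignored.
    without_output_ext = Path(filename).stem
    doctype = Path(without_output_ext).suffix
    if not doctype:
        raise ValueError(f"expected two extensions in file name: {filename!r}")
    return doctype.lstrip(".")


print(deduce_doctype("foobar.pdf.json"))
print(deduce_doctype("foo.bar.baz.pdf.json"))
```

Reading the extensions from the right-hand side means dots in the base name cannot shift the result, which may not be how the actual fix works.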

  • [x] adjust doctype extraction for:
    • [x] TextExtractionMetricsCalculator
    • [x] TableStructureMetricsCalculator
    • [x] ElementTypeMetricsCalculator
  • [x] unit test

micmarty-deepsense May 21 '24 22:05