BobLd
Not sure, but this could be linked to an issue in the `editops_from_cost_matrix` function. See a possible solution here: https://github.com/ztane/python-Levenshtein/issues/16#issuecomment-613626787
@securigy can you provide the exact PDF you used (generated from the HTML page, I assume)?
@last-Programmer can you confirm you used the latest nightly build [0.1.9-alpha-20240402-f6292](https://www.nuget.org/packages/PdfPig/0.1.9-alpha-20240402-f6292)?
@stephen-williamson Thanks for sharing the document. The main issue I see with your document is that the page contains about 2 million letters. `NearestNeighbourWordExtractor` was not designed to handle that.
After further analysis, the letter count can be brought down to 300k by only taking into account the letters that are within the boundary of the page. Related to #681.
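A rough sketch of the filtering idea, for illustration only. This is not PdfPig code: `Rect`, `Letter`, and `letters_within_page` are hypothetical names, and it assumes a letter is kept only when its bounding box lies fully inside the page bounds (the real criterion may differ, for example it could keep partially overlapping letters).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page units (hypothetical type)."""
    left: float
    bottom: float
    right: float
    top: float

    def contains(self, other: "Rect") -> bool:
        # True only when `other` lies entirely inside this rectangle.
        return (
            self.left <= other.left
            and other.right <= self.right
            and self.bottom <= other.bottom
            and other.top <= self.top
        )


@dataclass(frozen=True)
class Letter:
    """A single glyph with its bounding box (hypothetical type)."""
    value: str
    box: Rect


def letters_within_page(letters, page_bounds):
    """Drop letters whose bounding box is not fully inside the page."""
    return [letter for letter in letters if page_bounds.contains(letter.box)]


# Assumed A4-sized page in points.
page = Rect(0, 0, 595, 842)
letters = [
    Letter("a", Rect(10, 10, 15, 20)),      # inside the page
    Letter("b", Rect(700, 10, 705, 20)),    # entirely off the page
    Letter("c", Rect(590, 830, 600, 840)),  # straddles the right edge
]

visible = letters_within_page(letters, page)
print([letter.value for letter in visible])
```

Running a filter like this before word extraction reduces the number of letters the extractor has to process.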
@EliotJones now good for review
@Numpsy I'm not very familiar with source generators, but we could also take the approach of hard-coding stuff. This is the approach I took, see here: https://github.com/BobLd/PdfPig/blob/caly-20240412/src/UglyToad.PdfPig/Graphics/ReflectionGraphicsStateOperationFactory.cs. It works.
Feel free to create a PR when you have time @Numpsy. More than happy to review it and merge it
@mahmoodali31 can you share the problematic PDF file?
@Numpsy let me have a look at these exceptions here, pretty sure it comes from code I wrote a while ago