hotpdf issues

Allow regex patterns in HotPdf.find_text

I have a use case where the output strings look like `"John William Doe01/01/1999Continuing Graduate"`, but for an arbitrary name, date, and student type ("Continuing Graduate" vs "Continuing...
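As a rough illustration of the kind of pattern this request has in mind, the sketch below uses Python's standard `re` module on a plain string. It does not call hotpdf. The pattern, the group names, and the `split_record` helper are assumptions made for this example, and the name sub-pattern only covers ASCII letters.

```python
import re

# Hypothetical pattern: a name, then an NN/NN/NNNN date, then the student type.
RECORD_RE = re.compile(
    r"(?P<name>[A-Za-z .'-]+?)"      # lazy, so it stops where the date begins
    r"(?P<date>\d{2}/\d{2}/\d{4})"
    r"(?P<student_type>.+)"
)


def split_record(text):
    """Return the three fields as a dict, or None if the text does not match."""
    match = RECORD_RE.fullmatch(text)
    return match.groupdict() if match else None


print(split_record("John William Doe01/01/1999Continuing Graduate"))
```

Because the three fields run together with no separator, the date is the only reliable anchor, which is why a literal-string search is not enough here.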

jwjeffr

enhancement

Allow case insensitive search in find_text

Currently, we only support case-sensitive searching. It would be helpful to have a flag like `ignore_case=True` in `find_text` for case-insensitive searching.
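A minimal sketch of what such a flag could do, shown on a plain string rather than on hotpdf's internal structures. The `find_offsets` helper is hypothetical. Only the `ignore_case` flag name comes from the request above.

```python
import re


def find_offsets(haystack, needle, ignore_case=False):
    """Return the start offset of every non-overlapping occurrence of needle."""
    if not needle:
        return []
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes the needle match literally rather than as a regex.
    return [m.start() for m in re.finditer(re.escape(needle), haystack, flags)]


print(find_offsets("Hello hello HELLO", "hello"))
print(find_offsets("Hello hello HELLO", "hello", ignore_case=True))
```

Matching with `re.IGNORECASE` keeps the reported offsets aligned with the original text, whereas lowercasing both strings first can change their length for some characters.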

krishnasism

enhancement

Can't read some fonts and characters; returns cid() value

2

Some characters, such as €, are not readable: the text `339,45 €` is read as `339,45 cid(128)`. If needed I can send the PDF; I can't add it here. It seems to be caused by...
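Until the underlying cause is fixed, one possible stopgap is to post-process the extracted text with a hand-made mapping. The sketch below is an assumption-laden workaround, not hotpdf behaviour: `replace_cid_tokens` is a hypothetical helper, the token spellings it accepts are the one quoted in this report plus the `(cid:N)` form, and the `128 → €` entry is taken only from the example above. Such mappings are font-specific.

```python
import re

# Accepts "cid(128)" as quoted in the report, and the "(cid:128)" spelling.
CID_TOKEN_RE = re.compile(r"\(cid:(\d+)\)|cid\((\d+)\)")


def replace_cid_tokens(text, mapping):
    """Replace known cid tokens using mapping; leave unknown ones untouched."""
    def _swap(match):
        cid = int(match.group(1) or match.group(2))
        return mapping.get(cid, match.group(0))

    return CID_TOKEN_RE.sub(_swap, text)


print(replace_cid_tokens("339,45 cid(128)", {128: "€"}))
```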

iodabasi

bug

pdfminer.six

Make load_memory_map faster!

Currently, loading a large PDF file (e.g. [bible.pdf](https://github.com/weareprestatech/hotpdf/files/13933240/bible.pdf) of ~700 pages) takes 72 seconds. Here is the profile data: ![image](https://github.com/weareprestatech/hotpdf/assets/106533898/31076e12-f303-493c-b384-39d85d6fefc2) As we can see, the biggest bottleneck is the [load_memory_map](https://github.com/weareprestatech/hotpdf/blob/main/hotpdf/memory_map.py)...
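For anyone wanting to reproduce this kind of measurement, the following sketch shows one way to collect a cumulative-time profile with the standard library. `profile_call` and `fake_loader` are hypothetical stand-ins. In practice the profiled callable would be whatever loads the PDF.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=5, **kwargs):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the slowest call chains appear first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def fake_loader(pages):
    """Stand-in workload: build one list of characters per page."""
    return [list("x" * 50) for _ in range(pages)]


result, report = profile_call(fake_loader, 100)
print(report)
```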

callegarimattia

enhancement

Take version tag from release

Make the version tag in pyproject.toml dynamic. Take it from the latest release (?) What about locally? Ref: #49 @krishnasism
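One possible shape for the tag-to-version step, sketched under assumptions: the release tags are taken to look like `v1.2.3` (optionally prefixed with `refs/tags/`), and the local case is handled with a placeholder fallback. The `version_from_tag` helper, the tag format, and the fallback value are all hypothetical and are not taken from the project.

```python
import re

# Assumed tag shape: optional "refs/tags/" prefix, optional "v", then X.Y.Z.
_TAG_RE = re.compile(r"^(?:refs/tags/)?v?(\d+\.\d+\.\d+)$")


def version_from_tag(tag, fallback="0.0.0.dev0"):
    """Derive a version string from a release tag, or return the fallback."""
    match = _TAG_RE.match(tag or "")
    return match.group(1) if match else fallback


print(version_from_tag("refs/tags/v1.2.3"))
print(version_from_tag(None))
```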

callegarimattia

CI/CD

Lower load_memory_map memory consumption

6

Right now, loading the Bible (an example of a big [PDF file](https://github.com/weareprestatech/hotpdf/files/13933140/The-Holy-Bible-King-James-Version.pdf) of ~700 pages) makes the memory usage skyrocket to around 1.3 GiB. The big memory allocation is of course...
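A peak-memory figure like the one above can be approximated for Python-level allocations with `tracemalloc`. The sketch below is illustrative only: `peak_memory_bytes` and `build_char_grid` are hypothetical names, and the grid is a stand-in workload rather than hotpdf's actual memory map.

```python
import tracemalloc


def peak_memory_bytes(func, *args, **kwargs):
    """Run func and return (result, peak traced bytes during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_char_grid(rows, cols):
    """Stand-in workload: a dense rows x cols grid of single characters."""
    return [[" "] * cols for _ in range(rows)]


grid, peak = peak_memory_bytes(build_char_grid, 200, 200)
print(f"peak: {peak / 1024:.1f} KiB")
```

Note that `tracemalloc` only sees allocations made through Python's allocator, so the number can be lower than what the operating system reports for the process.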

callegarimattia

enhancement

Optimize memory in span_map

@callegarimattia I identified one place we could optimise. In Span Map, instead of storing all "HotCharacters" in the Span, we can store only the str values. This way we will...
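To make the idea concrete, here is a small sketch comparing the two storage strategies. `CharRecord` is a hypothetical stand-in with assumed fields and is not hotpdf's real character class. The size figures are rough, CPython-specific estimates from `sys.getsizeof`.

```python
import sys
from dataclasses import dataclass


@dataclass
class CharRecord:
    """Hypothetical per-character object (fields are assumptions)."""
    value: str
    x: int
    y: int


def span_as_objects(chars):
    """Store the full objects, one per character."""
    return list(chars)


def span_as_text(chars):
    """Store only the str values, joined into one string."""
    return "".join(c.value for c in chars)


def approx_size(span):
    """Shallow size estimate: container plus, for lists, each element."""
    if isinstance(span, str):
        return sys.getsizeof(span)
    return sys.getsizeof(span) + sum(sys.getsizeof(c) for c in span)


chars = [CharRecord(ch, x, 0) for x, ch in enumerate("hello")]
print(approx_size(span_as_objects(chars)), approx_size(span_as_text(chars)))
```

The trade-off in this sketch is that the string form no longer carries per-character coordinates, so anything that needs them has to get them from elsewhere.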

callegarimattia

enhancement
