pypdf issues

DEV: Record coverage in subprocesses as well

2

stefan6419846

nf-ci

Review mutmut configuration

We currently have a configuration for *mutmut* (https://mutmut.readthedocs.io/en/latest/index.html) inside the repository, but do not seem to really use or have a look at it. This does not really make much...

stefan6419846

`PageObject.extract_text`'s `text_visitor` reports a wrong matrix for some text nodes

2

While trying to extract lemmas from this page, I found that some text "nodes" (not sure what the technical term is, I'll refer to them as nodes in this issue)...

LukeSerne

workflow-text-extraction

Two files which look identical (on first inspection) produce different line breaks when extracting text

10

I'm raising this issue as a result of a super useful (and helpful!) chat with @MartinThoma. For simplicity, I am trying to extract the first page of the 'SECTOR ANALYSIS'...

dl-racing

is-bug

workflow-text-extraction

whitespace

Improve documentation for PdfWriter.open_destination

The [`PdfWriter.open_destination`](https://pypdf2.readthedocs.io/en/latest/modules/PdfWriter.html#PyPDF2.PdfWriter.open_destination) docs aren't parsed "right" and so look a bit incorrect. Looking at the other `@property` values, it doesn't seem like having a `:param:` though is the right way...

MasterOdin

nf-documentation

Performance issue of documented watermarking

Instead of overwriting the page every time and loading it again (which takes a lot of time), you might want to create an empty page and merge the watermark...

MartinThoma

nf-documentation

Removing Form Fields after filling them

5

After I fill in the form fields, the form fields remain visible and on top of the text filled in causing the filled in text to be hidden under it...

Usouf

workflow-forms

DOC: Update references to the PDF specification

1

Also change TABLE to table

j-t-1

ENH: consider images inside PDF made with onlyoffice

10

Closes #2613. Added code to detect patterns in "_get_ids_image". To avoid any conflicts with images that could be located directly in a page or images using the same ID in...

0xNath

less "conventional" Indexed 4 bit RGB colour format not handled correctly.

3

When merging PDFs containing images (one per page), some images were altered in the resulting merged file. The issue was discussed on Stack Overflow here: https://stackoverflow.com/questions/78508800/pypdf-does-not-give-me-the-right-image where it was proposed to...

andreaskagedal

is-bug

workflow-merge
