Issue Adding Text to Imported PDF Pages Using PdfPig's PdfDocumentBuilder - PdfDocumentFormatException

Open darbid opened this issue 1 year ago • 0 comments

I'm encountering an issue with PdfPig while trying to add text to pages imported from an existing PDF document. Here's the code I'm using:

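using UglyToad.PdfPig;
using UglyToad.PdfPig.Core;
using UglyToad.PdfPig.Fonts.Standard14Fonts;
using UglyToad.PdfPig.Writer;

// Inside the calling method: open the source file, then open the augmented copy.
using var rawDocument = PdfDocument.Open(path);
using var document = PdfDocument.Open(Augment(rawDocument));

// Helper in the same class: copies every page into a new document
// and adds a text label next to each image.
private byte[] Augment(PdfDocument document)
{
    PdfDocumentBuilder builder = new();
    var font = builder.AddStandard14Font(Standard14Font.Helvetica);

    for (int i = 1; i <= document.NumberOfPages; i++)
    {
        var page = document.GetPage(i);
        var images = page.GetImages();
        var pageBuilder = builder.AddPage(document, i);
        foreach (var image in images)
        {
            // Left edge of the image, vertically centred.
            var point = new PdfPoint(image.Bounds.BottomLeft.X, (image.Bounds.TopLeft.Y + image.Bounds.BottomLeft.Y) / 2);
            // var imageIndex = AddImage(image);
            pageBuilder.AddText($"<<image-.png>>", 8, point, font);
        }
    }
    byte[] fileBytes = builder.Build();
    return fileBytes;
}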

Context:

I'm using builder.AddPage(document, i) to import pages from an existing PDF into a new PdfDocumentBuilder.
My goal is to add text annotations near images on the pages.
The AddImage method is not relevant to the issue (it's commented out).

Problem:

When I run this code, I get the following PdfDocumentFormatException when attempting to open the augmented PDF:

UglyToad.PdfPig.Core.PdfDocumentFormatException: 'Could not find the object number 50 0 with type StreamToken instead, it was found with type ObjectToken.'

The exception is raised while BasePageFactory.Create resolves the page's content stream through DirectObjectFinder.Get<T>. Both methods are shown below:

// BasePageFactory
public TPage Create(int number, DictionaryToken dictionary, PageTreeMembers pageTreeMembers, NamedDestinations namedDestinations)
{
    // ...
    var contentStream = DirectObjectFinder.Get<StreamToken>(obj, PdfScanner);
    // ...
}

// DirectObjectFinder
public static T? Get<T>(IndirectReference reference, IPdfTokenScanner scanner) where T : class, IToken
{  
    var temp = scanner.Get(reference);  
    if (temp is null || temp.Data is NullToken)  
    {  
        return null;  
    }  
  
    if (temp.Data is T locatedResult)  
    {  
        return locatedResult;  
    }  
  
    if (temp.Data is IndirectReferenceToken nestedReference)  
    {  
        return Get<T>(nestedReference, scanner);  
    }  
  
    if (temp.Data is ArrayToken array && array.Data.Count == 1)  
    {  
        var arrayElement = array.Data[0];  
  
        if (arrayElement is IndirectReferenceToken arrayReference)  
        {  
            return Get<T>(arrayReference, scanner);  
        }  
  
        if (arrayElement is T arrayToken)  
        {  
            return arrayToken;  
        }  
    }  
  
    throw new PdfDocumentFormatException($"Could not find the object number {reference} with type {typeof(T).Name} instead, it was found with type {temp.GetType().Name}.");  
}

Observations:

If I remove the pageBuilder.AddText line, the code executes without errors, and the augmented PDF opens correctly.
This suggests that adding text to the imported pages is causing the issue.
This happens only with a few PDFs; most files work fine. I can provide an example PDF that fails, but I would prefer to send it to you directly rather than post it here.
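
To narrow down which pages trigger the failure, a minimal sketch along the following lines may help. It uses only the PdfPig calls already shown above (PdfDocument.Open, AddPage, AddStandard14Font, AddText, Build, GetPage). The class name AugmentDiagnostics, the helper name FindFailingPages, the "test" label and the fixed position (50, 50) are hypothetical and chosen for illustration. It also assumes the problem still reproduces when a page is copied on its own, which may not hold if the failure depends on objects shared between pages.

```csharp
using System.Collections.Generic;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Core;
using UglyToad.PdfPig.Fonts.Standard14Fonts;
using UglyToad.PdfPig.Writer;

public static class AugmentDiagnostics
{
    // Hypothetical helper: returns the 1-based numbers of source pages that
    // fail to load after being copied into a new document with added text.
    public static List<int> FindFailingPages(string path)
    {
        var failing = new List<int>();
        using var source = PdfDocument.Open(path);

        for (int i = 1; i <= source.NumberOfPages; i++)
        {
            // Build a one-page document that contains only page i plus one text label.
            PdfDocumentBuilder builder = new();
            var font = builder.AddStandard14Font(Standard14Font.Helvetica);
            var pageBuilder = builder.AddPage(source, i);
            pageBuilder.AddText("test", 8, new PdfPoint(50, 50), font);
            byte[] bytes = builder.Build();

            try
            {
                // Page objects are created on access, so request the page explicitly.
                using var rebuilt = PdfDocument.Open(bytes);
                rebuilt.GetPage(1);
            }
            catch (PdfDocumentFormatException)
            {
                // Record the source page whose rebuilt copy could not be read.
                failing.Add(i);
            }
        }

        return failing;
    }
}
```

Calling AugmentDiagnostics.FindFailingPages(path) on an affected file should list the source pages worth inspecting, for example by checking how their Contents entry is stored in the original PDF.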

Question:

Is it possible to use PdfPig's PdfDocumentBuilder to add text to pages imported from an existing PDF without encountering the PdfDocumentFormatException error? If so, how can I modify my code to achieve this?

Dec 07 '24 07:12 darbid