Issue Adding Text to Imported PDF Pages Using PdfPig's PdfDocumentBuilder - PdfDocumentFormatException

Open darbid opened this issue 1 year ago • 0 comments

I'm encountering an issue with PdfPig while trying to add text to pages imported from an existing PDF document. Here's the code I'm using:

using var rawDocument = PdfDocument.Open(path);  
using var document = PdfDocument.Open(Augment(rawDocument));  
   
private byte[] Augment(PdfDocument document)  
{  
    PdfDocumentBuilder builder = new();  
    var font = builder.AddStandard14Font(Standard14Font.Helvetica);  
  
    for (int i = 1; i <= document.NumberOfPages; i++)  
    {  
        var page = document.GetPage(i);  
        var images = page.GetImages();  
        var pageBuilder = builder.AddPage(document, i);  
        foreach (var image in images)  
        {  
            var point = new PdfPoint(image.Bounds.BottomLeft.X, (image.Bounds.TopLeft.Y + image.Bounds.BottomLeft.Y) / 2);  
            // var imageIndex = AddImage(image);  
            pageBuilder.AddText($"<<image-.png>>", 8, point, font);  
        }  
    }  
    byte[] fileBytes = builder.Build();  
    return fileBytes;  
}  

Context:

  • I'm using builder.AddPage(document, i) to import pages from an existing PDF into a new PdfDocumentBuilder.
  • My goal is to add text annotations near images on the pages.
  • The AddImage method is not relevant to the issue (it's commented out).

Problem:

When I run this code, I get the following PdfDocumentFormatException when attempting to open the augmented PDF:

UglyToad.PdfPig.Core.PdfDocumentFormatException: 'Could not find the object number 50 0 with type StreamToken instead, it was found with type ObjectToken.'  

The exception is thrown by DirectObjectFinder.Get<StreamToken>, which is called from the Create method of the BasePageFactory class. Both methods are shown below:

public TPage Create(int number, DictionaryToken dictionary, PageTreeMembers pageTreeMembers, NamedDestinations namedDestinations)  
{  
    // ...  
    var contentStream = DirectObjectFinder.Get<StreamToken>(obj, PdfScanner);  
    // ...  
}  
   
public static T? Get<T>(IndirectReference reference, IPdfTokenScanner scanner) where T : class, IToken  
{  
    var temp = scanner.Get(reference);  
    if (temp is null || temp.Data is NullToken)  
    {  
        return null;  
    }  
  
    if (temp.Data is T locatedResult)  
    {  
        return locatedResult;  
    }  
  
    if (temp.Data is IndirectReferenceToken nestedReference)  
    {  
        return Get<T>(nestedReference, scanner);  
    }  
  
    if (temp.Data is ArrayToken array && array.Data.Count == 1)  
    {  
        var arrayElement = array.Data[0];  
  
        if (arrayElement is IndirectReferenceToken arrayReference)  
        {  
            return Get<T>(arrayReference, scanner);  
        }  
  
        if (arrayElement is T arrayToken)  
        {  
            return arrayToken;  
        }  
    }  
  
    throw new PdfDocumentFormatException($"Could not find the object number {reference} with type {typeof(T).Name} instead, it was found with type {temp.GetType().Name}.");  
}  

Observations:

  • If I remove the pageBuilder.AddText line, the code executes without errors, and the augmented PDF opens correctly.
  • This suggests that adding text to the imported pages is causing the issue.
  • This only happens with a small number of PDFs; most of them work fine. I can provide an example PDF that fails, but I would prefer to share it with you directly rather than post it here.
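
To narrow the observations above down to specific pages, the augmented bytes can be probed one page at a time. The following is only a minimal sketch: `AugmentDiagnostics` and `FindFailingPages` are hypothetical names that are not part of PdfPig, and it assumes the exception surfaces from `GetPage` rather than from `PdfDocument.Open` (if `Open` itself throws, the exception simply propagates to the caller).

```csharp
using System;
using System.Collections.Generic;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Core;

public static class AugmentDiagnostics
{
    // Hypothetical helper (not part of PdfPig): returns the 1-based numbers
    // of the pages that throw PdfDocumentFormatException when loaded.
    public static List<int> FindFailingPages(byte[] augmentedBytes)
    {
        var failingPages = new List<int>();

        // Assumption: opening succeeds and the failure only appears once a
        // page is loaded. If Open throws, nothing is caught here.
        using var document = PdfDocument.Open(augmentedBytes);

        for (int i = 1; i <= document.NumberOfPages; i++)
        {
            try
            {
                // Loading the page forces its content stream to be resolved.
                document.GetPage(i);
            }
            catch (PdfDocumentFormatException ex)
            {
                Console.WriteLine($"Page {i} failed: {ex.Message}");
                failingPages.Add(i);
            }
        }

        return failingPages;
    }
}
```

Calling `AugmentDiagnostics.FindFailingPages(Augment(rawDocument))` would then list the affected pages, which could help reduce a failing PDF to a smaller reproduction.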

Question:

Is it possible to use PdfPig's PdfDocumentBuilder to add text to pages imported from an existing PDF without encountering the PdfDocumentFormatException error? If so, how can I modify my code to achieve this?

darbid, Dec 07 '24 07:12