PdfPig
Issue Adding Text to Imported PDF Pages Using PdfPig's PdfDocumentBuilder - PdfDocumentFormatException
I'm encountering an issue with PdfPig while trying to add text to pages imported from an existing PDF document. Here's the code I'm using:
using var rawDocument = PdfDocument.Open(path);
using var document = PdfDocument.Open(Augment(rawDocument));

private byte[] Augment(PdfDocument document)
{
    PdfDocumentBuilder builder = new();
    var font = builder.AddStandard14Font(Standard14Font.Helvetica);
    for (int i = 1; i <= document.NumberOfPages; i++)
    {
        var page = document.GetPage(i);
        var images = page.GetImages();
        var pageBuilder = builder.AddPage(document, i);
        foreach (var image in images)
        {
            var point = new PdfPoint(image.Bounds.BottomLeft.X, (image.Bounds.TopLeft.Y + image.Bounds.BottomLeft.Y) / 2);
            // var imageIndex = AddImage(image);
            pageBuilder.AddText($"<<image-.png>>", 8, point, font);
        }
    }
    byte[] fileBytes = builder.Build();
    return fileBytes;
}
Context:
- I'm using builder.AddPage(document, i) to import pages from an existing PDF into a new PdfDocumentBuilder.
- My goal is to add text annotations near images on the pages.
- The AddImage method is not relevant to the issue (it's commented out).
Problem:
When I run this code, I get the following PdfDocumentFormatException when attempting to open the augmented PDF:
UglyToad.PdfPig.Core.PdfDocumentFormatException: 'Could not find the object number 50 0 with type StreamToken instead, it was found with type ObjectToken.'
The exception occurs in the BasePageFactory class, in the Create method, which calls DirectObjectFinder.Get<T> (both shown below):
public TPage Create(int number, DictionaryToken dictionary, PageTreeMembers pageTreeMembers, NamedDestinations namedDestinations)
{
    // ...
    var contentStream = DirectObjectFinder.Get<StreamToken>(obj, PdfScanner);
    // ...
}

public static T? Get<T>(IndirectReference reference, IPdfTokenScanner scanner) where T : class, IToken
{
    var temp = scanner.Get(reference);
    if (temp is null || temp.Data is NullToken)
    {
        return null;
    }
    if (temp.Data is T locatedResult)
    {
        return locatedResult;
    }
    if (temp.Data is IndirectReferenceToken nestedReference)
    {
        return Get<T>(nestedReference, scanner);
    }
    if (temp.Data is ArrayToken array && array.Data.Count == 1)
    {
        var arrayElement = array.Data[0];
        if (arrayElement is IndirectReferenceToken arrayReference)
        {
            return Get<T>(arrayReference, scanner);
        }
        if (arrayElement is T arrayToken)
        {
            return arrayToken;
        }
    }
    throw new PdfDocumentFormatException($"Could not find the object number {reference} with type {typeof(T).Name} instead, it was found with type {temp.GetType().Name}.");
}
Observations:
- If I remove the pageBuilder.AddText line, the code executes without errors, and the augmented PDF opens correctly.
- This suggests that adding text to the imported pages is causing the issue.
- This happens only with a few PDFs; most work fine. I can provide an example PDF that fails, but would prefer to share it with you directly rather than post it here.
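Since only some PDFs fail, it may help to find out which pages of the augmented output trigger the exception. The following is a minimal sketch rather than a tested fix: AugmentDiagnostics and FindFailingPages are hypothetical names (not part of PdfPig), and it assumes the exception is raised when a page is loaded via GetPage rather than by PdfDocument.Open itself.

```csharp
using System;
using System.Collections.Generic;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Core;

// Hypothetical helper, not part of PdfPig.
static class AugmentDiagnostics
{
    // Returns the 1-based numbers of pages in the augmented PDF
    // whose loading throws PdfDocumentFormatException.
    public static List<int> FindFailingPages(byte[] augmentedBytes)
    {
        var failing = new List<int>();
        using var document = PdfDocument.Open(augmentedBytes);
        for (int i = 1; i <= document.NumberOfPages; i++)
        {
            try
            {
                // Assumption: page content is resolved here, so the error surfaces per page.
                document.GetPage(i);
            }
            catch (PdfDocumentFormatException ex)
            {
                Console.WriteLine($"Page {i}: {ex.Message}");
                failing.Add(i);
            }
        }
        return failing;
    }
}
```

Calling AugmentDiagnostics.FindFailingPages(Augment(rawDocument)) should then list the affected pages, which can be compared against the same pages in the source PDF.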
Question:
Is it possible to use PdfPig's PdfDocumentBuilder to add text to pages imported from an existing PDF without encountering the PdfDocumentFormatException error? If so, how can I modify my code to achieve this?