IndexOutOfRangeException in ByteEncodingCMapTable.CharacterCodeToGlyphIndex Method

Open darbid opened this issue 1 year ago • 3 comments

Issue Description:

There is an issue in the CharacterCodeToGlyphIndex method within the ByteEncodingCMapTable class, which implements the ICMapSubTable interface in the namespace UglyToad.PdfPig.Fonts.TrueType.Tables.CMapSubTables.

The method throws a System.IndexOutOfRangeException because it reads an index outside the bounds of the glyphMapping array. The current implementation is:

public int CharacterCodeToGlyphIndex(int characterCode)  
{  
    if (characterCode < 0 || characterCode >= GlyphMappingLength)  
    {  
        return 0;  
    }  
    return glyphMapping[characterCode];  
}  

Exception Details:

System.IndexOutOfRangeException: Index was outside the bounds of the array.  
   at UglyToad.PdfPig.Fonts.TrueType.Tables.CMapSubTables.ByteEncodingCMapTable.CharacterCodeToGlyphIndex(Int32 characterCode) in D:\..........UglyToad.PdfPig.Fonts\TrueType\Tables\CMapSubTables\ByteEncodingCMapTable.cs:line 57  
Exception thrown: 'System.IndexOutOfRangeException' in UglyToad.PdfPig.Fonts.dll  

Investigation Insights:

Upon examining the values, it appears that the exception is caused by a discrepancy between GlyphMappingLength and the actual length of the glyphMapping array:

  • GlyphMappingLength = 256
  • glyphMapping = {byte[252]}

Attached Document:
I have also attached a document, example.pdf, which might help in reproducing and investigating the issue further.

Steps to Reproduce:

  1. Call CharacterCodeToGlyphIndex with a characterCode that is greater than or equal to the length of the glyphMapping array but less than GlyphMappingLength (with the values above, 252 through 255).
  2. The method throws System.IndexOutOfRangeException.
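
The steps above can be sketched without the PDF. This is a minimal stand-in, not the PdfPig source: the class name ByteEncodingSketch and its constructor are assumptions for illustration, and only the bounds check is copied from the method quoted earlier.

```csharp
using System;

// Hypothetical stand-in for ByteEncodingCMapTable (not the PdfPig source).
class ByteEncodingSketch
{
    // Declared length used by the bounds check, as observed in the debugger.
    private const int GlyphMappingLength = 256;
    private readonly byte[] glyphMapping;

    public ByteEncodingSketch(byte[] glyphMapping) => this.glyphMapping = glyphMapping;

    public int CharacterCodeToGlyphIndex(int characterCode)
    {
        // Same check as the reported method: compares against the constant,
        // not against the array that is actually indexed.
        if (characterCode < 0 || characterCode >= GlyphMappingLength)
        {
            return 0;
        }
        return glyphMapping[characterCode];
    }
}

class Program
{
    static void Main()
    {
        // Array shorter than the declared length, matching byte[252].
        var table = new ByteEncodingSketch(new byte[252]);

        Console.WriteLine(table.CharacterCodeToGlyphIndex(251)); // last valid index

        try
        {
            table.CharacterCodeToGlyphIndex(252); // passes the check, misses the array
        }
        catch (IndexOutOfRangeException ex)
        {
            Console.WriteLine(ex.GetType().Name);
        }
    }
}
```

Any code from 252 to 255 passes the check against the constant and then indexes past the end of the 252-element array.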

Sample Code to Reproduce the Issue:

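using System.Text;
using UglyToad.PdfPig;

// ExtractText is a hypothetical wrapper name; the original snippet was a bare method body.
static string ExtractText(string path)
{
    using var document = PdfDocument.Open(path);
    var text = new StringBuilder();

    foreach (var page in document.GetPages())
    {
        // Join the words of each page with single spaces.
        text.AppendLine(string.Join(" ", page.GetWords()));
    }

    return text.ToString();
}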

Proposed Solution:
Ensure that both the GlyphMappingLength and the glyphMapping array are synchronized in terms of their lengths to avoid out-of-bounds access, or update the method to validate against the actual length of glyphMapping rather than GlyphMappingLength.
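
The first option (keeping the two lengths in sync) could look roughly like the sketch below. The helper PadToDeclaredLength and its class are hypothetical names, and where such padding would happen in PdfPig's table parsing is an assumption. Padding with 0 relies on glyph index 0 being the missing-glyph entry in TrueType fonts.

```csharp
using System;

static class GlyphMappingPadding
{
    // Hypothetical helper: returns an array with at least declaredLength entries.
    // Entries that were not present in the font data stay 0 (the missing glyph).
    public static byte[] PadToDeclaredLength(byte[] read, int declaredLength)
    {
        if (read.Length >= declaredLength)
        {
            return read; // already long enough, nothing to do
        }

        var padded = new byte[declaredLength];
        Array.Copy(read, padded, read.Length);
        return padded;
    }
}

class PaddingDemo
{
    static void Main()
    {
        // 252 bytes read from the font, 256 declared.
        var padded = GlyphMappingPadding.PadToDeclaredLength(new byte[252], 256);

        Console.WriteLine(padded.Length); // 256
        Console.WriteLine(padded[255]);   // 0
    }
}
```

The second option, checking against glyphMapping.Length in the method itself, is the smaller change because it needs no extra allocation and tolerates any array length.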

Additional Context:
The issue is observed in the following file:

D:......\UglyToad.PdfPig.Fonts\TrueType\Tables\CMapSubTables\ByteEncodingCMapTable.cs  

Please let me know if you need any further information or clarification.

darbid, Aug 25 '24 02:08

Hi @darbid, thanks a lot for the detailed issue. I think the easiest would be to update the CharacterCodeToGlyphIndex method.

Are you willing to create a PR to fix that? Or I can take care of it, just let me know.

EDIT: I think the following should be enough:

public int CharacterCodeToGlyphIndex(int characterCode)
{
    if (characterCode < 0 || characterCode >= glyphMapping.Length)
    {
        return 0;
    }

    return glyphMapping[characterCode];
}

BobLd, Aug 25 '24 09:08

I am sorry I'm not experienced enough to do a PR. Could you please?

darbid, Aug 25 '24 10:08

no prob, done

BobLd, Aug 25 '24 11:08