NetTopologySuite.IO.ShapeFile

[BUG] UTF-8 read encoding error

Open sabvente opened this issue 4 years ago • 1 comments

Describe

A UTF-8 shapefile is loaded with the wrong encoding.

Code Snippet

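using System.Text;
using NetTopologySuite.Geometries;
using NetTopologySuite.IO;

// Open the shapefile, explicitly requesting UTF-8.
using (var reader = new ShapefileDataReader("shape.shp", new GeometryFactory(),
    Encoding.UTF8))
{
    reader.Read(); // the record is decoded with the wrong encoding
}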

Investigation

The ShapefileDataReader constructor calls NetTopologySuite.IO.DbaseFileReader.CreateStreamProviderRegistry, which passes encoding.EncodingName to a ByteStreamProvider. For Encoding.UTF8 that value is "Unicode (UTF-8)".

Later, when the header is parsed, NetTopologySuite.IO.DbaseFileHeader.GetEncoding() runs this code:

try
{
    // The following line throws exception
    // 'Unicode (UTF-8)' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.
    return DbaseEncodingUtility.GetEncodingForCodePageName(cpgText);
}
catch
{
    return DefaultEncoding;
}

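I think DbaseEncodingUtility.GetEncodingForCodePageName should be called with "UTF-8" as its argument. Because the catch block swallows the exception, the reader silently falls back to DefaultEncoding.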
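The mismatch can be reproduced without NetTopologySuite. The following is a minimal sketch that uses only System.Text and assumes the library ultimately resolves the name through something like Encoding.GetEncoding; the helper TryResolve is hypothetical and not part of the library. It compares the display name (EncodingName) with the IANA name (WebName):

```csharp
using System;
using System.Text;

public static class EncodingNameDemo
{
    // Hypothetical helper: returns the encoding registered under the given
    // name, or null if .NET does not recognize the name.
    public static Encoding TryResolve(string name)
    {
        try
        {
            return Encoding.GetEncoding(name);
        }
        catch (ArgumentException)
        {
            return null;
        }
    }

    public static void Main()
    {
        var utf8 = Encoding.UTF8;

        // EncodingName is a human-readable description; WebName is the IANA name.
        Console.WriteLine($"EncodingName: {utf8.EncodingName}");
        Console.WriteLine($"WebName: {utf8.WebName}");

        // Report whether each name can be used to look the encoding up again.
        Console.WriteLine($"EncodingName resolves: {TryResolve(utf8.EncodingName) != null}");
        Console.WriteLine($"WebName resolves: {TryResolve(utf8.WebName) != null}");
    }
}
```

If the display name fails to resolve here as well, passing encoding.WebName instead of encoding.EncodingName looks like the likely fix, though I have not verified this against the library code.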

Registering encoding provider

Calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); does not fix my issue.

Quick fix

Setting the default header encoding works around the problem:

DbaseFileHeader.DefaultEncoding = Encoding.UTF8;

sabvente, Sep 26 '19 14:09