NetTopologySuite.IO.ShapeFile
[BUG] UTF-8 read encoding error
Describe
A UTF-8 shapefile is loaded with the wrong encoding.
Code Snippet
using (var reader = new ShapefileDataReader("shape.shp", new GeometryFactory(), Encoding.UTF8))
{
    reader.Read();
}
Investigation
The NetTopologySuite.IO.DbaseFileReader.CreateStreamProviderRegistry method is called from the ShapefileDataReader constructor.
It passes encoding.EncodingName, whose value is "Unicode (UTF-8)", as the ByteStreamProvider argument.
Later, the header is parsed in NetTopologySuite.IO.DbaseFileHeader.GetEncoding():
try
{
    // The following line throws an exception:
    // 'Unicode (UTF-8)' is not a supported encoding name. For information on defining
    // a custom encoding, see the documentation for the Encoding.RegisterProvider method.
    return DbaseEncodingUtility.GetEncodingForCodePageName(cpgText);
}
catch
{
    return DefaultEncoding;
}
I think DbaseEncodingUtility.GetEncodingForCodePageName should be called with the argument "UTF-8" instead.
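To illustrate the suspected cause, here is a minimal sketch that uses only System.Text and does not touch NetTopologySuite. It contrasts Encoding.EncodingName, which is a display string, with Encoding.WebName, which is a name that Encoding.GetEncoding accepts. The class name EncodingNameDemo is purely illustrative, and the exact EncodingName text may differ between runtimes.

```csharp
using System;
using System.Text;

class EncodingNameDemo
{
    static void Main()
    {
        Encoding utf8 = Encoding.UTF8;

        // EncodingName is a human-readable description, not a lookup key.
        Console.WriteLine("EncodingName: " + utf8.EncodingName);

        // WebName is the IANA-registered name ("utf-8" for Encoding.UTF8).
        Console.WriteLine("WebName: " + utf8.WebName);

        // Looking the encoding up by its WebName succeeds.
        Encoding byWebName = Encoding.GetEncoding(utf8.WebName);
        Console.WriteLine("Same code page: " + (byWebName.CodePage == utf8.CodePage));

        // Looking it up by the display name is expected to fail with an
        // ArgumentException, like the one quoted in the issue above.
        try
        {
            Encoding.GetEncoding(utf8.EncodingName);
            Console.WriteLine("Lookup by EncodingName succeeded");
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine("Lookup by EncodingName failed: " + ex.Message);
        }
    }
}
```

If the library passed a name such as WebName rather than EncodingName, the lookup inside GetEncoding() would presumably succeed and the catch block would no longer fall back to DefaultEncoding. That is an assumption about the fix, not a confirmed description of the library's code.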
Registering encoding provider
Calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); does not fix my issue.
Quick fix
Setting the default header encoding works around the problem:
DbaseFileHeader.DefaultEncoding = Encoding.UTF8;
May I ask how this issue was finally solved? Could you share the method?
Support for different encodings has been added in the successor library. There is also sample code demonstrating how to use a custom encoding.