CsvHelper icon indicating copy to clipboard operation
CsvHelper copied to clipboard

GetValue of CsvDataReader doesnt support TypeConverter

Open arnopal opened this issue 6 years ago • 11 comments

GetValue of CsvDataReader doesnt support TypeConverter and always returns string. In the scenario where it is used along with SqlBulkCopy, there is no way to do type conversion of incoming values and for example if CSV uses some non standart date values that causes and issue. Ability to pass index based type converter would be really useful. For example it could take collection of index based converters or column map etc. In this scenario it would be very big overhead to use POCOs to map data types and even that in my understanding is not supported in current version.

arnopal avatar Feb 06 '19 15:02 arnopal

Are you passing in an IDataReader and not a DataTable to SqlBulkCopy.WriteToServer?

JoshClose avatar Feb 06 '19 15:02 JoshClose

Thats correct. That would be goal of using CsvDataReader not to read whole file to memory as file can be very large.

arnopal avatar Feb 06 '19 16:02 arnopal

Ok. In that case we have no clue what the types are. There will need to be some way to specify a mapping or definition of indexes/names to types. Do you know what methods get called during a bulk copy besides object GetValue(int index)?

JoshClose avatar Feb 07 '19 16:02 JoshClose

my observation was that only object GetValue(int index) is called and obviously Read. I would say index based dictionary with type converter would do a job in the begining.

arnopal avatar Feb 07 '19 16:02 arnopal

I'm trying to think of the best way to do this. If it doesn't use int GetOrdinal(string name), it would be pretty simple to just pass in an IDictionary<int, Type> to the constructor that maps the index to the type. If the name is used, it would need to be a little more complex. I don't have SQL Server installed... I looked at the source and I'm only seeing is FieldCount, Read and GetValue. There could be others though. Would you be able to create an IDataReader class and see what is called? Just leave the implementations as throws and fill them in as they get called. You could just take the CsvDataReader source too and set breakpoints.

JoshClose avatar Feb 07 '19 18:02 JoshClose

Can confirm that from my observation only FieldCount, Read and GetValue being used.

As per providing converters, since we have facility in CsvReader to read header and identify columns, we can have another overload which takes column names and converters and then resolves to their respective indexes...

arnopal avatar Feb 08 '19 09:02 arnopal

I'm also interested in this feature. Have a 4GB csv file I'm importing using CsvDataReader and SqlBulkCopy and some rows may have bad data. Any update or workarounds?

wjchristenson2 avatar Jun 12 '19 16:06 wjchristenson2

Did you end up coming up with a solution? I have the exact same issue with using CsvDataReader and SqlBulkCopy. I may end up writing an adapter around CsvDataReader like so:

public CsvDataReaderAdapter(CsvDataReader reader, IDictionary<string, Type> nameToTypeMapper)

and then handle checking the type and converting to the correct object in public object GetValue(int i)

I would be curious if you found a better way to do this?

roblapp avatar Sep 13 '19 16:09 roblapp

I'm curious as well.

derekforeman avatar Oct 03 '19 17:10 derekforeman

Turns out in order to correctly specify types you just need to return a DataTable containing the destination schema from the GetSchemaTable method in IDataReader. I've submitted a PR #1435 which allows a schema DataTable to be specified in the constructor which seems to do the trick.

mattosaurus avatar Jan 03 '20 15:01 mattosaurus

PR 1435 helps a lot, but still doesn't account for type conversions. Maybe it's the version of SqlBulkCopy that I'm using, but it does not call any of the Get(type) methods.. Instead it only ever calls GetValue(ordinal).

I was able to fix this for my situation (loading 0/1 into bit fields as false/true) by copying the CsvDataReader class and changing the GetValue method to:

public object GetValue(int i) { if (IsDBNull(i)) return DBNull.Value;

if (GetFieldType(i) == typeof(int)) return GetInt32(i);
if (GetFieldType(i) == typeof(bool)) return GetBoolean(i);

return _csv.GetField(i) ?? string.Empty;

}

This looks up the type from the schema provided in the constructor and calls the appropriate Get(type).. which then allows the underlying csvReader to use the type converters and lists (e.g., I had to add 0 to BooleanFalseValues and 1 to BooleanTrueValues). Without the change to GetValue, adding those things to the TypeConverterOptionsCache alone did not work.

steve-wolfe avatar Jun 14 '23 14:06 steve-wolfe