Support index as well as column name for column definitions

Open CasperWSchmidt opened this issue 2 years ago • 6 comments

Is your feature request related to a problem? Please describe.

I'm currently using CsvHelper for parsing delimited files, but since I now need to handle fixed-length column files as well, I was looking for a way to do this. I came across this library, which looks promising, and I would like to simplify our codebase by using FlatFiles for both the fixed-length and delimited files instead of having two NuGet packages. What I need to make this change is a way to define the columns for delimited files using indexes instead of column names.

Our use case is that we receive files from multiple sources each day, and these files are then parsed into a common format (C# object) based on a JSON configuration file (so our code is generic and handles each source's files according to the configuration for that source). In the JSON configuration we use indexes to describe the column used for each property in the model (see below). Not all files have headers with column names, so using column names is not really possible. Also, changing all of our code and configurations to use column names instead is a lot of work.

Example of class mapping from CsvHelper: image

Describe the solution you'd like

I want to be able to use Property(m => m.SomeProperty).Index(1); (or something similar) instead of Property(m => m.SomeProperty).ColumnName("N/A");, where "N/A" is some value that is not always available.

Describe alternatives you've considered

I'm new to this library, so I have no idea if there are any alternatives.

Additional context

CasperWSchmidt, Sep 23 '21 08:09

The mapper remembers the order in which you call Property. So if you can rearrange the order in which you call Property, it should work as expected. For example, you could have a Dictionary<ColumnName, Action<TypedMapper<YourModel>>> configurations and then sort the column names to match the order in your configuration file.

Each Action<TypedMapper<YourModel>> would get added like this:

configurations.Add("column1", (mapper) => mapper.Property(m => m.AccountNumber)/* other config */);

Then it is just a matter of doing:

TypedMapper<YourModel> mapper = /* create a new mapper */;
// Apply the configurations in the order the columns appear in the file
foreach (string columnName in orderedColumnNames)
{
    configurations[columnName](mapper);
}

jehugaleahsa, Sep 23 '21 14:09

> So if you can rearrange the order that you call Property it should work as expected.

That won't work as the order is different for each provider. Trade date might be column 1 in the file from one provider but column 12 in another. We currently have more than 20 providers implemented using the above technique and only now do we have a provider with fixed length columns that CsvHelper does not support :(

From your examples it looks like the column names do not really have to exist as headers in the input file, is that true? In that case I might just be able to create the dictionary using the ints (index values) as keys. The only issue left then is handling all the columns that we don't need for our system (but the provider sends anyway). This can be quite a lot for some providers :)

The dynamic approach using index instead of the static remembering of order and column names would still be a nice addition though IMHO.

CasperWSchmidt, Sep 24 '21 07:09

A small snip from one of the configurations (another model though) for a better understanding: image

Btw, sometimes the same column is mapped to multiple properties in the model, but it is also possible that multiple columns are added, multiplied, etc. together to calculate the value of a property. Therefore the ColumnConfigurations property is a dictionary with string as key and a list of configurations for that key (Dictionary<string, IEnumerable<ColumnConfiguration>>). It is finance, nothing is simple :)

CasperWSchmidt, Sep 24 '21 07:09

One option would be to look up the column configurations by their column number. You could determine the max column number and then count from 1 to max, checking whether the lookup has that column. If there is no configuration for a column, you would use an "ignored" column for that index. This tells the mapper to just throw that column away. The Dictionary I talked about last time is a mapping of these column names, like DepositoryNo, to the Actions that configure the mapper.

Here's some pseudo code:

// Map each 1-based column position to the configured column name
Dictionary<int, string> columnPositionLookup = new();
int maxColumn = 0;
foreach (ColumnConfiguration configuration in columnConfigurations)
{
    int column = /* however you get the column index */;
    if (column > maxColumn)
    {
        maxColumn = column;
    }
    columnPositionLookup.Add(column, configuration.Name);
}
TypedMapper<MyModel> mapper = /* create a mapper somehow */;
// Visit every position from 1 to maxColumn (inclusive) in file order
for (int index = 1; index <= maxColumn; ++index)
{
    if (columnPositionLookup.TryGetValue(index, out string column)
        && configurationLookup.TryGetValue(column, out var action))
    {
        action(mapper);
    }
    else
    {
        mapper.Ignored(); // Column not used by mapper
    }
}

jehugaleahsa, Sep 24 '21 12:09

We decided to go in another direction: we simply state the length of each column in a new configuration value and then use that to pre-process the file, transforming it to CSV with a known delimiter. This means we can stick to our existing code using CsvHelper :)
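
For readers who want to try the same pre-processing idea, here is a minimal sketch of slicing a fixed-length record into delimited fields. It is not the decorator from the screenshot: the helper name FixedLengthToDelimited, the ';' delimiter, and the sample widths are all assumptions made for illustration, and it uses only the .NET base library (written as C# top-level statements).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: cuts one fixed-length record into fields using the
// configured column widths and joins them with a delimiter.
static string FixedLengthToDelimited(string line, IReadOnlyList<int> widths, char delimiter = ';')
{
    var fields = new List<string>(widths.Count);
    int offset = 0;
    foreach (int width in widths)
    {
        // Tolerate short lines: take what is available, otherwise an empty field.
        int available = Math.Max(0, Math.Min(width, line.Length - offset));
        string field = available > 0 ? line.Substring(offset, available).Trim() : string.Empty;

        // Quote fields that contain the delimiter or a quote character.
        if (field.Contains(delimiter) || field.Contains('"'))
        {
            field = "\"" + field.Replace("\"", "\"\"") + "\"";
        }

        fields.Add(field);
        offset += width;
    }
    return string.Join(delimiter.ToString(), fields);
}

// Made-up example: three columns of width 10, 8 and 6.
int[] widths = { 10, 8, 6 };
string record = "ACC-001".PadRight(10) + "20210923" + "12.50".PadLeft(6);
Console.WriteLine(FixedLengthToDelimited(record, widths)); // ACC-001;20210923;12.50
```

Each converted line can then be handed to the existing index-based CSV parsing unchanged. Trimming every field is a design choice that suits padded values; it would need to be dropped if leading or trailing spaces are significant in a provider's data.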

I still believe adding index support is a good idea though. I looked through some of the code and saw that internally the index is used anyway.

CasperWSchmidt, Oct 12 '21 06:10

image Using a decorator with less than 100 LoC (including include statements etc.) to do the work

CasperWSchmidt, Oct 12 '21 06:10