csv
Option to include the original CSV line in the tuple returned by CSV.decode(stream, options)
Hi,
Thanks for a great library.
The CSV.decode(...) function returns a stream of tuples, each either {:ok, map()} in case of success or {:error, binary()} in case of failure to decode.
My use case: read a CSV file and insert a row into the DB for each CSV line. Invalid CSV lines must be saved in a separate file for further analysis.
I need the original CSV line in both cases (decode success or failure):
- In case of decode failure, the problematic line is saved in a separate file for further analysis.
- In case of successful decode, I still need the original CSV line, because the decoded result must be inserted into a database table, and if the insert fails I also need to save that line in a file for further analysis.
Do you think an option could be introduced in the library to return a tuple {:ok, map(), binary()} or {:error, binary(), binary()}, where the third element is the raw data from the input stream? I can try to submit a PR for that if it's OK.
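To make the proposed shape concrete, here is a minimal sketch of the three-element tuples written as a wrapper outside the library. `RawLine.attach/2` and the stub `decode` function are hypothetical names used only for illustration; they are not part of the csv library, and the stub stands in for a real per-line decoder.

```elixir
defmodule RawLine do
  # Hypothetical helper, not part of the csv library.
  # Wraps a per-line decoder so every result also carries the raw line.
  def attach(lines, decode_fun) do
    Stream.map(lines, fn line ->
      case decode_fun.(line) do
        {:ok, row} -> {:ok, row, line}
        {:error, reason} -> {:error, reason, line}
      end
    end)
  end
end

# Stub decoder for illustration only (an assumption, not CSV.decode):
# it accepts a line only if it has exactly three comma-separated fields.
decode = fn line ->
  case line |> String.trim_trailing("\n") |> String.split(",") do
    [a, b, c] -> {:ok, %{a: a, b: b, c: c}}
    fields -> {:error, "expected 3 fields, got #{length(fields)}"}
  end
end

results = ["1,2,3\n", "oops\n"] |> RawLine.attach(decode) |> Enum.to_list()
```

Each element of `results` keeps the untouched input line as its last element, so the consumer can write it to a rejects file regardless of where the failure happens.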
So far I have come up with the following workaround for my case:
path
# Stream the CSV file line by line.
|> File.stream!()
# Start the stream transformation: decode the CSV one line at a time so the raw line stays available.
# Tradeoff: CSV.decode reports an incorrect line number when a line fails to decode. The error will always refer to line 1 :(
|> Stream.transform(0, fn line, acc ->
  # Decode one CSV line. The result is either {:ok, map} or {:error, reason}.
  [result] = CSV.decode([line], separator: ?,, headers: [:a, :b, :c]) |> Enum.take(1)
  # Keep the original line in the resulting tuple;
  # in case of an error this line must be saved in a separate file.
  {[Tuple.append(result, line)], acc + 1}
end)
# Process the decoding results using a parallel stream.
# At this point each stream element is a tuple:
# either {:ok, %{...}, "foo, bar, baz"} in case of decode success
# or {:error, "Row has ... - expected .. line 1", "foo, bar, baz"} in case of failure
|> ParallelStream.each(&process_decoded(&1))
|> Stream.run()
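The workaround calls `process_decoded` without showing it. A minimal sketch of what it might look like follows, assuming a hypothetical `insert_fun` that returns `{:ok, _}` or `{:error, _}` for a decoded row and a `rejects` IO device for the failed lines. Both are stand-ins: the real insert would go through whatever DB layer is in use, and unlike the one-argument call above, this version takes its dependencies as extra arguments.

```elixir
defmodule Importer do
  # Hypothetical module; insert_fun and rejects are assumptions for this sketch.

  # Decode succeeded: try the insert, keep the raw line if the insert fails.
  def process_decoded({:ok, row, line}, insert_fun, rejects) do
    case insert_fun.(row) do
      {:ok, _} -> :ok
      {:error, _reason} -> IO.write(rejects, line)
    end
  end

  # Decode failed: the raw line goes straight to the rejects device.
  def process_decoded({:error, _reason, line}, _insert_fun, rejects) do
    IO.write(rejects, line)
  end
end

# Stand-ins so the sketch runs without a database or a file on disk.
{:ok, rejects} = StringIO.open("")

insert = fn
  %{a: "bad"} -> {:error, :constraint}
  row -> {:ok, row}
end

Importer.process_decoded({:ok, %{a: "1"}, "1,2,3\n"}, insert, rejects)
Importer.process_decoded({:ok, %{a: "bad"}, "bad,2,3\n"}, insert, rejects)
Importer.process_decoded({:error, "Row has ...", "oops\n"}, insert, rejects)

{_input, rejected} = StringIO.contents(rejects)
```

In the pipeline this would be wired in as `&Importer.process_decoded(&1, insert, rejects)`, with `rejects` opened via something like `File.open!/2` in append mode instead of `StringIO`.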
Hi, this is interesting. Could definitely start something on this, but would also accept a PR to make this optional behaviour.
Maybe have a look at PR https://github.com/beatrichartz/csv/pull/95 - this might be what you are looking for.