
Option to include original csv line in resulting tuple returned by CSV.decode(stream, options)

Open npac opened this issue 6 years ago • 2 comments

Hi, thanks for the great library. The CSV.decode(...) function yields a tuple for each row: either {:ok, map()} on success or {:error, binary()} when the row fails to decode.

My use case: read a CSV file and insert a row into the DB for each CSV line. Invalid CSV lines must be saved to a separate file for further analysis.

I need the original CSV line in both cases (decode success or failure):

  • If decoding fails, the problematic line is saved to a separate file for further analysis.
  • If decoding succeeds, I still need the original CSV line, because the decoded result is inserted into a database table, and if that insert fails I also need to save the line to a file for further analysis.

Do you think an option could be added to the library to return {:ok, map(), binary()} or {:error, binary(), binary()}, where the last binary is the raw data from the input stream? I can try to submit a PR for that if that's OK.
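To make the proposed shapes concrete, here is a rough sketch of how a consumer might pattern-match on the three-element tuples. It assumes the proposed option exists; `RowSink`, `insert_fun`, and `rejects` are hypothetical names for illustration and are not part of the library.

```elixir
defmodule RowSink do
  # Hypothetical consumer for the proposed three-element tuples.
  # `insert_fun` stands in for the DB insert and is assumed to return
  # :ok or {:error, reason}. `rejects` is any IO device (for example an
  # opened file) that collects raw lines for later analysis.

  # Decoded fine: try the insert, keep the raw line if the insert fails.
  def handle({:ok, row, raw}, insert_fun, rejects) do
    case insert_fun.(row) do
      :ok -> :ok
      {:error, _reason} -> IO.write(rejects, raw)
    end
  end

  # Failed to decode: keep the raw line as-is.
  def handle({:error, _message, raw}, _insert_fun, rejects) do
    IO.write(rejects, raw)
  end
end
```

Raw lines coming from File.stream!/1 keep their trailing newline, so writing them back unchanged gives one rejected line per line in the output file.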

So far I have come up with the following workaround for my case:

path
  # Stream the CSV file line by line.
  |> File.stream!()
  # Decode each line separately so the raw line stays available.
  # Tradeoff: CSV.decode only ever sees a single line, so on failure it
  # reports the wrong line number. The error always refers to line 1 :(
  |> Stream.transform(0, fn line, acc ->
    # Decode one CSV line. The result is either {:ok, map} or {:error, reason}.
    [result] = CSV.decode([line], separator: ?,, headers: [:a, :b, :c]) |> Enum.take(1)
    # Keep the original line in the resulting tuple:
    # on error, this line must be saved to a separate file.
    {[Tuple.append(result, line)], acc + 1}
  end)
  # Process the decoded results with a parallel stream.
  # At this point each stream element is a tuple, either
  #   {:ok, %{...}, "foo, bar, baz"} on decode success, or
  #   {:error, "Row has ... - expected ... line 1", "foo, bar, baz"} on failure.
  |> ParallelStream.each(&process_decoded(&1))
  |> Stream.run()
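One way to work around the wrong line number would be to carry the real line number next to the raw line instead of relying on the one in the error message. The following is only a sketch under assumptions: `RawLineDecode` and `decode_line` are hypothetical names, `decode_line` is any one-line decoder returning {:ok, row} or {:error, message}, and the result is a four-element tuple rather than the three-element shape proposed above.

```elixir
defmodule RawLineDecode do
  # Pairs every decode result with its raw line and its 1-based line number.
  # `decode_line` is a hypothetical one-line decoder returning {:ok, row} or
  # {:error, message}; in the workaround above it would wrap CSV.decode/2.
  def stream(lines, decode_line) do
    lines
    |> Stream.with_index(1)
    |> Stream.map(fn {line, line_no} ->
      case decode_line.(line) do
        {:ok, row} -> {:ok, row, line, line_no}
        {:error, message} -> {:error, message, line, line_no}
      end
    end)
  end
end

# Possible wiring with the decoder from the workaround (assumes the csv
# package is available):
#
# decode_line = fn line ->
#   [result] = CSV.decode([line], separator: ?,, headers: [:a, :b, :c]) |> Enum.take(1)
#   result
# end
#
# path |> File.stream!() |> RawLineDecode.stream(decode_line)
```

Like the workaround itself, this decodes one physical line at a time, so a quoted field that spans several lines would not be handled correctly.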

npac avatar Mar 21 '18 10:03 npac

Hi, this is interesting. Could definitely start something on this, but would also accept a PR to make this optional behaviour.

beatrichartz avatar Mar 03 '19 04:03 beatrichartz

Maybe have a look at PR https://github.com/beatrichartz/csv/pull/95, which might be what you are looking for.

beatrichartz avatar Sep 12 '20 06:09 beatrichartz