
Relax type restrictions on delim

Open Arkoniak opened this issue 4 years ago • 1 comments

This issue is inspired by https://discourse.julialang.org/t/reading-data-text-files-delimited-with-both-spaces-tabs/64851

Sometimes more than one character needs to act as a delimiter (as in this Discourse discussion). Currently the type of delim is restricted to Union{Nothing, Char, String}. If the restriction were instead Union{Nothing, AbstractChar, String} (or, even better, AbstractString instead of String, but this is of lesser importance), then one could define

struct MultiChar{T} <: AbstractChar
  char::T
end

delim = MultiChar(('\t', ' '))

and, with appropriate equality (==) methods, CSV.read could properly read a CSV file that uses a mixture of characters as delimiters.
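A minimal sketch of what those equality methods might look like, outside of CSV.jl. The `splitfields` helper is hypothetical and exists only to show the comparison at work; it is not how CSV.jl or Parsers.jl scan input.

```julia
# A delimiter that matches any one of several characters.
struct MultiChar{T} <: AbstractChar
    char::T
end

# Assumption: "equal" means "is one of the wrapped characters".
Base.:(==)(m::MultiChar, c::Char) = c in m.char
Base.:(==)(c::Char, m::MultiChar) = m == c

# Hypothetical helper: split a line wherever a character equals `delim`.
function splitfields(line::AbstractString, delim)
    fields = String[]
    buf = IOBuffer()
    for c in line
        if c == delim
            push!(fields, String(take!(buf)))  # close the current field
        else
            write(buf, c)
        end
    end
    push!(fields, String(take!(buf)))          # last field
    return fields
end

delim = MultiChar(('\t', ' '))
fields = splitfields("a\tb c", delim)
```

Because `splitfields` only relies on `==`, the same function also accepts a plain `Char` such as `','`.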

Arkoniak avatar Jul 19 '21 07:07 Arkoniak

The bigger issue here is that in the Parsers.jl package, if you pass in a Char delim, it checks whether it's ASCII and, if so, converts it to a UInt8. Otherwise, for multi-byte Chars, it converts them to a String. So the Parsers.jl parsing code assumes delim will be either a UInt8 or a String.
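The conversion described above could be pictured roughly as follows. This is an illustration only, not the actual Parsers.jl source; `normalize_delim` is a hypothetical name.

```julia
# Hypothetical illustration of the delimiter normalization described above.
# ASCII characters fit in one byte; anything else becomes a multi-byte String.
normalize_delim(d::Char) = isascii(d) ? UInt8(d) : string(d)
normalize_delim(d::AbstractString) = String(d)
normalize_delim(::Nothing) = nothing

tab   = normalize_delim('\t')   # one-byte delimiter, stored as a UInt8
arrow = normalize_delim('→')    # non-ASCII delimiter, stored as a String
```

Under this picture, a custom AbstractChar such as MultiChar has no natural place: it is neither a single byte nor a fixed byte sequence, so the downstream parsing code would need a third code path.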

quinnj avatar Aug 05 '21 05:08 quinnj