csv icon indicating copy to clipboard operation
csv copied to clipboard

Option to return atoms instead of strings for keys

Open drernie opened this issue 6 years ago • 4 comments

When using the headers option, it would be convenient if I could get atom-keys instead of string-keys. Right now I do this:

  def map_strings_to_atoms(string_key_map) do
    # https://stackoverflow.com/questions/31990134/how-to-convert-map-keys-from-strings-to-atoms-in-elixir
    map = for {key, val} <- string_key_map, into: %{}, do: {String.to_atom(key), val}
    filtered = :maps.filter(fn _, v -> v != "" end, map)
    Factory.create_child_node!(filtered)
  end

drernie avatar Mar 08 '18 13:03 drernie

Such behaviour would be insecure when using with unsantized input as atoms aren't GC-able.

hauleth avatar Apr 28 '18 09:04 hauleth

Hi @drernie apologies for the late reply - could you outline what the use case for this is? I agree with @hauleth that we have to be careful with casting (user) input to atoms without sanitizing it, so this wouldn't be the default option, but there might be a legitimate use case I'm not seeing.

beatrichartz avatar Mar 03 '19 03:03 beatrichartz

Thanks. :-). I am used to this from Ruby, where having atoms is more concise and efficient when accessing dictionaries. I agree it shouldn't be the default, but it would be a nice option for working with data structures that expect to be passed atoms (so I can use the same key on both sides of the assignment).

I realize there is a security risk, though; is there a way implementing in the library could help enforce sanitization?

drernie avatar Mar 06 '19 03:03 drernie

Prepared the PR for it: https://github.com/beatrichartz/csv/pull/100 Implemented 2 approaches with String.to_existing_atom as safe and String.to_atom as an unsafe one. Please review.

vad2der avatar Oct 28 '20 00:10 vad2der

Right now this won't make it in I'm afraid. The "safe" option means the user needs to know about internals of the library and the file headers and prepare the atoms before parsing. At that point passing the headers in via a list is already an option. The unsafe option is unsafe since it means that this library could be used to attack services that take CSV as their input by uploading CSVs with large headers.

I do not currently see a way to implement this safely and with a good abstraction, feel free to reopen if there is one.

beatrichartz avatar Oct 22 '22 06:10 beatrichartz