format_parser
format_parser copied to clipboard
Return a more uniform structure in as_json
At the moment, results are returned as a regular hash, with keys and values containing a mix of strings and symbols:
{"nature"=>:archive, "intrinsics"=>nil, "entries"=>[{:type=>:file, :size=>53335, :filename=>"foo"}, {:type=>:file, :size=>292210, :filename=>"bar"}, {:type=>:file, :size=>1965044, :filename=>"baz"}], "format"=>:zip}
It happened to me several times that I was getting the key or comparing the value using the wrong type. I think normalising the return value of as_json
would be a nice improvement.
Currently, I am doing this to normalise the result hash:
JSON.load(JSON.dump(result))
Here are the options I see:
- use strings or symbols for all key/values (since this is meant to serialise to JSON at some point, it might make sense to stick to strings)
- use a hash with indifferent access; however, that likely adds a dependency and diverts from the idea of having a very strict contract and the least number of dependencies
After trying yo use the new version with stringify_keys:true
in our service, we realized it wasn't enough 😞 .
When the hash has non-strings as values, it raises errors like the following one, when trying to serialize it (when saving to Dynamodb):
Errors > JSON::GeneratorError#15
source sequence is illegal/malformed utf-8
So, the result of as_json
should be ready to be serialized to JSON, with keys and values converted to String. This is how activemodel#as_json
works.
A possible solution is having an option stringify_keys_and_values: true
, which is not a good name (too long) or something similar, to be backwards compatible.
wdyt @linkyndy ?
I don't believe that error is raised because there are symbols used in the hash. But I didn't dig into it, so you may be right.
One easy fix is to make the method return strings only, and bump the version to a major one 🙂 . Or, like you said, you can use an option to keep it backwards compatible (use_strings
?).
I always prefer to keep compatible when possible! 😄
use_strings
seems good for me!
So, I will open another PR here and close https://github.com/WeTransfer/peek/pull/127
I realized the problem with JSON.dump(result)
(source sequence is illegal/malformed utf-8) happens when the result hash contains non utf-8 strings inside.
For example, when the filename is something like "foo\xA9s"