json
json copied to clipboard
[FIX] JSON.dump for hash with duplicate key
Motivation / Background
This Pull Request has been created because to fix #563
The output for the hash is inconsistent when using JSON.dump
with the duplicate key. Specifically, if we examine the result below, the key 'a' appears twice in the output JSON for the same key 'a'.
For example:
JSON.dump({ 'a' => 1, :a => 2 })
Actual Result:
"{\"a\":1,\"a\":2}"
Expected result:
"{\"a\":2}"
cc: @byroot @hsbt
@hsbt , did you get a chance to check this out?
cc: @byroot
I'm not sure why expected behavior is correct. Because result of JSON.dump
is not changed from old version.
$ ruby -e "gem 'json', '2.6.3'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })"
2.6.3
{"a":1,"a":2}
$ ruby -e "gem 'json', '2.7.1'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })"
2.7.1
{"a":1,"a":2}
Please let me know if my understanding is wrong.
I'm not sure why expected behavior is correct. Because result of
JSON.dump
is not changed from old version.$ ruby -e "gem 'json', '2.6.3'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })" 2.6.3 {"a":1,"a":2} $ ruby -e "gem 'json', '2.7.1'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a' => 1, :a => 2 })" 2.7.1 {"a":1,"a":2}
Please let me know if my understanding is wrong.
@hsbt , if the colon (:) syntax is used to create symbols as keys, the output matches the expected result:
ruby -e "gem 'json', '2.6.3'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a': 1, a: 2 })"
-e:1: warning: key :a is duplicated and overwritten on line 1
2.6.3
{"a":2}
ruby -e "gem 'json', '2.7.1'; require 'json'; puts JSON::VERSION, JSON.dump({ 'a': 1, a: 2 })"
-e:1: warning: key :a is duplicated and overwritten on line 1
2.7.1
{"a":2}
However, when the rocket (=>) syntax is used to associate keys with values in the hash, the JSON output contains duplicate values.
Therefore, the output should be consistent regardless of the syntax used to create the hash.
@maniSHarma7575 it's not a "syntax" difference. One is strings the other is symbols.
However your PR is nowhere near enough, this repo contains 3 alternative implementations, one in Ruby, one in C, one in Java. All 3 would need to be changed.
I'm not sure why expected behavior is correct. Because result of
JSON.dump
is not changed from old version.
@hsbt he's not trying to fix a regression, but do prevent duplicated keys in the output. Arguably it's a minor bug, but it's true that it's not really expected for a JSON library to allow this.
As you know JSON has multiple standards, most don't say anything about that, but RFC 8259 says keys SHOULD be unique.
And more generally it's unclear how parsers will handle it, some may keep the first value, some the last, etc.
@maniSHarma7575 it's not a "syntax" difference. One is strings the other is symbols.
However your PR is nowhere near enough, this repo contains 3 alternative implementations, one in Ruby, one in C, one in Java. All 3 would need to be changed.
@byroot , it's functioning well for all three alternative implementations, as I've implemented the fix in lib/json/common.rb
.
https://github.com/flori/json/blob/78643245d2b60f770946e2e252890655d61737fc/lib/json/common.rb#L614
For all the 3 implementations I have tested it:
- For Java
ruby -v
jruby 9.4.5.0 (3.1.4) 2023-11-02 1abae2700f Java HotSpot(TM) 64-Bit Server VM 25.391-b13 on 1.8.0_391-b13 +jit [arm64-darwin]
rake build
gem build json-java.gemspec
Successfully built RubyGem
Name: json
Version: 2.7.2
File: json-2.7.2-java.gem
mkdir -p pkg
mv json-2.7.2-java.gem pkg
Created a script json.rb
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
gem "json", "2.7.2", path: "/Users/manishsharma/Development/personal/opensource/json"
end
require 'json'
pp JSON.dump({ 'a' => 1, :a => 2 })
ruby json.rb
Resolving dependencies...
"{\"a\":2}"
- For C
ruby -v
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]
rake build
Resolving dependencies...
(in /Users/manishsharma/Development/personal/opensource/json)
Writing version information for 2.7.2
gem build json.gemspec
Successfully built RubyGem
Name: json
Version: 2.7.2
File: json-2.7.2.gem
gem build json_pure.gemspec
Successfully built RubyGem
Name: json_pure
Version: 2.7.2
File: json_pure-2.7.2.gem
mkdir -p pkg
mv json-2.7.2.gem pkg
mv json_pure-2.7.2.gem pkg
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
gem "json", "2.7.2", path: "/Users/manishsharma/Development/personal/opensource/json"
end
require 'json'
pp JSON.dump({ 'a' => 1, :a => 2 })
ruby json.rb
Resolving dependencies...
"{\"a\":2}"
- For Ruby
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
gem "json_pure", "2.7.2", path: "/Users/manishsharma/Development/personal/opensource/json"
end
require 'json/pure'
pp JSON.dump({ 'a' => 1, :a => 2 })
ruby json.rb
Resolving dependencies...
"{\"a\":2}"
@hsbt and @byroot Waiting for your response.
Hi @hsbt Could you please review this?
@byroot Waiting for the response on this?
cc: @hsbt
I don't think JSON
should perform this validation. It will hurt performance a lot and it's not something we can fully guarantee to users.
This may cover simple cases but not, for example, differently encoded strings:
> JSON.dump({"foo".encode("UTF-16").to_s => 1, "foo".to_s => 2})
=> "{\"foo\":1,\"foo\":2}"
Covering that would have to be quite difficult. I think it's best we leave the responsibility to the user to provide reasonable input.
(As I wrote in the Rails issue) I'd actually prefer { 'a' => 1, :a => 2 }
be turned into the questionably valid ("SHOULD" is not quite as strong as "MUST") JSON as "{\"a\":1,\"a\":2}"
rather than either "{\"a\":1}"
or "{\"a\":2}"
as it would allow me to see that there is an issue with my program rather than hiding the error and making an arbitrary choice.