json-schema
json-schema copied to clipboard
Fix caching issues in JSON::Util::URI
This PR fixes the issue #514
The implementation of the whole JSON::Util::URI was not thread-safe. For instance, if you look at the .parse method.
def self.parse(uri)
if uri.is_a?(Addressable::URI)
uri.dup
else
@parse_cache ||= {}
parsed_uri = @parse_cache[uri]
if parsed_uri
parsed_uri.dup
else
@parse_cache[uri] = Addressable::URI.parse(uri)
end
end
rescue Addressable::URI::InvalidURIError => e
raise JSON::Schema::UriError, e.message
end
In line #8, the URL is retrieved from the cache, and the caller receives a duplicate of the cached URL. However, in line #10, the caller does not receive a duplicate, and in a few instances, it changes the returned reference, effectively modifying it in the cache.
def self.normalize_ref(ref, base)
ref_uri = parse(ref)
base_uri = parse(base)
ref_uri.defer_validation do
if ref_uri.relative?
ref_uri.merge!(base_uri)
# Check for absolute path
path, fragment = ref.to_s.split('#')
if path.nil? || path == ''
ref_uri.path = base_uri.path
elsif path[0, 1] == '/'
ref_uri.path = Pathname.new(path).cleanpath.to_s
else
ref_uri.join!(path)
end
ref_uri.fragment = fragment
end
ref_uri.fragment = '' if ref_uri.fragment.nil? || ref_uri.fragment.empty?
end
ref_uri
end
As you can see, if the referred URI is not in the cache, it is added to the cache and then modified inside the .normalize_ref method. Basically, we pass an object by reference between methods and modify it. This code is not thread-safe because it accesses and modifies a shared resource, @parse_cache, without any form of synchronization.
Other methods that call .parse method to resolve references also implement a similar non-synchronized caching approach. In a multi-threaded environment, this could lead to a race condition and potential incorrect resolution of references in JSON Schemas.
Initially, I wanted to add Mutex synchronization or use thread-safe data structures (such as Corcurrent::Map for cache), but it appeared that the code is deeply intertwined, with many methods from the module calling each other, making synchronization prone to deadlocks.
I have removed caching from this module entirely because it is not thread-safe and causes more problems than it solves. The improvements it offers are minimal since we don't cache expensive network or disk calls but only in-memory calculations.
I also fixed a couple of bugs in method implementations along the way:
- Issue with the
URI.absolutize_urimethod. TheURI.absolutize_refimplementation does not work as described in the tests. The tests pass only because of caching that persists state between subsequent calls in the test example. See this PR showing how to reproduce the issue: https://github.com/voxpupuli/json-schema/pull/516. - Issue with the
URI.normalized_refmethod which can persist state between calls leading to incorrect reference normalisation. This PR https://github.com/voxpupuli/json-schema/pull/517 reproduces the issue.
I tried to keep, changes in the tests suite at minimum, but ended up changing this single test which behaviour was not consistent with other test examples..
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 90.03%. Comparing base (
4aed27d) to head (df39511). Report is 2 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #515 +/- ##
==========================================
- Coverage 90.08% 90.03% -0.06%
==========================================
Files 76 76
Lines 1584 1575 -9
==========================================
- Hits 1427 1418 -9
Misses 157 157
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Hey, @bastelfreak 👋 The PR Is ready for review. I would appreciate your feedback
@ekohl, @TheMeier, @evgeni folks, may you also help with the review?
I have no idea about this gem, sorry.
My apologies. I pinged you because you reviewed one of the recent PRs https://github.com/voxpupuli/json-schema/pull/509
@bolshakov I think we're ready for a merge. Can you please rebase this and if the updated rubocop config is happy, I can merge it.
@bastelfreak ready
Is there a timeline on when this will be merged?
Hi, I am sorry for the huge delay. Thanks for all the work and thanks for pinging me again!