json-schema icon indicating copy to clipboard operation
json-schema copied to clipboard

Fix caching issues in JSON::Util::URI

Open bolshakov opened this issue 1 year ago • 7 comments
trafficstars

This PR fixes the issue #514

The implementation of the whole JSON::Util::URI was not thread-safe. For instance, if you look at the .parse method.

def self.parse(uri)
  if uri.is_a?(Addressable::URI) 
    uri.dup
  else
    @parse_cache ||= {}
    parsed_uri = @parse_cache[uri]
    if parsed_uri
      parsed_uri.dup
    else
      @parse_cache[uri] = Addressable::URI.parse(uri)
    end
  end
rescue Addressable::URI::InvalidURIError => e
  raise JSON::Schema::UriError, e.message
end

In line #8, the URL is retrieved from the cache, and the caller receives a duplicate of the cached URL. However, in line #10, the caller does not receive a duplicate, and in a few instances, it changes the returned reference, effectively modifying it in the cache.

def self.normalize_ref(ref, base)
  ref_uri = parse(ref)
  base_uri = parse(base)

  ref_uri.defer_validation do
    if ref_uri.relative?
      ref_uri.merge!(base_uri)

      # Check for absolute path
      path, fragment = ref.to_s.split('#')
      if path.nil? || path == ''
        ref_uri.path = base_uri.path
      elsif path[0, 1] == '/'
        ref_uri.path = Pathname.new(path).cleanpath.to_s
      else
        ref_uri.join!(path)
      end

      ref_uri.fragment = fragment
    end

    ref_uri.fragment = '' if ref_uri.fragment.nil? || ref_uri.fragment.empty?
  end

  ref_uri
end

As you can see, if the referred URI is not in the cache, it is added to the cache and then modified inside the .normalize_ref method. Basically, we pass an object by reference between methods and modify it. This code is not thread-safe because it accesses and modifies a shared resource, @parse_cache, without any form of synchronization.

Other methods that call .parse method to resolve references also implement a similar non-synchronized caching approach. In a multi-threaded environment, this could lead to a race condition and potential incorrect resolution of references in JSON Schemas.

Initially, I wanted to add Mutex synchronization or use thread-safe data structures (such as Corcurrent::Map for cache), but it appeared that the code is deeply intertwined, with many methods from the module calling each other, making synchronization prone to deadlocks.

I have removed caching from this module entirely because it is not thread-safe and causes more problems than it solves. The improvements it offers are minimal since we don't cache expensive network or disk calls but only in-memory calculations.

I also fixed a couple of bugs in method implementations along the way:

  • Issue with the URI.absolutize_uri method. The URI.absolutize_ref implementation does not work as described in the tests. The tests pass only because of caching that persists state between subsequent calls in the test example. See this PR showing how to reproduce the issue: https://github.com/voxpupuli/json-schema/pull/516.
  • Issue with the URI.normalized_ref method which can persist state between calls leading to incorrect reference normalisation. This PR https://github.com/voxpupuli/json-schema/pull/517 reproduces the issue.

I tried to keep, changes in the tests suite at minimum, but ended up changing this single test which behaviour was not consistent with other test examples..

bolshakov avatar Jul 05 '24 16:07 bolshakov

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 90.03%. Comparing base (4aed27d) to head (df39511). Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #515      +/-   ##
==========================================
- Coverage   90.08%   90.03%   -0.06%     
==========================================
  Files          76       76              
  Lines        1584     1575       -9     
==========================================
- Hits         1427     1418       -9     
  Misses        157      157              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jul 06 '24 16:07 codecov[bot]

Hey, @bastelfreak 👋 The PR Is ready for review. I would appreciate your feedback

bolshakov avatar Jul 08 '24 12:07 bolshakov

@ekohl, @TheMeier, @evgeni folks, may you also help with the review?

bolshakov avatar Jul 09 '24 11:07 bolshakov

I have no idea about this gem, sorry.

evgeni avatar Jul 09 '24 11:07 evgeni

My apologies. I pinged you because you reviewed one of the recent PRs https://github.com/voxpupuli/json-schema/pull/509

bolshakov avatar Jul 09 '24 12:07 bolshakov

@bolshakov I think we're ready for a merge. Can you please rebase this and if the updated rubocop config is happy, I can merge it.

bastelfreak avatar Jul 12 '24 12:07 bastelfreak

@bastelfreak ready

bolshakov avatar Jul 15 '24 08:07 bolshakov

Is there a timeline on when this will be merged?

mjknight50 avatar Aug 15 '24 16:08 mjknight50

Hi, I am sorry for the huge delay. Thanks for all the work and thanks for pinging me again!

bastelfreak avatar Aug 19 '24 09:08 bastelfreak