yard icon indicating copy to clipboard operation
yard copied to clipboard

Reprocessing does not reflect removals

Open mensfeld opened this issue 4 years ago • 4 comments

When a comment is removed (not changed), yard used with --use-cache does not update the doc.

Steps to reproduce

  1. git clone https://github.com/mensfeld/broken-yard.git && cd broken-yard
  2. yard --use-cache --verbose
  3. Check docs for the A class in the browser
  4. Open lib/test.rb and remove the comment for the A class
  5. Run yard --use-cache --verbose
  6. See that File 'lib/test.rb' was modified, re-processing...
  7. Refresh the docs page and see, that the comment was not removed.

Actual Output

The comment is not removed.

Expected Output

Comment should be removed.

Environment details:

  • OS: Linux
  • Ruby version (ruby -v): 2.6.5
  • YARD version (yard -v): yard 0.9.20
  • Relevant software dependency/versions:
    • None

I have read the Contributing Guide.

this patch fixes the problem (but 4 specs fail). It includes an empty comment when no comment:

diff --git a/lib/yard/parser/ruby/ruby_parser.rb b/lib/yard/parser/ruby/ruby_parser.rb
index e5b07349..9fb74ea0 100644
--- a/lib/yard/parser/ruby/ruby_parser.rb
+++ b/lib/yard/parser/ruby/ruby_parser.rb
@@ -625,6 +625,8 @@ module YARD
               if comment && !comment.empty?
                 add_comment(line, node)
                 break
+              else
+                add_comment(line, node)
               end
             end
 
@@ -658,7 +660,7 @@ module YARD
         end
 
         def add_comment(line, node = nil, before_node = nil, into = false)
-          comment = @comments[line]
+          comment = @comments[line] || ''
           source_range = @comments_range[line]
           line_range = ((line - comment.count("\n"))..line)
           if node.nil?

mensfeld avatar Oct 23 '19 16:10 mensfeld

IMO the current AST implementation correctly represents the code structure and the proposed patch would actually change the way that commentless blocks are presented.

This is something that could probably be easily fixed at the handler layer by adjusting the heuristic by which YARD ignores missing comment blocks.

There is however a larger issue here, which is that I think the problem statement in this issue is not fully fleshed out. YARD cannot simply remove a docstring because a comment was removed from a file being processed.

YARD's algorithm is as follows:

  1. iterate over each file and parse all code objects:
  2. if a comment is attached to a code object, set this comment as the #docstring replacing the last value.
  3. add line/file info to CodeObjects#files list
  4. repeat from 1

This means that a comment in a file is only considered the object's docstring if it is the last object to be processed in the original list of files. In other words, removing a comment should not always mean the removal of a docstring when using --use-cache.

That's why solving this in the handlers layer is likely more appropriate. You couldn't make the above determination using just AST information. You would likely want to have some logic to do some empty detection inside YARD::Handlers::Base#register_docstring iff the comment is the currently active docstring, i.e., the docstring from the last parsed file. You can use CodeObjects#files to help determine this.

lsegal avatar Oct 30 '19 21:10 lsegal

Thank you for the explanation. WDYT about an approach that invalidates from the registry upon loading all the objects that were built using files that have changes? Something like this (don't mind the styling, etc it's just a POC):

diff --git a/lib/yard/registry_store.rb b/lib/yard/registry_store.rb
index 9e3de1cf..99d980b2 100644
--- a/lib/yard/registry_store.rb
+++ b/lib/yard/registry_store.rb
@@ -257,6 +257,18 @@ module YARD
         load_checksums
         load_root
         load_object_types
+
+        changed_files = []
+        @checksums.each do |file, hash|
+          changed_files << file if Registry.checksum_for(file) != hash
+        end
+
+        @store.delete_if do |k, v|
+          !(v.files.map(&:first) & changed_files).empty?
+        end
+
+        @checksums.delete_if { |c| changed_files.include?(c) }
+
         true
       elsif File.file?(@file) # old format
         load_yardoc_old

It invalidates objects from cache that had anything common with any of the files that have changed while keeping others intact

mensfeld avatar Nov 03 '19 20:11 mensfeld

WDYT about an approach that invalidates from the registry upon loading all the objects that were built using files that have changes?

To this I would generally say: loading the Registry should be a non-mutating operation until Parser (and Handlers) run. The side effect you will run into here is that YARD will delete objects when you call Registry.load if files had changed on disk. A common thing to do with YARD utilities is something like:

YARD::Registry.load # load the current registry
c = YARD::Registry.all.select {|o| o.type == :class } # find classes
do_something_with(c)

Running yard list -c does this, for example. You want to be able to load the Registry in its current serialized state without triggering any type of invalidation. I can think of a number of things that would break if this behavior changed.

TL;DR: the Registry itself is just a storage abstraction and shouldn't perform any type of mutation on its own. The Handlers are what make modifications to the Registry, and the Parser, upon completion, serializes those changes back out to disk.

Hope that helps!

lsegal avatar Nov 03 '19 20:11 lsegal

You're absolutely right. Let me iterate over this next week. Thank you :rocket:

mensfeld avatar Nov 03 '19 22:11 mensfeld