Fix comment directive parsing problem

Problem of comment parsing

The main problem is that @preprocess.handle parses the comment, removes directives, and processes the code_object all at the same time. This pull request changes RDoc to parse the comment and extract directives first, and then apply the directives to the code object.

Flow of legacy RDoc parsing method

For example, consider parsing this code:

class A
  # :yields: x, y
  # :args:   a, b
  # :call-seq: 
  #--
  # :not-new:
  # :category: foobar
  #++
  #   initialize(x, y, z)
  def initialize(*args, &block); end
end

Step 1

RDoc performs @preprocess.handle on RDoc::NormalClass.

:category: is applied to klass and replaced with a blank line.
:not-new: and :yields: are replaced with blank lines. This may be a bug.
:args: a, b is replaced with :args: a, b (that is, it is kept as-is for the next pass).

Step 2

RDoc performs @preprocess.handle on RDoc::AnyMethod. :args: a, b is applied to meth.params.

Step 3

RDoc removes the private section that starts with #-- and ends with #++.

Step 4

RDoc normalizes the comment by removing # and indentation.

Step 5

RDoc extracts ":call-seq:\n initialize(x, y, z)" from the comment and applies it to the method object.

Problems

RDoc removes directives and expands :include: twice in some cases and once in others. To avoid having all directives removed by the first @preprocess.handle, the preprocessor needs a directive-replace mechanism, which makes things complex.

Private sections and call-seq are processed later. This makes RDoc accept odd comments, such as a directive inside a private section or a private section inside a call-seq.

Handling metaprogramming methods is also hard. @preprocess.handle(comment, code_object) requires the code object to be already created, but we need to parse the comment to know the code object type (method or attribute). Only after that can we finally parse the comment with the code object.

C comments are also complicated. :include: can include text containing */. Removing directive lines and private sections from the comment might remove /* and */, which makes normalize_comment fail. The original implementation avoided this by using a different processing order than the Ruby parser, which is not consistent.

Solution

We need to parse the comment first, and only once, to extract directives: expand :include:, read directives (including :call-seq:), and remove private sections at the same time. The comment parser should return the normalized comment text and the directives as an attribute hash. Each directive should also carry its line number.
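To make the idea concrete, here is a minimal sketch of such a single-pass parser. It is not the code in this pull request: the method name parse_comment_sketch, the DIRECTIVE_SKETCH regexp and the [param, line_no] value layout are assumptions for illustration, and :include: and multiline directives are left out.

```ruby
# Hypothetical sketch, not RDoc's actual API.
# Matches a line that consists only of ":name: optional-param".
DIRECTIVE_SKETCH = /\A:([\w-]+):\s*(.*)\z/

# Returns [normalized_text, directives], where directives maps
# a directive name to [param or nil, line number].
def parse_comment_sketch(comment, first_line_no = 1)
  directives = {}
  text_lines = []
  in_private = false

  comment.lines(chomp: true).each_with_index do |raw, index|
    line_no = first_line_no + index
    if in_private
      # "#++" closes the private section; everything inside is dropped.
      in_private = false if raw.match?(/^#\+{2}$/)
      next
    elsif raw.match?(/^#-{2,}$/)
      # "#--" opens a private section.
      in_private = true
      next
    end

    body = raw.sub(/\A#\s?/, '') # normalize: drop "#" and one following space
    if (m = DIRECTIVE_SKETCH.match(body.strip))
      param = m[2].strip
      directives[m[1]] = [param.empty? ? nil : param, line_no]
    else
      text_lines << body
    end
  end

  [text_lines.join("\n"), directives]
end

text, directives = parse_comment_sketch(<<~'COMMENT', 10)
  # :yields: x, y
  # Creates a new instance.
  #--
  # :not-new:
  #++
  # :args: a, b
COMMENT

# text is "Creates a new instance."
# directives is { "yields" => ["x, y", 10], "args" => ["a, b", 15] }
```

Because the comment is read only once, the directive inside the private section is simply skipped in this sketch, and no directive has to be put back into the text for a later pass.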

Changed things

:call-seq:

A new type of directive called "multiline directive" is introduced so that :call-seq: can also be handled as a directive.

# :multiline-directive:
#   html
#     head
#       title
#
#     body
#       header
#       footer

A multiline directive ends with a blank line. This restriction is for compatibility with old RDoc. Some invalid multiline directives (unindented, or ending with another directive) are also accepted with a warning.
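A rough sketch of how the value of a multiline directive could be collected is shown here. The helper take_multiline_value is hypothetical, and it assumes, based on the examples in this description, that a blank line only ends the value when the next non-blank line is no longer indented deeper than the directive line. The warning cases mentioned above are not handled.

```ruby
# Hypothetical helper, not RDoc's actual implementation.
# `lines` are comment lines with the leading "#" already removed,
# and `start` is the index of the directive line itself.
def take_multiline_value(lines, start)
  base_indent = lines[start][/\A */].size
  value = []
  lines[(start + 1)..].each do |line|
    # Stop at the first non-blank line that is not indented deeper
    # than the directive line.
    break unless line.strip.empty? || line[/\A */].size > base_indent
    value << line
  end
  # Blank lines at the end do not belong to the value.
  value.pop while !value.empty? && value.last.strip.empty?
  # Remove the common indentation from the remaining lines.
  min = value.reject { |l| l.strip.empty? }.map { |l| l[/\A */].size }.min || 0
  value.map { |l| l.strip.empty? ? '' : l[min..] }
end

lines = [
  ' :call-seq:',
  '   STDIN.getc()     -> string',
  '',
  '   STDIN.getc(a)    -> string',
  '',
  ' :other:',
]
call_seq = take_multiline_value(lines, 0)
# call_seq is ["STDIN.getc()     -> string", "", "STDIN.getc(a)    -> string"]
```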

The result of parsing this call-seq has changed. I think it is better now.

# :call-seq:
#   STDIN.getc()     -> string # Only this line was call-seq
#
#   STDIN.getc(a)    -> string
#
#   STDIN.getc(a, b) -> string
#   $stdin.getc(c)   -> string # It's now call-seq until this line
#
# :other:

Private section

Previously, #----foobar was accepted as a private section start, and #++++foobar was decomposed into #++ (private end) and ++foobar (normal comment). The start is now /^#-{2,}$/ (two or more -) and the end is now /^#\+{2}$/ (exactly two +).
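The effect of the two regexps can be checked with a small script. The constant names and the classify_comment_line helper are made up for this illustration; only the two patterns come from the change itself.

```ruby
PRIVATE_START = /^#-{2,}$/ # two or more "-" and nothing else
PRIVATE_END   = /^#\+{2}$/ # exactly two "+" and nothing else

# Hypothetical helper that reports how a single comment line is treated.
def classify_comment_line(line)
  if line.match?(PRIVATE_START)
    :private_start
  elsif line.match?(PRIVATE_END)
    :private_end
  else
    :normal
  end
end

['#--', '#----', '#----foobar', '#++', '#++++foobar'].each do |line|
  puts "#{line.ljust(12)} #{classify_comment_line(line)}"
end
```

With these patterns, #----foobar and #++++foobar are both treated as normal comment lines.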

Unhandled directives

In old RDoc, an unhandled directive such as # :unknown: foo remained in the normal comment. Now it is removed just like other directives, and is appended to the code object's metadata. It does not make sense to leave metadata in the comment. I think this was just a side effect of avoiding the double parsing problem.
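The following sketch illustrates the idea of applying extracted directives to a code object and storing the unhandled ones as metadata. CodeObjectSketch, HANDLERS_SKETCH and apply_directives_sketch are hypothetical stand-ins, not RDoc classes or methods, and the directive hash layout (name mapped to [param, line_no]) is an assumption.

```ruby
# Hypothetical stand-in for a code object; not RDoc's real class.
CodeObjectSketch = Struct.new(:params, :block_params, :metadata)

# Directives this sketch knows how to apply. Everything else is "unhandled".
HANDLERS_SKETCH = {
  'args'   => ->(obj, param) { obj.params = param },
  'yields' => ->(obj, param) { obj.block_params = param },
}.freeze

# Applies each directive to the object, or records it in metadata
# when no handler exists for its name.
def apply_directives_sketch(obj, directives)
  directives.each do |name, (param, _line_no)|
    handler = HANDLERS_SKETCH[name]
    if handler
      handler.call(obj, param)
    else
      obj.metadata[name] = param
    end
  end
  obj
end

meth = apply_directives_sketch(
  CodeObjectSketch.new(nil, nil, {}),
  { 'args' => ['a, b', 3], 'unknown' => ['foo', 4] }
)
# meth.params is "a, b" and meth.metadata is { "unknown" => "foo" }
```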

Normalize and remove private section

Everything is done in the parse phase.

C and Simple parser

C used to accept /*\n# :directive:\n*/ but now only accepts * :directive:. The changes for call-seq, private sections and unhandled directives described above also apply to the C and Simple parsers.

Old comment parsing

RDoc::Markup::PreProcess#handle, RDoc::Comment#extract_call_seq and RDoc::Comment#remove_private are only used from RDoc::Parser::Ruby. We can remove them in the future.

Diff

I compared the generated HTML files for rdoc itself and for the C files in ruby/ruby/*.c. The diffs are in File/Stat.html, Thread.html and RDoc/Parser/Ruby.html, which I think are acceptable.

File/Stat.html

Parsing of this call-seq: is improved.

ruby/ruby/file.c:5671

/*
 * call-seq:
 *
 *   File::Stat.new(file_name)  -> stat
 *
 * Create a File::Stat object for the given file name (raising an
 * exception if the file doesn't exist).
 */

Thread.html

Parsing of this call-seq: began to fail, but I think the indentation of this comment is wrong.

ruby/ruby/thread.c:875

/*
 * call-seq:
 *  Thread.new { ... }			-> thread
 *  Thread.new(*args, &proc)		-> thread
 *  Thread.new(*args) { |args| ... }	-> thread
 *
 *  Creates a new thread executing the given block.
 *  ...
 */

RDoc/Parser/Ruby.html

lib/rdoc/parser/ruby.rb:122

##
# You can define arguments for metaprogrammed methods via either the
# :call-seq:, :arg: or :args: directives.

This is a metaprogramming comment that starts with ## and contains a :call-seq: directive. It should be escaped like this:

# \:call-seq:, ...

Aug 04 '24 19:08 tompng

Thanks for the PR! But because 1) the new maintainers are still learning the codebase, and 2) this PR touches a core part of RDoc, I'll hold off merging it until version 6.8.0 series is released 🙏

Aug 23 '24 21:08 st0012

🚀 Preview deployment available at: https://6930a42f.rdoc-6cd.pages.dev (commit: 50afbbb983378686addaea984787370d89c9ecdf)

Nov 07 '25 04:11 matzbot
