yard icon indicating copy to clipboard operation
yard copied to clipboard

Experiment with getting @!group directive working for C parser.

Open thomthom opened this issue 6 years ago • 12 comments

Description

This is experimenting into allowing @!group directive tags to be picked up by CParser. (#1237) This is a kludgy approach - but I wanted to get the conversation going as I'm not sure exactly where the correct fix should be applied.

From debugging it appear that CParser ignore all comments withing consume_body_statements - this prevents the directives from being activated for the scope of the function body that defines the class/method along with it's methods and constants.

In my experimental exploring I found that I could wrangle it to pick up the directives by capturing any comments starting with @! and assuming they are directive tags. Doing so ensured the group got applied to the objects within YARD::Registry.

The rest of the test suite also passes - but as you can see, I'm really forcing things here. If you have time to provide some insights to how this can be more appropriately addressed I'd appreciate it.

I stepped through both the C parser and Ruby parser comparing their logic - but I struggled to find a common pattern for the parsing that I could apply to backport to the CParser. The main thing I noticed was that the Ruby parser picked up the directive comments and registered them via register_docstring before the constants were processed. That's what this experiment applies.

Completed Tasks

  • [x] I have read the Contributing Guide.
  • [ ] The pull request is complete (implemented / written).
  • [x] Git commits have been cleaned up (squash WIP / revert commits).
  • [x] I wrote tests and ran bundle exec rake locally (if code is attached to PR).

thomthom avatar Apr 09 '19 18:04 thomthom

Coverage Status

Coverage decreased (-0.05%) to 93.693% when pulling b02a7e9f028818b85b2a5287243021e958c879d8 on thomthom:dev-c-directives into 12f56cf7d58e7025085f00b9f9f2f62c24b13d93 on lsegal:master.

coveralls avatar Apr 10 '19 08:04 coveralls

Coverage Status

Coverage decreased (-1.2%) to 92.615% when pulling 7bb63bedc502f25a907a2036bd9e48e6dfd3cf22 on thomthom:dev-c-directives into 890d4ad5f714856a138a5e2aae3b226255e74137 on lsegal:master.

coveralls avatar Apr 10 '19 08:04 coveralls

@lsegal - any chance you are able to shed some light on the differences between the Ruby and the C parser? Something that let me avoid the kludge I currently got?

thomthom avatar Jul 29 '19 17:07 thomthom

@thomthom I think the implementation seems kludgey because adding to the OverrideCommentHandler is mixing multiple concerns together. If you had a separate handler just for directives it would probably look a lot simpler:

class YARD::Handlers::C::DirectiveParserHandler < YARD::Handlers::C::Base
  handles(/./)
  statement_class Comment

  process do
    Docstring.parser.parse(statement.source, namespace, self)
  end
end

I believe this works.

lsegal avatar Jul 29 '19 17:07 lsegal

I had another look at this again. Reverted my hacks, kept the spec. What I'm seeing is that the DirectiveParserHandler is executed after YARD::Handlers::C::ConstantHandler so that the group directive have yet to be attached to the extra_data.group.

I added similar spec to the Ruby parser, trying to see how they behave differently, but so far failing.

Any suggestions to how to ensure the @!group directive is processed before the constant handler? (My latest experiment is here: https://github.com/thomthom/yard/tree/dev-c-directices-handler)

thomthom avatar Sep 09 '19 16:09 thomthom

Can this relate to differences in how Ruby and C is parsed? Ruby appear to traverse via the AST tree, where as C appear to be parsed by regex extraction, which makes things not to be processes in the order they appear in the source?

thomthom avatar Sep 09 '19 16:09 thomthom

Here are my observations:

SourceParser#parse
  ...
  @parser.parse
  post_process

After @parser.parse is done @parser.enumerator (@statements) contains six items; one TopLevelStatement and five Comments. The TopLevelStatement contains four BodyStatement.

@parser.enumerator: (statements)
  YARD::Parser::C::ToplevelStatement
    @block:
      YARD::Parser::C::BodyStatement - rb_cExample = rb_define_class("Example", rb_cObject);
      YARD::Parser::C::BodyStatement - rb_define_const(rb_cExample, "FOOBAR", INT2NUM(1));
      YARD::Parser::C::BodyStatement - rb_define_const(rb_cExample, "FOOBIZ", INT2NUM(2));
      YARD::Parser::C::BodyStatement - rb_define_const(rb_cExample, "HELLO", INT2NUM(3));
  YARD::Parser::C::Comment - @!group Foos
  YARD::Parser::C::Comment - 1: Foobar description.
  YARD::Parser::C::Comment - 2: Foobiz description.
  YARD::Parser::C::Comment - @!endgroup
  YARD::Parser::C::Comment - 3: Hello description.

Even though the comments are in the same scope as the body statements the parser extracts them such that they comments end up in the outer "scope". It looks like the logic is that the comments are bounded in post processing.

But that means the @!group directive isn't being applied correctly.

With my kludge in consume_body_statements where I capture the directive it ends up looking like this:

@parser.enumerator: (statements)
  YARD::Parser::C::ToplevelStatement
    @block:
      YARD::Parser::C::BodyStatement - rb_cExample = rb_define_class("Example", rb_cObject);
      YARD::Parser::C::Comment - @!group Foos
      YARD::Parser::C::BodyStatement - rb_define_const(rb_cExample, "FOOBAR", INT2NUM(1));
      YARD::Parser::C::BodyStatement - rb_define_const(rb_cExample, "FOOBIZ", INT2NUM(2));
      YARD::Parser::C::Comment - @!endgroup
      YARD::Parser::C::BodyStatement - rb_define_const(rb_cExample, "HELLO", INT2NUM(3));
  YARD::Parser::C::Comment - @!group Foos
  YARD::Parser::C::Comment - 1: Foobar description.
  YARD::Parser::C::Comment - 2: Foobiz description.
  YARD::Parser::C::Comment - @!endgroup
  YARD::Parser::C::Comment - 3: Hello description.

Getting the directives processed within the same scope as the body statements appear to be important for this state to be turned on and off properly.

Trying to figure out a way that is less hacky than my proof of concept. But it seems it require some more fundamental change to the C parser. Unless I'm missing some other mechanism I ca use?

thomthom avatar Sep 10 '19 16:09 thomthom

I haven't been able to figure out a solution that avoids copying the @!group back into the ToplevelStatement node. To avoid that the C parser might have to change significantly.

Do you have any further suggestions I can try?

thomthom avatar Oct 25 '19 11:10 thomthom

@lsegal - any chance you have time to consider questions in earlier comment? https://github.com/lsegal/yard/pull/1238#issuecomment-530010441

The current way C source is parsed and information stores doesn't seem to fit with the actual scope of the sources.

thomthom avatar Jun 03 '20 11:06 thomthom

@lsegal It looks like the rename of the primary branch to main has auto-closed this and nine other PRs that were based off of primary branch under the previous name.

flavorjones avatar Jun 20 '20 17:06 flavorjones

Oh good catch. Let's reopen them

lsegal avatar Jun 20 '20 17:06 lsegal

I'm having another look at this. @lsegal do you have a spare moment to look at my earlier question in this comment: https://github.com/lsegal/yard/pull/1238#issuecomment-530010441 ? I'd help me in what direction to attempt next.

thomthom avatar Jan 29 '21 11:01 thomthom