better-cpp-syntax icon indicating copy to clipboard operation
better-cpp-syntax copied to clipboard

Publish a textmate gem

Open jeff-hykin opened this issue 6 years ago • 63 comments

I'm creating this as a checklist for publishing the gem

  • [x] retagging feature
  • [x] create a wordCannotBe feature for patterns
  • [x] create a standard import pattern/export pattern method
  • [x] add a warning for when a repo name is being overwritten
  • [x] new pattern class errors when provided a regex that contains a capturing group
  • [x] add a check for newlines, issue a warning
  • [x] create a Pattern class instead of having newPattern use regex
  • [x] add a to_s method to Pattern and PatternRange
  • [x] remove the "smart" include behavior (.to_tag returning { include: "#repo_name" } )
  • [x] remove the @wrap_source
  • [ ] rename "while" to something more representative of its behavior
  • [x] remove the yaml export, only export as json
  • [x] create an "save_to" which exports the language tags, and json syntax
  • [ ] switch to snake_case for everything
  • [ ] add a bailout feature
  • [x] check/fix the tag_as: in oneOrMoreOf() or zeroOrMoreOf()
  • create a standard library of patterns
    • [x] numerics
    • [ ] block and parentheses finder
    • [ ] strings and escapes
    • [ ] inline comments
    • [ ] standard space
  • add documentation
    • [x] The Grammar class
      • [x] init arguments
      • $initial_context
      • setting repos
      • exporting
      • the named patterns (@spaces, @word)
    • [ ] The Token class
      • That
        • using not()
      • RepresentationsThat
    • [x] The Pattern class
      • [x] init arguments
        • [?] backreferences
        • [?] $match
        • [?] $reference
      • [x] all of the helper methods (then, or, zeroOrMoreOf, oneOrMoreOf, etc)
    • [x] Pattern Range
      • [x] init arguments
    • [ ] The standard library
  • add documentation to gitbook

jeff-hykin avatar Jul 08 '19 07:07 jeff-hykin

We may want to reevaluate generating *.tmLanguage.yaml files. With, afaik, no text editor or tool supporting yaml language, and with the JSON files being pretty-printed, I not sure what their purpose is.

matter123 avatar Jul 19 '19 01:07 matter123

yeah. I think its safe to remove them, I only added yaml because I didn't pretty print the json.

jeff-hykin avatar Jul 25 '19 15:07 jeff-hykin

Fairly minor concern but

create a Pattern class instead of having newPattern use regex

would prevent /foo/.maybe(bar) from working, correct?

If so, there are 33 instances of that that would need to be replaced.

matter123 avatar Jul 25 '19 19:07 matter123

Yeah, and that was the original reason that the pattern class was never created. If I made it a class the idea would be to clean everything up and let it be symbolic, and then only convert it to regex once to_tag was called. But I'm not sure it's worth all the trouble. It'd require a lot of library re writing just to make it a bit more straightforward under the hood

jeff-hykin avatar Jul 25 '19 19:07 jeff-hykin

I've also been debating the snake case switch. I prefer static data and like-static-data to be snake case and functions to be camel, but Ruby uses snake for both. If I switched it would only be for formatting consistency.

jeff-hykin avatar Jul 25 '19 19:07 jeff-hykin

It seems the majority of the uses for starting with a regexp is looking for a literal (17), with anchors being the nxt largest (6).

matter123 avatar Jul 25 '19 20:07 matter123

The warning for newlines is probably too overly broad to be useful. In particular, the new include pattern trips up my initial warning.

Output:

There is a pattern that likely tries to match characters after \n
textmate grammars only operate on a single line, \n is the last possible character that can be matched.
Here is the pattern:
((?:^)((?:(?:(?>\s+)|(\/\*)((?>(?:[^\*]|(?>\*+)[^\/])*)((?>\*+)\/)))+?|(?:(?:(?:(?:\b|(?<=\W))|(?=\W))|\A)|\Z)))((#)\s*((?:(?:include|include_next)|import))\b)\s*(?:(?:(?:((<)[^>\n]*(>?)((?:(?:(?>\s+)|(\/\*)((?>(?:[^\*]|(?>\*+)[^\/])*)((?>\*+)\/)))+?|(?:(?:(?:(?:\b|(?<=\W))|(?=\W))|\A)|\Z)))(?:(?:\n|$)|(?=\/\/)))|((\")[^\"\n]*(\"?)((?:(?:(?>\s+)|(\/\*)((?>(?:[^\*]|(?>\*+)[^\/])*)((?>\*+)\/)))+?|(?:(?:(?:(?:\b|(?<=\W))|(?=\W))|\A)|\Z)))(?:(?:\n|$)|(?=\/\/))))|((?:[a-zA-Z_]|(?:\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}))(?:[a-zA-Z0-9_]|(?:\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}))*((?:(?:(?>\s+)|(\/\*)((?>(?:[^\*]|(?>\*+)[^\/])*)((?>\*+)\/)))+?|(?:(?:(?:(?:\b|(?<=\W))|(?=\W))|\A)|\Z)))(?:(?:\n|$)|(?=\/\/))))|((?:(?:(?>\s+)|(\/\*)((?>(?:[^\*]|(?>\*+)[^\/])*)((?>\*+)\/)))+?|(?:(?:(?:(?:\b|(?<=\W))|(?=\W))|\A)|\Z)))(?:(?:\n|$)|(?=\/\/))))

Newline detection:

# check for matching after \n
if /\\n(.*?)(?=!\||\\n)/ =~ regex_as_string
    if /[^\^$\[\]\(\)?:+*=!<>\\]/ =~ $1
        
        puts "\n\nThere is a pattern that likely tries to match characters after \\n\n"
        puts "textmate grammars only operate on a single line, \\n is the last possible character that can be matched.\n"
        puts "Here is the pattern:\n"
        puts regex_as_string
        # raise "Error: see printout above"
    end
end

matter123 avatar Jul 26 '19 01:07 matter123

Does

check/fix the tag_as: in oneOrMoreOf() or zeroOrMoreOf()

Refer to the fact that the pattern below will only give the last foo the scope of test.foo? If so, I think adding a warning about that would harm usability, composing patterns should be able to work without having to call turn_of_numbered_capture_groups whenever you want to match the pattern multiple times.

grammar[:test] = newPattern(
    match: oneOrMoreOf(newPattern(match:/foo/, tag_as:"test.foo").maybe(@spaces)),
    tag_as: "test"
)

matter123 avatar Jul 26 '19 17:07 matter123

Does

check/fix the tag_as: in oneOrMoreOf() or zeroOrMoreOf()

Refer to the fact that the pattern below will only give the last foo the scope of test.foo? If so, I think adding a warning about that would harm usability, composing patterns should be able to work without having to call turn_of_numbered_capture_groups whenever you want to match the pattern multiple times.

grammar[:test] = newPattern(
    match: oneOrMoreOf(newPattern(match:/foo/, tag_as:"test.foo").maybe(@spaces)),
    tag_as: "test"
)

Yeah that was the problem I was referring to. I was thinking of automatically disabling the tags that are inside the oneOrMoreOf() or zeroOrMoreOf() and then automatically adding an includes: that has the repeated pattern one time, which I believe will then correctly tag all of the repetitions.

jeff-hykin avatar Jul 27 '19 01:07 jeff-hykin

\\n(.*?)(?=!\||\\n)

I was thinking it would be a warning just like the zero-length match warning. Its given, but it can be disabled because sometimes it's valid.

That said I'm not sure why there's a [^>\n] inside the pattern. There's other \n's too but [^>\n] should just be [^>]

jeff-hykin avatar Jul 27 '19 01:07 jeff-hykin

Fairly minor concern but

create a Pattern class instead of having newPattern use regex

would prevent /foo/.maybe(bar) from working, correct?

If so, there are 33 instances of that that would need to be replaced.

We could just do

class Pattern < Regexp
    def initialize(arguments)
        self.then(arguments)
    end
end

That we could use Pattern.new in the guides/explanations to be consistent with PatternRange.new()

jeff-hykin avatar Jul 27 '19 01:07 jeff-hykin

I think theres no way to do the use the turnOffNumberedCaptureGroups on every regex pattern without having a refactored Pattern class. Right now there's no way to fully tell the difference between pure regex and pattern-combined regex.

The next closest thing would be to count the number of capture groups there should be, and the number of capture groups their actually are.

The to_s method (for printing out a pattern the same way it looked in the code) can't be done without a refactored Pattern class either.

jeff-hykin avatar Jul 27 '19 01:07 jeff-hykin

That we could use Pattern.new in the guides/explanations to be consistent with PatternRange.new()

That sounds good.

I was thinking it would be a warning just like the zero-length match warning. Its given, but it can be disabled because sometimes it's valid.

That said I'm not sure why there's a [^>\n] inside the pattern. There's other \n's too but [^>\n] should just be [^>]

I'll clean up the include pattern, add the ability to disable the warning and push that branch.

matter123 avatar Jul 27 '19 01:07 matter123

Another check that might be worth considering is wrong ordering of patterns with a common prefix.

If a pattern set contains the following patterns:

...
[A-Z_]+       # pattern 1
[A-Z_]+\s+foo # pattern 2
...

Pattern 2 will never match because it has a common prefix with pattern 1 and pattern 1 is first.

matter123 avatar Aug 03 '19 19:08 matter123

Another check that might be worth considering is wrong ordering of patterns with a common prefix.

If a pattern set contains the following patterns:

...
[A-Z_]+       # pattern 1
[A-Z_]+\s+foo # pattern 2
...

Pattern 2 will never match because it has a common prefix with pattern 1 and pattern 1 is first.

Do you think this is detectable? I guess check for a perfect subset at the beginning of two patterns.

jeff-hykin avatar Aug 05 '19 01:08 jeff-hykin

Yeah,

for any pattern A of the pattern set 2..N:
	for each pattern B in the range 1..A-1:
		if B is a prefix of A, issue a warning/error

It would need some handling for extra parens, but even without that, it should catch the common case of [patternA, patternA.then(patternB)].

matter123 avatar Aug 05 '19 01:08 matter123

the zeroLengthStart is a bit over-cautious, in particular, \G is usually intentional, and as such \G should not be enough to trigger the warning.

One possible solution is to do what vscode-textmate does when attempting to not evaluate \G, that is, replace \G with \uFFFF.

matter123 avatar Aug 17 '19 02:08 matter123

Depending on whether you plan on supporting generating atom themes, you may want to split tag_as by the space character and generate multiple captures. See https://github.com/microsoft/vscode-textmate/issues/108#issuecomment-522197256

matter123 avatar Aug 17 '19 02:08 matter123

It's somewhat of an odd case, but should the grammar generator detect when you are using a start of line anchor inside a sub match, and then rewrite it so that it works?.

Example: Orignal (non working):

Pattern.new(
  match: /\W+/,
  includes: [
    Pattern.new(match: /^bar/, tag_as: "bar")
  ]
)

Rewritten (working)

Pattern.new(
  match: /\W+/,
  includes: [
    PatternRange.new(
      start_pattern: lookAheadFor(/./),
      end_pattern: /$/,
      includes: [
        Pattern.new(match: /\Gbar/, tag_as: "bar")
      ]
    )
  ]
)

It is fairly unintuitive that /^/ doesn't work inside sub captures. And equally unituitive that the solution is to wrap it up in a PatteernRange.

matter123 avatar Aug 17 '19 22:08 matter123

Depending on whether you plan on supporting generating atom themes, you may want to split tag_as by the space character and generate multiple captures. See microsoft/vscode-textmate#108 (comment)

I think this is a perfect situation for the library to handle (to support Atom), but it is low priority so we should put the effort in after the gem is published.

jeff-hykin avatar Aug 22 '19 22:08 jeff-hykin

It's somewhat of an odd case, but should the grammar generator detect when you are using a start of line anchor inside a sub match, and then rewrite it so that it works?.

Example: Orignal (non working):

Pattern.new(
  match: /\W+/,
  includes: [
    Pattern.new(match: /^bar/, tag_as: "bar")
  ]
)

Rewritten (working)

Pattern.new(
  match: /\W+/,
  includes: [
    PatternRange.new(
      start_pattern: lookAheadFor(/./),
      end_pattern: /$/,
      includes: [
        Pattern.new(match: /\Gbar/, tag_as: "bar")
      ]
    )
  ]
)

It is fairly unintuitive that /^/ doesn't work inside sub captures. And equally unituitive that the solution is to wrap it up in a PatteernRange.

Yeah I think this is another perfect case for the library to handle. Like with the Atom support though, handle it after the gem is published

jeff-hykin avatar Aug 22 '19 22:08 jeff-hykin

Since quite a few check items and potential additions perform transformations on the patterns, there should be some way of enabling logging whenever a transformation occurs.

matter123 avatar Aug 22 '19 23:08 matter123

I think I'm going to try to publish this as a pre-alpha gem sometime this week. Its the first week of the semester and I want to try to re-structure this repo (get each language in a separate repo, and coordinate the change with Alexr00) before the semester gets into full swing, even though the checklist is far from done

The only thing I want to change before that is the import/export method. I think I'm going to change it to something like

require 'textmate_tools'

grammar = Grammer.get_exportable_grammar
grammar.exports = [
    :repo_name_1,
    :repo_name_2,
]
grammar.external_repos = [
    :std_space,
    :variable,
]

grammar[:repo_1] = Pattern.new()

The .exports are the names of the repos that should be exported The .external_repos are the ones you expect to already be defined by something else Every time that grammar sees a repo name (even in the includes of a pattern), if its not in the .exports and not in the .external_repos, it will create a mapping from that repo name to a randomly generated number so that the name never conflicts with a name from another package.

jeff-hykin avatar Aug 25 '19 08:08 jeff-hykin

That sounds pretty nice. I have two concerns:

  1. The mapping should probably be stateless;
  2. Rerunning npm run build should not change the generated grammar(builds should be reproducible)

so I'm thinking:

  • Grammar#get_exportable_grammar returns a new ExportGrammar that contains a reproducible salt (hash of the relative filename, some "export name", etc")
  • whenever the ExportGrammar encounters a name that is not in exports or external_repos, the name is prepended with the salt

matter123 avatar Aug 25 '19 16:08 matter123

Yeah salt based on file name/path would be ideal

jeff-hykin avatar Aug 25 '19 17:08 jeff-hykin

With a unified export, to another grammar or to a file, how does an import look like?

matter123 avatar Aug 25 '19 23:08 matter123

I was actually thinking they would be different (exporting a grammar to JSON vs exporting patterns)

jeff-hykin avatar Aug 27 '19 05:08 jeff-hykin

Ok so: Grammar::import(path) imports a json file to a grammar Grammar#save(path) saves a grammar to a json file Grammar#import(path_or_grammar_object) merge another grammar into this one Grammar::get_exportable_grammar returns a new ExportableGrammar

matter123 avatar Aug 27 '19 06:08 matter123

Is importing a grammar from a path a useful feature? As it requires either converting a tag back into a pattern, or allow the grammar to store patterns as a tag.

matter123 avatar Aug 27 '19 20:08 matter123

These were the only ones I was planning on having

a_grammar.import(path)
a_grammar.save_to(tm_language_path)
Grammar.get_exportable_grammar()

I think importing from a file is ideal because it prevents namespace pollution and guessing.

For example with with Doxygen, I didn't know what variables/functions it contributed. I had to open up the file and actually read the whole thing to realize it was really just exporting the doxygen function. When external files contribute common names like line_comment, they run the risk of accidentally overriding or accidentally getting overridden by something in generate.rb.

jeff-hykin avatar Aug 28 '19 16:08 jeff-hykin