Publish a textmate gem
I'm creating this as a checklist for publishing the gem
- [x] retagging feature
- [x] create a
wordCannotBefeature for patterns - [x] create a standard import pattern/export pattern method
- [x] add a warning for when a repo name is being overwritten
- [x] new pattern class errors when provided a regex that contains a capturing group
- [x] add a check for newlines, issue a warning
- [x] create a Pattern class instead of having newPattern use regex
- [x] add a
to_smethod to Pattern and PatternRange - [x] remove the "smart" include behavior (
.to_tagreturning{ include: "#repo_name" }) - [x] remove the
@wrap_source - [ ] rename "while" to something more representative of its behavior
- [x] remove the yaml export, only export as json
- [x] create an "save_to" which exports the language tags, and json syntax
- [ ] switch to snake_case for everything
- [ ] add a bailout feature
- [x] check/fix the
tag_as:in oneOrMoreOf() or zeroOrMoreOf() - create a standard library of patterns
- [x] numerics
- [ ] block and parentheses finder
- [ ] strings and escapes
- [ ] inline comments
- [ ] standard space
- add documentation
- [x] The Grammar class
- [x] init arguments
- $initial_context
- setting repos
- exporting
- the named patterns (@spaces, @word)
- [ ] The Token class
- That
- using
not()
- using
- RepresentationsThat
- That
- [x] The Pattern class
- [x] init arguments
- [?] backreferences
- [?] $match
- [?] $reference
- [x] all of the helper methods (then, or, zeroOrMoreOf, oneOrMoreOf, etc)
- [x] init arguments
- [x] Pattern Range
- [x] init arguments
- [ ] The standard library
- [x] The Grammar class
- add documentation to gitbook
We may want to reevaluate generating *.tmLanguage.yaml files. With, afaik, no text editor or tool supporting yaml language, and with the JSON files being pretty-printed, I not sure what their purpose is.
yeah. I think its safe to remove them, I only added yaml because I didn't pretty print the json.
Fairly minor concern but
create a Pattern class instead of having newPattern use regex
would prevent /foo/.maybe(bar) from working, correct?
If so, there are 33 instances of that that would need to be replaced.
Yeah, and that was the original reason that the pattern class was never created. If I made it a class the idea would be to clean everything up and let it be symbolic, and then only convert it to regex once to_tag was called. But I'm not sure it's worth all the trouble. It'd require a lot of library re writing just to make it a bit more straightforward under the hood
I've also been debating the snake case switch. I prefer static data and like-static-data to be snake case and functions to be camel, but Ruby uses snake for both. If I switched it would only be for formatting consistency.
It seems the majority of the uses for starting with a regexp is looking for a literal (17), with anchors being the nxt largest (6).
The warning for newlines is probably too overly broad to be useful. In particular, the new include pattern trips up my initial warning.
Output:
There is a pattern that likely tries to match characters after \n
textmate grammars only operate on a single line, \n is the last possible character that can be matched.
Here is the pattern:
((?:^)((?:(?:(?>\s+)|(\/\*)((?>(?:[^\*]|(?>\*+)[^\/])*)((?>\*+)\/)))+?|(?:(?:(?:(?:\b|(?<=\W))|(?=\W))|\A)|\Z)))((#)\s*((?:(?:include|include_next)|import))\b)\s*(?:(?:(?:((<)[^>\n]*(>?)((?:(?:(?>\s+)|(\/\*)((?>(?:[^\*]|(?>\*+)[^\/])*)((?>\*+)\/)))+?|(?:(?:(?:(?:\b|(?<=\W))|(?=\W))|\A)|\Z)))(?:(?:\n|$)|(?=\/\/)))|((\")[^\"\n]*(\"?)((?:(?:(?>\s+)|(\/\*)((?>(?:[^\*]|(?>\*+)[^\/])*)((?>\*+)\/)))+?|(?:(?:(?:(?:\b|(?<=\W))|(?=\W))|\A)|\Z)))(?:(?:\n|$)|(?=\/\/))))|((?:[a-zA-Z_]|(?:\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}))(?:[a-zA-Z0-9_]|(?:\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}))*((?:(?:(?>\s+)|(\/\*)((?>(?:[^\*]|(?>\*+)[^\/])*)((?>\*+)\/)))+?|(?:(?:(?:(?:\b|(?<=\W))|(?=\W))|\A)|\Z)))(?:(?:\n|$)|(?=\/\/))))|((?:(?:(?>\s+)|(\/\*)((?>(?:[^\*]|(?>\*+)[^\/])*)((?>\*+)\/)))+?|(?:(?:(?:(?:\b|(?<=\W))|(?=\W))|\A)|\Z)))(?:(?:\n|$)|(?=\/\/))))
Newline detection:
# check for matching after \n
if /\\n(.*?)(?=!\||\\n)/ =~ regex_as_string
if /[^\^$\[\]\(\)?:+*=!<>\\]/ =~ $1
puts "\n\nThere is a pattern that likely tries to match characters after \\n\n"
puts "textmate grammars only operate on a single line, \\n is the last possible character that can be matched.\n"
puts "Here is the pattern:\n"
puts regex_as_string
# raise "Error: see printout above"
end
end
Does
check/fix the tag_as: in oneOrMoreOf() or zeroOrMoreOf()
Refer to the fact that the pattern below will only give the last foo the scope of test.foo? If so, I think adding a warning about that would harm usability, composing patterns should be able to work without having to call turn_of_numbered_capture_groups whenever you want to match the pattern multiple times.
grammar[:test] = newPattern(
match: oneOrMoreOf(newPattern(match:/foo/, tag_as:"test.foo").maybe(@spaces)),
tag_as: "test"
)
Does
check/fix the tag_as: in oneOrMoreOf() or zeroOrMoreOf()
Refer to the fact that the pattern below will only give the last
foothe scope oftest.foo? If so, I think adding a warning about that would harm usability, composing patterns should be able to work without having to call turn_of_numbered_capture_groups whenever you want to match the pattern multiple times.grammar[:test] = newPattern( match: oneOrMoreOf(newPattern(match:/foo/, tag_as:"test.foo").maybe(@spaces)), tag_as: "test" )
Yeah that was the problem I was referring to. I was thinking of automatically disabling the tags that are inside the oneOrMoreOf() or zeroOrMoreOf() and then automatically adding an includes: that has the repeated pattern one time, which I believe will then correctly tag all of the repetitions.
\\n(.*?)(?=!\||\\n)
I was thinking it would be a warning just like the zero-length match warning. Its given, but it can be disabled because sometimes it's valid.
That said I'm not sure why there's a [^>\n] inside the pattern. There's other \n's too but [^>\n] should just be [^>]
Fairly minor concern but
create a Pattern class instead of having newPattern use regex
would prevent
/foo/.maybe(bar)from working, correct?If so, there are 33 instances of that that would need to be replaced.
We could just do
class Pattern < Regexp
def initialize(arguments)
self.then(arguments)
end
end
That we could use Pattern.new in the guides/explanations to be consistent with PatternRange.new()
I think theres no way to do the use the turnOffNumberedCaptureGroups on every regex pattern without having a refactored Pattern class. Right now there's no way to fully tell the difference between pure regex and pattern-combined regex.
The next closest thing would be to count the number of capture groups there should be, and the number of capture groups their actually are.
The to_s method (for printing out a pattern the same way it looked in the code) can't be done without a refactored Pattern class either.
That we could use Pattern.new in the guides/explanations to be consistent with PatternRange.new()
That sounds good.
I was thinking it would be a warning just like the zero-length match warning. Its given, but it can be disabled because sometimes it's valid.
That said I'm not sure why there's a [^>\n] inside the pattern. There's other \n's too but [^>\n] should just be [^>]
I'll clean up the include pattern, add the ability to disable the warning and push that branch.
Another check that might be worth considering is wrong ordering of patterns with a common prefix.
If a pattern set contains the following patterns:
...
[A-Z_]+ # pattern 1
[A-Z_]+\s+foo # pattern 2
...
Pattern 2 will never match because it has a common prefix with pattern 1 and pattern 1 is first.
Another check that might be worth considering is wrong ordering of patterns with a common prefix.
If a pattern set contains the following patterns:
... [A-Z_]+ # pattern 1 [A-Z_]+\s+foo # pattern 2 ...Pattern 2 will never match because it has a common prefix with pattern 1 and pattern 1 is first.
Do you think this is detectable? I guess check for a perfect subset at the beginning of two patterns.
Yeah,
for any pattern A of the pattern set 2..N:
for each pattern B in the range 1..A-1:
if B is a prefix of A, issue a warning/error
It would need some handling for extra parens, but even without that, it should catch the common case of [patternA, patternA.then(patternB)].
the zeroLengthStart is a bit over-cautious, in particular, \G is usually intentional, and as such \G should not be enough to trigger the warning.
One possible solution is to do what vscode-textmate does when attempting to not evaluate \G, that is, replace \G with \uFFFF.
Depending on whether you plan on supporting generating atom themes, you may want to split tag_as by the space character and generate multiple captures. See https://github.com/microsoft/vscode-textmate/issues/108#issuecomment-522197256
It's somewhat of an odd case, but should the grammar generator detect when you are using a start of line anchor inside a sub match, and then rewrite it so that it works?.
Example: Orignal (non working):
Pattern.new(
match: /\W+/,
includes: [
Pattern.new(match: /^bar/, tag_as: "bar")
]
)
Rewritten (working)
Pattern.new(
match: /\W+/,
includes: [
PatternRange.new(
start_pattern: lookAheadFor(/./),
end_pattern: /$/,
includes: [
Pattern.new(match: /\Gbar/, tag_as: "bar")
]
)
]
)
It is fairly unintuitive that /^/ doesn't work inside sub captures. And equally unituitive that the solution is to wrap it up in a PatteernRange.
Depending on whether you plan on supporting generating atom themes, you may want to split tag_as by the space character and generate multiple captures. See microsoft/vscode-textmate#108 (comment)
I think this is a perfect situation for the library to handle (to support Atom), but it is low priority so we should put the effort in after the gem is published.
It's somewhat of an odd case, but should the grammar generator detect when you are using a start of line anchor inside a sub match, and then rewrite it so that it works?.
Example: Orignal (non working):
Pattern.new( match: /\W+/, includes: [ Pattern.new(match: /^bar/, tag_as: "bar") ] )Rewritten (working)
Pattern.new( match: /\W+/, includes: [ PatternRange.new( start_pattern: lookAheadFor(/./), end_pattern: /$/, includes: [ Pattern.new(match: /\Gbar/, tag_as: "bar") ] ) ] )It is fairly unintuitive that /^/ doesn't work inside sub captures. And equally unituitive that the solution is to wrap it up in a PatteernRange.
Yeah I think this is another perfect case for the library to handle. Like with the Atom support though, handle it after the gem is published
Since quite a few check items and potential additions perform transformations on the patterns, there should be some way of enabling logging whenever a transformation occurs.
I think I'm going to try to publish this as a pre-alpha gem sometime this week. Its the first week of the semester and I want to try to re-structure this repo (get each language in a separate repo, and coordinate the change with Alexr00) before the semester gets into full swing, even though the checklist is far from done
The only thing I want to change before that is the import/export method. I think I'm going to change it to something like
require 'textmate_tools'
grammar = Grammer.get_exportable_grammar
grammar.exports = [
:repo_name_1,
:repo_name_2,
]
grammar.external_repos = [
:std_space,
:variable,
]
grammar[:repo_1] = Pattern.new()
The .exports are the names of the repos that should be exported
The .external_repos are the ones you expect to already be defined by something else
Every time that grammar sees a repo name (even in the includes of a pattern), if its not in the .exports and not in the .external_repos, it will create a mapping from that repo name to a randomly generated number so that the name never conflicts with a name from another package.
That sounds pretty nice. I have two concerns:
- The mapping should probably be stateless;
- Rerunning
npm run buildshould not change the generated grammar(builds should be reproducible)
so I'm thinking:
- Grammar#get_exportable_grammar returns a new ExportGrammar that contains a reproducible salt (hash of the relative filename, some "export name", etc")
- whenever the ExportGrammar encounters a name that is not in exports or external_repos, the name is prepended with the salt
Yeah salt based on file name/path would be ideal
With a unified export, to another grammar or to a file, how does an import look like?
I was actually thinking they would be different (exporting a grammar to JSON vs exporting patterns)
Ok so: Grammar::import(path) imports a json file to a grammar Grammar#save(path) saves a grammar to a json file Grammar#import(path_or_grammar_object) merge another grammar into this one Grammar::get_exportable_grammar returns a new ExportableGrammar
Is importing a grammar from a path a useful feature? As it requires either converting a tag back into a pattern, or allow the grammar to store patterns as a tag.
These were the only ones I was planning on having
a_grammar.import(path)
a_grammar.save_to(tm_language_path)
Grammar.get_exportable_grammar()
I think importing from a file is ideal because it prevents namespace pollution and guessing.
For example with with Doxygen, I didn't know what variables/functions it contributed. I had to open up the file and actually read the whole thing to realize it was really just exporting the doxygen function. When external files contribute common names like line_comment, they run the risk of accidentally overriding or accidentally getting overridden by something in generate.rb.