Problem matching word boundaries (\b) at the end of a string
Describe the bug
When using regex matching, \b doesn't seem to match word boundaries at the end of a matched string.
Reproducing
See example 5 here: https://bit.ly/4hpMYXb
Example 6 demonstrates that word-boundaries at the beginning of a string match just fine.
Removing the second word-boundary pattern fixes example 5, but breaks example 6: https://bit.ly/41OGBqG
Expected behavior
I've tried a few different engines, and they all seem to match word boundaries that coincide with the end of the string. For example, Ruby:
"abc 123 xyz".scan(/\b\w+\b/) #=> ["abc", "123", "xyz"]
Also, not sure if this is analogous, but given a file with no trailing newline:
IO.read("regex-test.txt") #=> "abc 123 xyz"
rg gives this output:
rg '\b\w+\b' regex-test.txt -o
1:abc
1:123
1:xyz
Hi @nsmmrs
:[~\b]:[arg]:[~\b] is not a regular expression, so it will not behave as if you had a regular expression like \b\w+\b.
You might get more mileage out of :[arg~\b\w+\b], so that comby understands you want to match the regular expression \b\w+\b.
The pattern :[~\b]:[arg]:[~\b] means something different: :[arg] keeps matching according to comby's own rules, which don't respect word boundaries, and only after it has matched what it wants does comby try to match \b. In other words, comby doesn't "look ahead" and stop matching :[arg] at a word boundary.
Hi @rvantonder,
Thanks for the explanation. It makes sense why my method shouldn't be expected to work, but maybe there is a bug somewhere else, or maybe what I'm trying to do isn't actually supported yet.
When I originally tried the method you've suggested, I assumed I was escaping the regex incorrectly, or misunderstanding the docs, because it doesn't seem to work (at least in the rewrite rules context).
In the example you've linked to, you'll notice that examples 1, 2, and 6 are broken, since it's replacing all instances of arg, regardless of word boundaries. As far as I can tell, the regex is being completely ignored in this case, because the output is identical whether I use this regex, no regex, or an incorrect regex.
I have to admit I didn't pay close attention to the output, because I don't follow what the goal of your pattern is. Can you explain what you would like or expect for examples 1, 2, and 6?
My sense at this point, though, is that the intent won't work. When a variable is used in rewrite rules, comby just matches and rewrites that exact text; you can't say 'match col or t or whatever I matched before, but only rewrite it at a word boundary, so not the col in columns'. There needs to be a consistent way of rewriting when 'arg' is referenced in the rule, and that consistent way is 'match exactly, without word boundaries'. That admittedly doesn't help in your case, but it explains why the regex is ignored and why changing the behavior would be involved.
Given you have an idea of what you expect, there is probably a different way of match/rewrite rules to achieve what you're going for.
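To make the difference concrete, here is a minimal sketch in plain Ruby (not comby) contrasting exact substitution with whole-word substitution. The helper names substitute_exact and substitute_whole_word are hypothetical and exist only for this illustration; the first mimics the "match exactly, without word boundaries" behavior described above, the second is what a word-boundary-aware rewrite would do.

```ruby
# Hypothetical helper: replaces every literal occurrence of `name`,
# including occurrences inside longer identifiers.
def substitute_exact(body, name)
  body.gsub(name, "it")
end

# Hypothetical helper: replaces `name` only where it stands as a whole word.
def substitute_whole_word(body, name)
  body.gsub(/\b#{Regexp.escape(name)}\b/, "it")
end

body = "puts col, col.columns"
puts substitute_exact(body, "col")       # prints "puts it, it.itumns"
puts substitute_whole_word(body, "col")  # prints "puts it, it.columns"
```

The exact version corrupts columns because col is a prefix of it; the word-boundary version leaves it alone.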
Sorry, I should have put that in the context section of the issue template.
Ruby 3.4 introduced a shorthand for block arguments:
# Before:
paths.each{|path| puts path }
# After:
paths.each{ puts it }
There is probably a RuboCop rule for this, but I'm just using it as a way to get the hang of comby.
I need to find all blocks that take exactly one argument and rewrite each one so that no arguments are declared and every occurrence of the old argument name in the body is replaced with "it".
The best result I've gotten so far is by wrapping the body with extra symbols before doing the substitution, and removing them afterward. This gets around the word boundary issue in my original post.
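For reference, the intended transformation can be sketched outside comby in plain Ruby. This is only an illustration of the goal, not a comby rule: the constant SINGLE_ARG_BLOCK and the helper rewrite_single_arg_blocks are hypothetical names, and the regex is deliberately naive (it handles only { } blocks, skips any block whose body contains nested braces, and knows nothing about strings or comments).

```ruby
# Hypothetical pattern: a brace block declaring exactly one plain argument,
# whose body contains no nested braces.
SINGLE_ARG_BLOCK = /\{\s*\|\s*([a-z_]\w*)\s*\|([^{}]*)\}/

# Hypothetical helper: drops the argument declaration and replaces each
# whole-word occurrence of the argument name in the body with `it`.
def rewrite_single_arg_blocks(source)
  source.gsub(SINGLE_ARG_BLOCK) do
    arg, body = $1, $2
    "{" + body.gsub(/\b#{Regexp.escape(arg)}\b/, "it") + "}"
  end
end

puts rewrite_single_arg_blocks("paths.each{|path| puts path }")
# prints "paths.each{ puts it }"
```

Blocks with more than one argument, such as {|k, v| ... }, do not match the pattern and are left unchanged.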
I still need to fix another issue: the substitution should not be applied to blocks where a nested block refers to the top-level block argument.
For example:
rows.each { |row|
  row.columns.each { |column|
    puts [row, column]
  }
}
My current rule produces this:
rows.each {
  it.columns.each { |column|
    puts [it, column] # <- "it" is not allowed in blocks with arguments declared
  }
}
This seems like it should be an easy fix with pattern match expressions, but I just haven't worked it out yet.
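The condition to exclude can be stated as a small check in plain Ruby. This is a sketch of the logic only, not a comby match expression: NESTED_BLOCK_WITH_PARAMS and nested_block_uses? are hypothetical names, and the regex assumes { } blocks with at most one level of nesting.

```ruby
# Hypothetical pattern: a nested brace block that declares its own
# parameters; the capture group is that nested block's body.
NESTED_BLOCK_WITH_PARAMS = /\{\s*\|[^|]*\|([^{}]*)\}/

# Hypothetical guard: true when some nested block with declared parameters
# mentions `arg` as a whole word, so rewriting `arg` to `it` would be unsafe.
def nested_block_uses?(body, arg)
  word = /\b#{Regexp.escape(arg)}\b/
  body.scan(NESTED_BLOCK_WITH_PARAMS).flatten.any? { |inner| inner.match?(word) }
end

unsafe = "row.columns.each { |column| puts [row, column] }"
safe   = "row.columns.each { |column| puts column }"
puts nested_block_uses?(unsafe, "row")  # prints "true"
puts nested_block_uses?(safe, "row")    # prints "false"
```

A rewrite would then be applied only to outer blocks for which the guard returns false.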