
Support a `[count:x]` wildcard qualifier

Open xiaq opened this issue 6 years ago • 9 comments

Should work similarly to the {n} quantifier in regular expressions, matching a given number of codepoints, with support for ranges. Examples:

  • *[count:5] matches 5 codepoints
  • *[count:5-10] matches 5 to 10 codepoints
  • *[count:5-] matches 5 or more codepoints
  • *[count:-5] matches 5 or fewer codepoints

I don't quite like the last example though, as -5 looks like negative five. It is technically unambiguous in this case, but still weird.
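
To make the proposed range semantics concrete, here is a minimal sketch in Python (not Elvish, and not how Elvish implements wildcards) that maps each form of the count spec onto the corresponding regular-expression quantifier. The helper names `count_to_quantifier` and `matches_count` are hypothetical, and the sketch assumes the wildcard matches any codepoint except `/`.

```python
import re


def count_to_quantifier(spec: str) -> str:
    """Translate a count spec ("5", "5-10", "5-", "-5") into a regex quantifier."""
    if "-" not in spec:
        # Exact count, e.g. "5" -> "{5}"
        return "{%d}" % int(spec)
    low, high = spec.split("-", 1)
    low_n = int(low) if low else 0  # a missing lower bound means zero
    if high:
        return "{%d,%d}" % (low_n, int(high))
    # A missing upper bound means "or more"
    return "{%d,}" % low_n


def matches_count(spec: str, text: str) -> bool:
    """Check whether text is a run of non-slash codepoints allowed by spec."""
    # Assumption: the wildcard never matches "/"
    pattern = "[^/]" + count_to_quantifier(spec)
    return re.fullmatch(pattern, text) is not None
```

Under this reading, `-5` is simply shorthand for `0-5`, which is why it is unambiguous even though it looks like a negative number.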

xiaq avatar Aug 11 '19 11:08 xiaq

Idea and proposed syntax courtesy of @hanche

xiaq avatar Aug 11 '19 11:08 xiaq

Rather than introducing another seldom-used modifier, I think allowing a modifier to be a lambda is preferable. Yes, I know I have PR #1018 (issue #1015) under review to introduce a type: modifier :smile: But I expect that to be used at least two orders of magnitude more often than filtering on the length of the file name. Allowing a lambda lets the user write Elvish code to filter the list of pathnames.

krader1961 avatar May 21 '20 03:05 krader1961

This is a local qualifier on how many characters a wildcard matches, not a global modifier on the length of the full filename.

xiaq avatar May 22 '20 13:05 xiaq

> This is a local qualifier on how many characters a wildcard matches, not a global modifier on the length of the full filename.

Understood, but it is still a niche feature that can be implemented via a callback. The callback can easily test the length of the full path or just the final component.
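
As a rough illustration of the callback idea, here is a sketch in Python rather than Elvish. It does not show an existing Elvish feature: `glob_with_callback` and `final_component_length_between` are hypothetical names, and Python's `fnmatch` stands in for wildcard expansion (unlike a shell glob, its `*` also matches `/`).

```python
import fnmatch
import posixpath
from typing import Callable, Iterable, List


def glob_with_callback(names: Iterable[str], pattern: str,
                       accept: Callable[[str], bool]) -> List[str]:
    """Keep names that match the pattern and that the callback accepts."""
    return [n for n in names
            if fnmatch.fnmatchcase(n, pattern) and accept(n)]


def final_component_length_between(low: int, high: int) -> Callable[[str], bool]:
    """Build a callback that tests the length of the final path component."""
    def accept(path: str) -> bool:
        return low <= len(posixpath.basename(path)) <= high
    return accept


names = ["a1", "notes.txt", "src/main.go", "x"]
print(glob_with_callback(names, "*", final_component_length_between(2, 7)))
```

The same callback could test the full path instead by dropping the `posixpath.basename` call.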

krader1961 avatar May 23 '20 02:05 krader1961

Note that most of these niche wildcard modifiers appear to exist in bash/ksh/zsh because those shells don't support a callback to decide whether an expansion should be accepted or rejected.

krader1961 avatar May 23 '20 02:05 krader1961

If this is not implemented, I'd at least like a replacement for what is likely the most common case: [count:1-] meaning “at least one”. So *[set:ab][nonempty] could match one or more a or b, while ?[set:ab][nonempty] could match exactly one of those.

hanche avatar May 23 '20 06:05 hanche

Here's another issue. How do you write a wildcard pattern that will match filenames containing at least one digit? At present, I don't think you can. But, assuming that a quoted empty string will separate adjacent wildcards, you could say *''*[digit][count:1]* or *''?[digit][nonempty]*. To make it look a little less hacky, we might allow empty modifiers instead: *[]*[digit][count:1]*.

In other shells, this is trivial, as in *[0-9]*. (I have to admit that the wildcards are my least favourite part of the elvish language. But the added power of modifiers is nice, so I am conflicted about them.)

hanche avatar May 23 '20 06:05 hanche

@hanche, can't your "at least one" case be written as *?[set:ab]*? The same goes for filenames containing at least one digit: *?[digit]*. That works for me:

~/tmp> rm *
~/tmp> touch xay xby xcy a1 b2 c4 
~/tmp> put *?[set:ab]*
▶ a1
▶ b2
▶ xay
▶ xby
~/tmp> put *?[digit]*
▶ a1
▶ b2
▶ c4
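
For comparison, the same two expansions can be reproduced with bracket-style patterns such as `*[0-9]*`. This is a small Python sketch using the standard `fnmatch` module, not Elvish; the `expand` helper is a hypothetical name, and `[0-9]` covers ASCII digits only.

```python
import fnmatch

# The same file names created in the transcript above
names = ["xay", "xby", "xcy", "a1", "b2", "c4"]


def expand(pattern, candidates):
    """Return the sorted candidates that match a bracket-style glob pattern."""
    return sorted(n for n in candidates if fnmatch.fnmatchcase(n, pattern))


print(expand("*[ab]*", names))   # ['a1', 'b2', 'xay', 'xby']
print(expand("*[0-9]*", names))  # ['a1', 'b2', 'c4']
```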

More complicated scenarios are more easily, and probably more safely, handled using the re: and str: modules in a callback.

I, too, frequently want to match filenames that contain specific characters, but that already seems to be adequately supported. I don't even need all ten fingers to count the number of times in the past twenty years I've needed to do anything more complicated than your "exactly one" or "at least one" cases. When I needed that capability I simply ran the matching filenames through a filter like grep.

krader1961 avatar May 23 '20 22:05 krader1961

Oh sorry, it seems I am being stupid. Hopefully it is temporary. Perhaps due to the discussion of regexps, I conflated the use of ? in wildcards with that in a regexp, thinking of it as “maybe it's there, and maybe not”. My bad. Yes, you're right, that covers it.

hanche avatar May 24 '20 07:05 hanche