Support escaping special characters

Open yufeih opened this issue 5 years ago • 9 comments

Characters like {, }, ,, !, [, and ] are valid in file path names. To match paths containing these special symbols, glob patterns need to support escaping with \. For example, \[ should match a literal [ instead of throwing an error.

yufeih avatar Oct 31 '18 07:10 yufeih

I will see if I can get to this before the weekend.

kthompson avatar Oct 31 '18 18:10 kthompson

Thank you Kevin, this is not an urgent feature request, take your time.

yufeih avatar Oct 31 '18 23:10 yufeih

Hi @kthompson, any update on this? Some of our components are currently stuck on this: many of our patterns contain ( and ) in file names, but glob doesn't accept them.

yufeih avatar Mar 10 '20 10:03 yufeih

I just released a build that may help. It adds support for all characters that are not special to globbing syntax, as part of issue #57.

My plan is to add support for escaping characters by using \, which is currently used as a secondary path separator. Since this may break many existing glob expressions, this will be part of a 2.0 release.

kthompson avatar Mar 11 '20 01:03 kthompson

Hello @yufeih, it has been a while since I updated this ticket. If you would like to take a look at a beta version with this feature, you can find it here: https://www.nuget.org/packages/Glob/1.2.0-alpha0029

kthompson avatar Sep 06 '20 21:09 kthompson

Hello @kthompson, your library has been very helpful for my projects. Thank you very much!

How would { and }, which are currently used for delimited literals, still work once escaping is supported? For example:

Company\Documents{Ives,Hayes}**.xlsx

jberd126 avatar Sep 28 '20 23:09 jberd126

In the current (unreleased) implementation, if an escape character (i.e. \) is detected, it must be followed by one of these characters: *?{}[]() or a space, or by , when inside a literal set. Otherwise, an exception is currently thrown.
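As a rough illustration of that rule, here is a minimal Python sketch. It is not the library's C# code: `scan_escapes`, `InvalidEscapeError`, and the token labels are hypothetical names, and the brace-depth tracking is an assumption about how "inside a literal set" might be detected.

```python
# Characters that may follow a backslash anywhere in the pattern.
ESCAPABLE = set("*?{}[]() ")


class InvalidEscapeError(ValueError):
    """Raised when a backslash is not followed by an escapable character."""


def scan_escapes(pattern):
    """Split a pattern into ("literal", ch) and ("syntax", ch) tokens.

    An escaped character becomes a "literal" token; everything else is
    passed through as "syntax". A comma may only be escaped while inside
    an unescaped {...} group.
    """
    tokens = []
    depth = 0  # nesting depth of unescaped { ... } groups
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            if i + 1 >= len(pattern):
                raise InvalidEscapeError("pattern ends with a dangling backslash")
            nxt = pattern[i + 1]
            if not (nxt in ESCAPABLE or (nxt == "," and depth > 0)):
                raise InvalidEscapeError(
                    "invalid escape sequence \\%s at index %d" % (nxt, i)
                )
            tokens.append(("literal", nxt))
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
        tokens.append(("syntax", ch))
        i += 1
    return tokens


if __name__ == "__main__":
    print(scan_escapes(r"file\[1\].txt"))
```

Under this sketch, a pattern such as Company\Documents is rejected up front, because \D is not a valid escape sequence.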

As an aside, I chose this because I figured I had the following options:

  • Option 1: Stop treating \ as a path delimiter and use it as a strict escape character. If there is no valid escape sequence, throw an invalid-pattern error. This breaks existing patterns that use \ as a path delimiter, but they will fail on expression creation.
  • Option 2: Stop treating \ as a path delimiter and use it as an escape character, BUT if there is no valid escape sequence, treat the character as part of a filename (only valid for Linux/OSX?). This breaks existing patterns that use \ as a path delimiter, but they may fail silently.
  • Option 3: Keep treating \ as a path separator but also allow it as an escape character; in the event of an invalid escape sequence, the \ is treated as a path separator. This would be less likely to break existing patterns, but it would still break some (most?) of them. For example, **\*.txt previously matched any file with the .txt extension, but would now mean any file with the literal character * and the extension txt.

Options 2 and 3 would result in really weird and potentially difficult-to-spot errors, while with Option 1 you would get errors, but they would at least be easy to solve.

To get back to your question: if you want to match on a literal set where one of the literals contains a comma, you could do something like a{a\,a,bb}, which should match "aa,a" and "abb".
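To make the escaped-comma behaviour concrete, here is a small Python sketch that expands a single literal set. It is only an illustration, not the library's C# implementation: `expand_literal_set` is a hypothetical helper, it handles one non-nested {...} group, and it treats everything else as plain text rather than glob syntax.

```python
def expand_literal_set(pattern):
    """Expand the first {a,b,...} group, honouring backslash escapes.

    A backslash makes the next character literal, so "\\," does not split
    alternatives. A trailing lone backslash is kept as-is in this sketch.
    """
    prefix, current, suffix = [], [], []
    alternatives = None
    state = "prefix"
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            piece, escaped = pattern[i + 1], True
            i += 2
        else:
            piece, escaped = ch, False
            i += 1

        if state == "prefix":
            if piece == "{" and not escaped:
                state, alternatives, current = "set", [], []
            else:
                prefix.append(piece)
        elif state == "set":
            if piece == "," and not escaped:
                alternatives.append("".join(current))
                current = []
            elif piece == "}" and not escaped:
                alternatives.append("".join(current))
                state = "suffix"
            else:
                current.append(piece)
        else:
            suffix.append(piece)

    if state == "set":
        raise ValueError("unterminated literal set")
    if alternatives is None:
        return ["".join(prefix)]
    head, tail = "".join(prefix), "".join(suffix)
    return [head + alt + tail for alt in alternatives]


if __name__ == "__main__":
    print(expand_literal_set(r"a{a\,a,bb}"))  # ['aa,a', 'abb']
```

The escaped comma stays inside the first alternative, while the unescaped comma separates the two alternatives.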

kthompson avatar Sep 28 '20 23:09 kthompson

Thanks @kthompson. I can understand how this.

I reviewed my previous comment and noticed that I inadvertently dropped the backslash before the curly brace when I was cleaning it up. In this corrected example, I am trying to match a directory that is either "Ives" or "Hayes".

Company\Documents\{Ives,Hayes}**.xlsx

In this scenario, it does not appear that I could have a wildcard immediately after a directory separating backslash.

I was looking around for other ways that this could be done and found that in DotNet.Glob (see "Escaping Special Characters"), Darrell uses [ and ] instead of backslashes.
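For comparison, bracket-style escaping is also what Python's standard library does: `glob.escape` wraps each of `*`, `?`, and `[` in a one-character class. The snippet below only illustrates that general idea in Python; it says nothing about how DotNet.Glob or this library implements it, and the file name is made up.

```python
import fnmatch
import glob

name = "report[1].txt"  # hypothetical file name containing brackets

# glob.escape turns "[" into the character class "[[]" so it matches literally.
pattern = glob.escape(name)
print(pattern)  # report[[]1].txt

# The escaped pattern matches the literal name.
print(fnmatch.fnmatchcase(name, pattern))  # True

# Unescaped, "[1]" is a character class matching the single character "1".
print(fnmatch.fnmatchcase(name, name))  # False
```

The trade-off is that bracket escaping leaves \ free to act as a path separator, at the cost of less familiar escape syntax.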

jberd126 avatar Sep 29 '20 13:09 jberd126

@jberd126 I am not completely clear on what you are trying to do, but there are a couple of issues in your pattern.

First, ** will only match full directory names; if you want a partial match on a directory name, you need to use *.

Second, you must use / for path separators in your pattern.

Given those two issues, you probably want something like: Company/Documents/{Ives,Hayes}/*.xlsx or Company/Documents/**/{Ives,Hayes}/*.xlsx or Company/Documents/**/{Ives,Hayes}/**/*.xlsx

kthompson avatar Sep 29 '20 14:09 kthompson

Completed in c256cfcda0691b845b5673aec9750ef5114a059f

kthompson avatar Feb 15 '24 02:02 kthompson