go
go copied to clipboard
path/filepath: Glob should support `**` for zero or more directories
Go version 1.4.2 Mac OS X 10.10
Example:
package main
import "fmt"
import "path/filepath"
import "os"
func main() {
files, err := filepath.Glob("/usr/local/go/src/**/*.go")
if err != nil {
fmt.Print(err)
os.Exit(1)
}
fmt.Printf("files: %d\n", len(files))
for _, f := range files {
fmt.Println(f)
}
}
Expected:
% ls /usr/local/go/src/**/*.go | wc -l
1633
Actual:
files: 732
It seems that ** is equivalent to *. The extended ** pattern is common in shells and is supported in Rust and Java for example.
For reference, here is Rust's implementation. I think these rules would be a good description of how Go's filepath.Glob should work:
https://github.com/rust-lang/glob/blob/master/src/lib.rs#L369
For reference, this is supported by bash when invoked as "bash -O globstar" or after running "shopt -s globstar".
OK to implement? @adg
@bradfitz for opinion
I think this is OK to implement.
Yep, I agree.
CL https://golang.org/cl/16280 mentions this issue.
all, I wanted to check in to see if this issue (#11862) was fixed. Please let me know status of this ticket. Is there a work around to this problem? Talk to you soon. Thanks in advance.
Status: currently I'm not working on this. Feel free to unassign.
I certainly see the utility here but adding ** has various knock-on effects that may not be obvious. For example, * matches symlinks to other directories. ** cannot, at least not without extra work to avoid file system cycles. Also it's not obvious this should be considered a backwards compatible change, both for Glob and for Match. And you'd probably want to do path.Match as well. There's a lot to consider here, even though it looks trivial.
@rsc closed the most recent CL for this with the comment:
I really don't think we should do any of this. This is much less understandable than the current code. If you need a very fast, very complex filepath.Glob, it is of course possible to write one outside the standard library.
Should this issue remain open?
This issue does remain open. What's confusing is that GitHub has linked just above your comment to issue Shopify/themekit#102, which is closed.
@rsc that is because someone linked to it in a Markdown link / comment: https://github.com/Shopify/themekit/issues/102#issuecomment-169756401
What can be done here? A better implementation? It's not clear to me.
I think it's useful, but I wouldn't consider it backwards compatible. ** already works and does something (which is the same something as a single star).
@extemporalgenome At best, I'd argue that ** is undefined and it just happens to do the same thing as *. I don't think this should represent an issue in regards to backwards compatibility.
@slimsag the shopify issue I think is different. They chose to walk the tree and that solved their case - I don't think that applies to the general problem. I think @rsc was saying that the solution here is non trivial (and it's not just filepath.Glob but also path.Match).
I'd like it to stay on the list. Coming from other languages, I think this is a pretty common feature for glob syntax.
Note that the downside of attempting to write your own implementation is that I think you would wind up re-implementing glob entirely. I'm trying to figure out how to solve this for http://github.com/jstemmer/gotags since it won't be solved in the standard library anytime soon.
@ascarter * is very well defined as matching any sequence (including an empty sequence) of non-separator characters. Thus ** is equally well defined as matching zero or more non-separator characters, followed by a match of zero characters on the tail of the same filename.
The proposed special-casing of ** will certainly not match less than it does now, but may match more, and the question is whether existing programs could become broken through what constitutes a change in behavior.
I'm thinking this feature should be provided from third-party products not core. for example, you can use https://github.com/mattn/go-zglob
I would be nice if the documentation about Glob mentioned that ** isn't implemented.
@r03 The current docs are precise about what is implemented. I'm not sure it's necessary to say what is not implemented.
I agree with @r03 that it would be nice to see the lack of support mentioned; it is the way glob functions in other languages, like Ruby, work, so it might not even occur to users that the feature might not be implemented.
Like Ian said, I'm not convinced we should enumerate the many possible glob extension syntaxes that exist but are not implemented by this function.
Any update ?
I would have to disagree with anyone claiming that the docs shouldn't mention this.
If you called it something other than "glob", then I would agree. But "glob" has a set of features that exist across languages/systems. Calling it the same as everything else, but not being like the other versions is worth calling out. Otherwise, don't call it glob.
@FabledWeb filepath.Glob correponds to the glob function in the C standard library.
@ianlancetaylor I'm sure there is a good reason for it, but the common use of the word has a different meaning. Documentation shouldn't rely on someone knowing that the name refers to the C standard library version and that the C standard library acts differently than most things they'd come in contact with.
@FabledWeb I suppose that what I'm doing is disagreeing with you about the common use of the word. To mean "glob" means the operation in the C library and the operation performed by the shell, and for neither of those does ** have a special meaning (in bash you can set an option to give ** a special meaning, but it's not the default). I understand that for you glob implies **, but for me it doesn't.
I continue to believe that the documentation should explain what the function does, not what it doesn't do.
Man, I wish there was threading here so I don't take all this screen real-estate.
The fact that we disagree on what the word means, and the fact that I and others didn't immediately realize that it didn't support the glob syntax we assumed was standard, suggests that there's something missing from the documentation that would make it clear what's not supported (by way of stating what is supported if you prefer). My real point about all this is that people are getting confused and the way to fix that is not to say "the documentation is accurate". The documentation should serve to help users understand, not just be a pure technical spec. If you don't want to say "the multi-directory wildcard ** isn't supported", then I can't say precisely what to change to make it more consumable, but apparently there is something lacking right now.
By the way @ianlancetaylor, I love that you're so active on these issues. It's refreshing to actually get responses when commenting or opening PRs.
I guess it seems to me that the documentation of Glob does say what is supported, albeit by reference to Match which lists the precise syntax. I'd be happy to see specific suggestions for how to improve the docs, I just want to avoid saying "By the way, you may have heard about ** in glob, but that doesn't work." That just seems odd.
By the way @ianlancetaylor, I love that you're so active on these issues. It's refreshing to actually get responses when commenting or opening PRs.
Thanks, and, you're welcome.
Python solved this with an optional recursive=True parameter that could be passed to the function for globbing.
One potential solution could be an addition to the API that would provide recursive functionality so backward compatibility would be less of an issue, something like:
filepath.GlobRecursive("**/*")
However that would no longer use exact "Match" functionality that is described in the docs. If that was an issue you could always add an aditonal method MatchRecursive, however that does not seem very elegant of a solution.
Sidenote: I too made the assumption on how ** worked because of how it's been used in other languages and I am unfamilar with C.
Another 3rd party lib: I've just come across https://github.com/bmatcuk/doublestar, seems to be working okay so far...
Damn, just switched to this library (from gopkg.in/godo.v2/glob since it produces a nil pointer reference if you don't have permission) only to find it doesn't support the ** syntax.
I'll checkout doublestar, but I do hope this library adds the modern Glob features everyone expects these days.
So dumb. We really need this supported.
If this is going to be implemented, please, consider also support for braces ({, }), so that the implementation would be fully compatible with other languages.
@grv87 Thanks, that's a great example of why we shouldn't change anything here. As noted above, Go's function acts like the C library's glob function. So there is no implementation that is "fully compatible with other languages."
What about a separate function for the modern usage of globs? I'm not sure why it needs to be strictly C compatible, given golang is made for now, and has the benefit of losing the decades old baggage.
A separate function can live anywhere. https://golang.org/doc/faq#x_in_std . I see that https://godoc.org/github.com/gobwas/glob exists.
Stopgap Workaround
This may not be a perfect solution, but weighing in at only 60 LoC it's a good start for those willing to use something outside of the standard library:
https://stackoverflow.com/a/40181899/151312
Given this issues age, it seems clear its never making it into std. So here is something similar you can use:
package main
import (
"io/fs"
"path/filepath"
)
func glob(root string, fn func(string)bool) []string {
var matches []string
filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
if fn(path) {
matches = append(matches, path)
}
return nil
})
return matches
}
func main() {
matches := glob(".", func(path string) bool {
return filepath.Ext(path) == ".go"
})
for _, match := range matches {
println(match)
}
}
In general I agree that a function's documentation should say what it does and not what it doesn't do. But using ** is a pitfall, and we do document pitfalls, as in this lengthly caveat from fs.Sub:
Note that Sub(os.DirFS("/"), "prefix") is equivalent to os.DirFS("/prefix") and that neither of them guarantees to avoid operating system accesses outside "/prefix", because the implementation of os.DirFS does not check for symbolic links inside "/prefix" that point to other directories. That is, os.DirFS is not a general substitute for a chroot-style security mechanism, and Sub does not change that fact.
We could word the doc so that it describes a consequence of the function's behavior, instead of describing non-behavior. For example, "Note that multiple consecutive *s behave like a single *".
Could this be needed to match only main.go, hey.go files in the following case?
- unwanted.go
- dir1
| - main.go
| - dir2
| - hey.go
Glob("dir1/*.go")only matchesmain.goGlob("dir1/*/*.go")only matcheshey.goGlob("*.go")matches them all, including the unwantedunwanted.go
It's a shame such an famous Glob feature not implemented in Go for 7 years.
@adg @rsc is there a chance to prioritize it please?
@mih-kopylov There are some concerns that Russ outlined in an earlier comment and also there are compatibility promises that we must heed. So if someone wants to address this issue they need should probably write it up as a proposal that addresses all these potential issues.
In the meantime, if there is a (seemingly, from my brief research) mature doublestar package that provides a glob that handles **. It doesn't have any dependencies and has extensive unit tests. Give it a shot.