go icon indicating copy to clipboard operation
go copied to clipboard

path/filepath: Glob should support `**` for zero or more directories

Open ascarter opened this issue 10 years ago • 43 comments

Go version 1.4.2 Mac OS X 10.10

Example:

package main

import "fmt"
import "path/filepath"
import "os"

func main() {
    files, err := filepath.Glob("/usr/local/go/src/**/*.go")
    if err != nil {
            fmt.Print(err)
            os.Exit(1)
    }
    fmt.Printf("files: %d\n", len(files))
    for _, f := range files {
        fmt.Println(f)
    }
}

Expected:

% ls /usr/local/go/src/**/*.go | wc -l
    1633

Actual:

files: 732

It seems that ** is equivalent to *. The extended ** pattern is common in shells and is supported in Rust and Java for example.

ascarter avatar Jul 24 '15 22:07 ascarter

For reference, here is Rust's implementation. I think these rules would be a good description of how Go's filepath.Glob should work:

https://github.com/rust-lang/glob/blob/master/src/lib.rs#L369

ascarter avatar Jul 24 '15 22:07 ascarter

For reference, this is supported by bash when invoked as "bash -O globstar" or after running "shopt -s globstar".

ianlancetaylor avatar Jul 25 '15 00:07 ianlancetaylor

OK to implement? @adg

nodirt avatar Oct 22 '15 06:10 nodirt

@bradfitz for opinion

nodirt avatar Oct 22 '15 17:10 nodirt

I think this is OK to implement.

ianlancetaylor avatar Oct 22 '15 22:10 ianlancetaylor

Yep, I agree.

adg avatar Oct 23 '15 00:10 adg

CL https://golang.org/cl/16280 mentions this issue.

gopherbot avatar Oct 23 '15 19:10 gopherbot

all, I wanted to check in to see if this issue (#11862) was fixed. Please let me know status of this ticket. Is there a work around to this problem? Talk to you soon. Thanks in advance.

dotcomputercraft avatar Dec 13 '15 16:12 dotcomputercraft

Status: currently I'm not working on this. Feel free to unassign.

nodirt avatar Dec 13 '15 17:12 nodirt

I certainly see the utility here but adding ** has various knock-on effects that may not be obvious. For example, * matches symlinks to other directories. ** cannot, at least not without extra work to avoid file system cycles. Also it's not obvious this should be considered a backwards compatible change, both for Glob and for Match. And you'd probably want to do path.Match as well. There's a lot to consider here, even though it looks trivial.

rsc avatar Jan 04 '16 16:01 rsc

@rsc closed the most recent CL for this with the comment:

I really don't think we should do any of this. This is much less understandable than the current code. If you need a very fast, very complex filepath.Glob, it is of course possible to write one outside the standard library.

Should this issue remain open?

emidoots avatar Feb 08 '16 23:02 emidoots

This issue does remain open. What's confusing is that GitHub has linked just above your comment to issue Shopify/themekit#102, which is closed.

rsc avatar Feb 09 '16 00:02 rsc

@rsc that is because someone linked to it in a Markdown link / comment: https://github.com/Shopify/themekit/issues/102#issuecomment-169756401

What can be done here? A better implementation? It's not clear to me.

emidoots avatar Feb 09 '16 00:02 emidoots

I think it's useful, but I wouldn't consider it backwards compatible. ** already works and does something (which is the same something as a single star).

extemporalgenome avatar Feb 09 '16 00:02 extemporalgenome

@extemporalgenome At best, I'd argue that ** is undefined and it just happens to do the same thing as *. I don't think this should represent an issue in regards to backwards compatibility.

ascarter avatar Feb 09 '16 01:02 ascarter

@slimsag the shopify issue I think is different. They chose to walk the tree and that solved their case - I don't think that applies to the general problem. I think @rsc was saying that the solution here is non trivial (and it's not just filepath.Glob but also path.Match).

I'd like it to stay on the list. Coming from other languages, I think this is a pretty common feature for glob syntax.

Note that the downside of attempting to write your own implementation is that I think you would wind up re-implementing glob entirely. I'm trying to figure out how to solve this for http://github.com/jstemmer/gotags since it won't be solved in the standard library anytime soon.

ascarter avatar Feb 09 '16 01:02 ascarter

@ascarter * is very well defined as matching any sequence (including an empty sequence) of non-separator characters. Thus ** is equally well defined as matching zero or more non-separator characters, followed by a match of zero characters on the tail of the same filename.

The proposed special-casing of ** will certainly not match less than it does now, but may match more, and the question is whether existing programs could become broken through what constitutes a change in behavior.

extemporalgenome avatar Feb 09 '16 14:02 extemporalgenome

I'm thinking this feature should be provided from third-party products not core. for example, you can use https://github.com/mattn/go-zglob

mattn avatar Feb 09 '16 15:02 mattn

I would be nice if the documentation about Glob mentioned that ** isn't implemented.

roel4d avatar Aug 15 '17 21:08 roel4d

@r03 The current docs are precise about what is implemented. I'm not sure it's necessary to say what is not implemented.

ianlancetaylor avatar Aug 15 '17 21:08 ianlancetaylor

I agree with @r03 that it would be nice to see the lack of support mentioned; it is the way glob functions in other languages, like Ruby, work, so it might not even occur to users that the feature might not be implemented.

trytriangles avatar Oct 20 '17 19:10 trytriangles

Like Ian said, I'm not convinced we should enumerate the many possible glob extension syntaxes that exist but are not implemented by this function.

rsc avatar Oct 20 '17 19:10 rsc

Any update ?

mfilotto avatar Mar 28 '18 04:03 mfilotto

I would have to disagree with anyone claiming that the docs shouldn't mention this.

If you called it something other than "glob", then I would agree. But "glob" has a set of features that exist across languages/systems. Calling it the same as everything else, but not being like the other versions is worth calling out. Otherwise, don't call it glob.

FabledWeb avatar Jun 26 '18 19:06 FabledWeb

@FabledWeb filepath.Glob correponds to the glob function in the C standard library.

ianlancetaylor avatar Jun 26 '18 19:06 ianlancetaylor

@ianlancetaylor I'm sure there is a good reason for it, but the common use of the word has a different meaning. Documentation shouldn't rely on someone knowing that the name refers to the C standard library version and that the C standard library acts differently than most things they'd come in contact with.

FabledWeb avatar Jun 26 '18 19:06 FabledWeb

@FabledWeb I suppose that what I'm doing is disagreeing with you about the common use of the word. To mean "glob" means the operation in the C library and the operation performed by the shell, and for neither of those does ** have a special meaning (in bash you can set an option to give ** a special meaning, but it's not the default). I understand that for you glob implies **, but for me it doesn't.

I continue to believe that the documentation should explain what the function does, not what it doesn't do.

ianlancetaylor avatar Jun 26 '18 22:06 ianlancetaylor

Man, I wish there was threading here so I don't take all this screen real-estate.

The fact that we disagree on what the word means, and the fact that I and others didn't immediately realize that it didn't support the glob syntax we assumed was standard, suggests that there's something missing from the documentation that would make it clear what's not supported (by way of stating what is supported if you prefer). My real point about all this is that people are getting confused and the way to fix that is not to say "the documentation is accurate". The documentation should serve to help users understand, not just be a pure technical spec. If you don't want to say "the multi-directory wildcard ** isn't supported", then I can't say precisely what to change to make it more consumable, but apparently there is something lacking right now.

By the way @ianlancetaylor, I love that you're so active on these issues. It's refreshing to actually get responses when commenting or opening PRs.

FabledWeb avatar Jun 27 '18 13:06 FabledWeb

I guess it seems to me that the documentation of Glob does say what is supported, albeit by reference to Match which lists the precise syntax. I'd be happy to see specific suggestions for how to improve the docs, I just want to avoid saying "By the way, you may have heard about ** in glob, but that doesn't work." That just seems odd.

By the way @ianlancetaylor, I love that you're so active on these issues. It's refreshing to actually get responses when commenting or opening PRs.

Thanks, and, you're welcome.

ianlancetaylor avatar Jun 27 '18 13:06 ianlancetaylor

Python solved this with an optional recursive=True parameter that could be passed to the function for globbing.

One potential solution could be an addition to the API that would provide recursive functionality so backward compatibility would be less of an issue, something like:

filepath.GlobRecursive("**/*")

However that would no longer use exact "Match" functionality that is described in the docs. If that was an issue you could always add an aditonal method MatchRecursive, however that does not seem very elegant of a solution.

Sidenote: I too made the assumption on how ** worked because of how it's been used in other languages and I am unfamilar with C.

EliCDavis avatar Aug 01 '18 14:08 EliCDavis

Another 3rd party lib: I've just come across https://github.com/bmatcuk/doublestar, seems to be working okay so far...

stu0292 avatar Aug 02 '18 16:08 stu0292

Damn, just switched to this library (from gopkg.in/godo.v2/glob since it produces a nil pointer reference if you don't have permission) only to find it doesn't support the ** syntax.

I'll checkout doublestar, but I do hope this library adds the modern Glob features everyone expects these days.

NickLarsenNZ avatar Nov 15 '18 00:11 NickLarsenNZ

So dumb. We really need this supported.

vsg24 avatar Dec 06 '18 19:12 vsg24

If this is going to be implemented, please, consider also support for braces ({, }), so that the implementation would be fully compatible with other languages.

grv87 avatar Jan 03 '19 09:01 grv87

@grv87 Thanks, that's a great example of why we shouldn't change anything here. As noted above, Go's function acts like the C library's glob function. So there is no implementation that is "fully compatible with other languages."

ianlancetaylor avatar Jan 04 '19 21:01 ianlancetaylor

What about a separate function for the modern usage of globs? I'm not sure why it needs to be strictly C compatible, given golang is made for now, and has the benefit of losing the decades old baggage.

NickLarsenNZ avatar Jan 04 '19 21:01 NickLarsenNZ

A separate function can live anywhere. https://golang.org/doc/faq#x_in_std . I see that https://godoc.org/github.com/gobwas/glob exists.

ianlancetaylor avatar Jan 04 '19 21:01 ianlancetaylor

Stopgap Workaround

This may not be a perfect solution, but weighing in at only 60 LoC it's a good start for those willing to use something outside of the standard library:

https://stackoverflow.com/a/40181899/151312

coolaj86 avatar Sep 18 '20 08:09 coolaj86

Given this issues age, it seems clear its never making it into std. So here is something similar you can use:

package main

import (
   "io/fs"
   "path/filepath"
)

func glob(root string, fn func(string)bool) []string {
   var matches []string
   filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
      if fn(path) {
         matches = append(matches, path)
      }
      return nil
   })
   return matches
}

func main() {
   matches := glob(".", func(path string) bool {
      return filepath.Ext(path) == ".go"
   })
   for _, match := range matches {
      println(match)
   }
}

89z avatar Jul 27 '21 14:07 89z

In general I agree that a function's documentation should say what it does and not what it doesn't do. But using ** is a pitfall, and we do document pitfalls, as in this lengthly caveat from fs.Sub:

Note that Sub(os.DirFS("/"), "prefix") is equivalent to os.DirFS("/prefix") and that neither of them guarantees to avoid operating system accesses outside "/prefix", because the implementation of os.DirFS does not check for symbolic links inside "/prefix" that point to other directories. That is, os.DirFS is not a general substitute for a chroot-style security mechanism, and Sub does not change that fact.

We could word the doc so that it describes a consequence of the function's behavior, instead of describing non-behavior. For example, "Note that multiple consecutive *s behave like a single *".

jba avatar Oct 28 '21 13:10 jba

Could this be needed to match only main.go, hey.go files in the following case?

- unwanted.go
- dir1
| - main.go
| - dir2
   | - hey.go
  • Glob("dir1/*.go") only matches main.go
  • Glob("dir1/*/*.go") only matches hey.go
  • Glob("*.go") matches them all, including the unwanted unwanted.go

soypat avatar Jul 17 '22 16:07 soypat

It's a shame such an famous Glob feature not implemented in Go for 7 years.

@adg @rsc is there a chance to prioritize it please?

mih-kopylov avatar Aug 07 '22 20:08 mih-kopylov

@mih-kopylov There are some concerns that Russ outlined in an earlier comment and also there are compatibility promises that we must heed. So if someone wants to address this issue they need should probably write it up as a proposal that addresses all these potential issues.

In the meantime, if there is a (seemingly, from my brief research) mature doublestar package that provides a glob that handles **. It doesn't have any dependencies and has extensive unit tests. Give it a shot.

adg avatar Aug 07 '22 23:08 adg