
Specifying Folders In `buf generate --path` Arguments Causes Massive Performance Hit

Open JesseObrien opened this issue 1 year ago • 3 comments

Hi, I've been chatting through this problem on the buf slack in a thread here. I've been discussing it with @jhump for the most part.

The problem arises when we call buf generate with a folder in the --path arguments rather than with individual files in the --path arguments.

In simple terms:

A) buf generate --config=buf.yaml --path=/foo/bar/directory takes ~1 minute to generate .ts files for 11 .proto files nested in that directory.

B) buf generate --config=buf.yaml --path=/foo/bar/directory/file1.proto,/foo/bar/directory/file2.proto,... takes <1 second to generate .ts files for the same 11 .proto files nested in the same directory.

The root folder we're calling buf generate from is a very large monorepo with hundreds of thousands of files. Unless we recursively expand all the .proto files and pass them in that one --path argument (or specify them as 11 separate --path arguments), buf generate becomes 60x or more slower.
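For reference, the file-expansion workaround described above can be sketched roughly as follows. This is only an illustration: proto_path_args is a hypothetical helper name, not part of buf, and it assumes find and sort are available and that the file paths contain no whitespace.

```shell
# Hypothetical helper (not part of buf): print one --path=<file> argument
# per .proto file found under the given directory, in a stable sorted order.
proto_path_args() {
  find "$1" -type f -name '*.proto' | LC_ALL=C sort | while IFS= read -r file; do
    printf '%s\n' "--path=$file"
  done
}

# Example usage (assumption: paths contain no whitespace, because the
# command substitution below is word-split by the shell):
#   buf generate --config=buf.yaml $(proto_path_args /foo/bar/directory)
```

Passing each file as its own --path argument corresponds to the "11 separate --path arguments" variant; the comma-separated form shown in B) is the other option.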

If I can provide any more context, let me know. I verified this by running buf generate a number of times with the folder specified instead of the expanded file list, to make sure it's the folder that's causing it.

JesseObrien, Jan 18 '24 21:01

That doesn't seem that unexpected - if buf has to search /foo/bar/directory for all relevant .proto files, that's going to take some time (and I'm certain that however buf searches for it is not as optimized as some typical bash tools are) - we can look into optimizing that path a bit, but searching a directory with 100,000+ files for 11 specific .proto files is going to take some time.

bufdev, Jan 18 '24 21:01

@bufdev, IIRC, the foo/bar/directory folder does not have that many files. The issue is that the "input" to buf was unspecified, and thus defaulted to the current working directory. The current working directory is the root of the repo and huge. When --path indicates a file, it is fast. But it seems like --path with a directory name isn't actually looking only at that one directory but is instead collecting everything in the "input" module (so scanning the huge repo root directory) and then filtering the result based on prefix match. (The above is my suspicion based on the observed behavior; I haven't gone through the implementation code yet to confirm what it's doing.)
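To make the two strategies in this suspicion concrete, here is a rough shell sketch. It is not buf's implementation; scan_then_filter and walk_subtree are hypothetical names, and the functions only model "walk the whole input root, then filter by prefix" versus "walk only the requested directory".

```shell
# Strategy 1 (the suspected slow path): walk the entire input root, then keep
# only the files whose relative path starts with the requested prefix.
# $1 = input root, $2 = directory prefix relative to the root.
scan_then_filter() {
  (cd "$1" && find . -type f -name '*.proto' \
    | sed 's|^\./||' \
    | awk -v p="$2/" 'index($0, p) == 1' \
    | LC_ALL=C sort)
}

# Strategy 2: walk only the requested subdirectory of the input root.
# $1 = input root, $2 = directory relative to the root.
walk_subtree() {
  (cd "$1" && find "$2" -type f -name '*.proto' | LC_ALL=C sort)
}
```

Both functions print the same file list, but the first visits every file under the root before filtering, so its cost grows with the size of the whole repository rather than with the size of the requested directory.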

jhump, Jan 18 '24 21:01

That shouldn't be the case - we have optimized for that scenario, so it should only do the search on the directory specified in --path. There may be a regression - we have to play with this locally.

bufdev, Jan 18 '24 23:01

Cannot reproduce this issue with buf generate on v1.36.0; --path arguments correctly restrict the tree of files included. Issues only occur for the breaking and lint commands.

emcfarlane, Aug 13 '24 17:08