[FEATURE] Do not use `export GTDBTK_DATA_PATH="\$(find -L ${db} -name 'metadata' -type d -exec dirname {} \\;)"`
Is your feature request related to a problem? Please describe
I'm not entirely sure why this line is used in the gtdbtk/classifywf module:
https://github.com/nf-core/modules/blob/73da03cd2508eda722a32111113e0d7da353abcd/modules/nf-core/gtdbtk/classifywf/main.nf#L36
export GTDBTK_DATA_PATH="\$(find -L ${db} -name 'metadata' -type d -exec dirname {} \\;)"
It recursively searches the directory and takes way longer than it should.
Describe the solution you'd like
Why not just use the following?
export GTDBTK_DATA_PATH="${db}"
Describe alternatives you've considered
I can keep my own local copy of the module, but I'm mainly trying to understand why the developers added this in the first place. What edge case was it solving?
@jfy133 can explain further, but basically in nf-core/mag we have had problems with users not specifying the correct directory when running the pipeline - for example, issues with an extra level of structure appearing when untarring.
So this allows us to be a bit more lax on the required inputs. We also have some documentation on using a SquashFS to mount the database, and I don't know how well it would operate with a change: https://github.com/nf-core/mag/pull/793
Though personally I think the input to the process should be staged as an env 😅
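To make the edge case concrete, here is a minimal sketch of the "extra level after untarring" situation. The directory names (`release_x`, `taxonomy`) are made up for illustration and are not the real GTDB layout; only the `find` expression is taken from the module.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: the tarball unpacked into an extra 'release_x/' level,
# so the real database root is one directory below what the user passed in.
db="$(mktemp -d)"
mkdir -p "${db}/release_x/metadata" "${db}/release_x/taxonomy"

# Naive approach: use the supplied directory as-is (wrong level here).
naive_path="${db}"

# Approach from the module: find the 'metadata' directory, take its parent.
resolved_path="$(find -L "${db}" -name 'metadata' -type d -exec dirname {} \;)"

echo "naive:    ${naive_path}"
echo "resolved: ${resolved_path}"
```

In this layout the naive path points at the temporary directory itself, while the resolved path points at `release_x/` inside it, which is the directory that actually contains `metadata`.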
Basically what Jim said, but we can probably improve that by adding -maxdepth 3 or something similar to restrict how deep it goes into...
Ah, that makes a lot of sense. Your idea of -maxdepth 3 sounds like a great way to get both flexibility and efficiency.
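For reference, a rough sketch of what a depth-limited lookup could look like. It assumes GNU find, where the depth limit is spelled `-maxdepth` and `-quit` stops at the first match. The limit of 3, the test directory layout, and the error message are illustrative choices, not something the module currently does.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical test layout: a 'metadata' dir two levels down, plus a much
# deeper decoy that the bounded search should never reach.
db="$(mktemp -d)"
mkdir -p "${db}/release_x/metadata"
mkdir -p "${db}/release_x/a/b/c/d/metadata"

# Look at most 3 levels below ${db} and stop at the first hit.
metadata_dir="$(find -L "${db}" -maxdepth 3 -type d -name 'metadata' -print -quit)"

# Fail loudly instead of exporting an empty path.
if [ -z "${metadata_dir}" ]; then
    echo "ERROR: no 'metadata' directory within 3 levels of ${db}" >&2
    exit 1
fi

GTDBTK_DATA_PATH="$(dirname "${metadata_dir}")"
export GTDBTK_DATA_PATH
echo "${GTDBTK_DATA_PATH}"
```

Inside a Nextflow script block the shell `$` would still need escaping as `\$`, as in the existing line. Whether a limit of 3 is deep enough for the SquashFS setup mentioned above would need checking.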