
[FEATURE] Not use `export GTDBTK_DATA_PATH="\$(find -L ${db} -name 'metadata' -type d -exec dirname {} \\;)"`

Open jolespin opened this issue 2 months ago • 3 comments

Is your feature request related to a problem? Please describe

I'm not entirely sure why this line is used on gtdbtk/classifywf module:

https://github.com/nf-core/modules/blob/73da03cd2508eda722a32111113e0d7da353abcd/modules/nf-core/gtdbtk/classifywf/main.nf#L36

    export GTDBTK_DATA_PATH="\$(find -L ${db} -name 'metadata' -type d -exec dirname {} \\;)"

It recursively searches the directory and takes way longer than it should.

Describe the solution you'd like

Why not just use?

    export GTDBTK_DATA_PATH="${db}"

Describe alternatives you've considered

I can keep my own local copy, but I guess I'm trying to understand why the developers added this in the first place. What edge case was this solving?

Additional context

No response

jolespin avatar Oct 30 '25 23:10 jolespin

@jfy133 can explain further, but basically in nf-core/mag we have had problems with users not specifying the correct directory when running the pipeline - for example, issues with an extra level of structure appearing when untarring.

So this allows us to be a bit more lax on the required inputs. We also have some documentation on using a SquashFS to mount the database, and I don't know how well it would operate with a change: https://github.com/nf-core/mag/pull/793

Though personally I think the input to the process should be staged as an env 😅

prototaxites avatar Nov 05 '25 12:11 prototaxites
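
To make the untarring edge case concrete, here is a minimal sketch of what the module's `find` command resolves when the supplied directory has one extra level above the real database root. The directory names (`gtdbtk_db`, `release`, `taxonomy`) are hypothetical and only stand in for a real database layout; the command is the one from the module, written without the Nextflow escaping (`\$` and `\\;`).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: untarring added an extra "release" level
# above the directory that actually holds 'metadata'.
workdir="$(mktemp -d)"
db="$workdir/gtdbtk_db"
mkdir -p "$db/release/metadata" "$db/release/taxonomy"

# Same command as in the module, minus the Nextflow escaping:
# find every directory named 'metadata' (following symlinks) and print its parent.
GTDBTK_DATA_PATH="$(find -L "$db" -name 'metadata' -type d -exec dirname {} \;)"
export GTDBTK_DATA_PATH

echo "user supplied: $db"
echo "resolved:      $GTDBTK_DATA_PATH"
```

Here the user passed `gtdbtk_db`, but the variable ends up pointing at `gtdbtk_db/release`, which is the laxness described above. The cost is that `find` walks the whole tree, and if more than one `metadata` directory exists the variable would contain several lines.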

Basically what Jim said, but we can probably improve that by adding -maxdepth 3 or something to restrict how deep it goes into...

jfy133 avatar Nov 05 '25 12:11 jfy133

Ah that makes a lot of sense. Your idea for -maxdepth 3 sounds like a great way to have both flexibility and efficiency.

jolespin avatar Nov 05 '25 16:11 jolespin
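
One way the depth-limited idea could look, as a rough sketch rather than the module's actual code: check the supplied directory first, and only fall back to a bounded search if `metadata` is not directly inside it. The helper name `resolve_gtdbtk_db` and the test directories are hypothetical. The sketch assumes a `find` that supports `-maxdepth` and `-quit` (GNU find does; neither is in POSIX). Note that `-maxdepth` is the option that caps recursion depth.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not part of the nf-core module.
# $1 = user-supplied database directory, $2 = maximum search depth (default 3).
resolve_gtdbtk_db() {
    local db="$1" max_depth="${2:-3}" hit
    # Fast path: the supplied directory is already the database root.
    if [ -d "$db/metadata" ]; then
        printf '%s\n' "$db"
        return 0
    fi
    # Otherwise search a bounded number of levels and stop at the first match.
    hit="$(find -L "$db" -maxdepth "$max_depth" -type d -name 'metadata' -print -quit)"
    if [ -z "$hit" ]; then
        echo "no 'metadata' directory within $max_depth levels of $db" >&2
        return 1
    fi
    dirname "$hit"
}

# Hypothetical layouts: correct input, one extra level, and too deeply nested.
workdir="$(mktemp -d)"
mkdir -p "$workdir/flat/metadata" \
         "$workdir/nested/release/metadata" \
         "$workdir/deep/a/b/c/metadata"

flat="$(resolve_gtdbtk_db "$workdir/flat")"
nested="$(resolve_gtdbtk_db "$workdir/nested")"
echo "flat   -> $flat"
echo "nested -> $nested"
```

In a Nextflow `script:` block the `$` signs and `\;` would need the same escaping as the existing line. Whether the fast path behaves well with the SquashFS mount mentioned above is untested here.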