
Remove requirement that all fasta input files have the same extension

Open jdwinkler-lanzatech opened this issue 4 years ago • 2 comments

Hi,

Thanks for your work on GTDBTk. I'm currently trying to process a bunch of assemblies from different sources (binning tools, public repositories) that use different file extensions, so the requirement that all the files have the same extension is proving troublesome. I can circumvent this by symlinking the files to names that share a single extension, but this solution seems pretty hacky.

Would it be possible to relax the restriction on a single file extension for genome assemblies?

jdwinkler-lanzatech avatar May 13 '20 14:05 jdwinkler-lanzatech

You can specify the input using the --batchfile flag which gives you a lot of flexibility and might be your best option for now. We'll look into allowing multiple file extensions for the next release.
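For anyone landing here with the same problem, below is a minimal sketch of how such a batchfile could be generated from a directory of mixed-extension assemblies. It assumes the batchfile layout is tab-separated with the FASTA path in the first column and the genome ID in the second; check `gtdbtk classify_wf -h` for your version. The extension list, the function names, and the choice of using the file name minus its extension as the genome ID are all illustrative assumptions, not part of GTDB-Tk.

```python
from pathlib import Path

# Assumed set of FASTA extensions; adjust to whatever your sources produce.
FASTA_EXTENSIONS = (".fna", ".fa", ".fasta")


def genome_id_for(path):
    """Return the file name without its FASTA extension, or None if the
    file does not end in one of the assumed extensions."""
    name = path.name
    for ext in FASTA_EXTENSIONS:
        if name.endswith(ext) and len(name) > len(ext):
            return name[: -len(ext)]
    return None


def build_batchfile_rows(genome_dir):
    """Collect (absolute FASTA path, genome ID) pairs from genome_dir.

    Raises ValueError if two files would map to the same genome ID,
    for example sample1.fa and sample1.fna.
    """
    rows = []
    seen = {}
    for path in sorted(Path(genome_dir).iterdir()):
        if not path.is_file():
            continue
        genome_id = genome_id_for(path)
        if genome_id is None:
            continue  # not a recognised FASTA extension, skip it
        if genome_id in seen:
            raise ValueError(
                f"duplicate genome ID {genome_id!r}: {seen[genome_id]} and {path}"
            )
        seen[genome_id] = path
        rows.append((str(path.resolve()), genome_id))
    return rows


def write_batchfile(rows, out_path):
    """Write one tab-separated 'path<TAB>genome_id' line per genome."""
    with open(out_path, "w") as handle:
        for fasta_path, genome_id in rows:
            handle.write(f"{fasta_path}\t{genome_id}\n")
```

Calling `write_batchfile(build_batchfile_rows("assemblies"), "batchfile.tsv")` (both names are placeholders) would produce a file that can then be passed via `--batchfile batchfile.tsv` in place of `--genome_dir`; any other required options depend on your GTDB-Tk version.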

donovan-h-parks avatar May 13 '20 14:05 donovan-h-parks

Ah, I'll take a look. I couldn't find any options besides genome_dir in the online documentation, but I see it in the CLI. Thanks for the tip.

jdwinkler-lanzatech avatar May 13 '20 14:05 jdwinkler-lanzatech