
[FEATURE REQUEST]: remove prefix to sequence names

Open fgvieira opened this issue 11 months ago • 3 comments

Is this a feature request for FCS-adaptor or FCS-GX? Yes

Describe the problem you'd like to be solved At the moment, FCS output adds an lcl| prefix to the sequence names.

Describe the solution you'd like Remove this prefix.

Describe alternatives you've considered Make it optional with a command line argument.

fgvieira avatar Feb 28 '24 09:02 fgvieira

Hello,

The lcl| prefix is specific to FCS-adaptor: it is added to the sequences found in the cleaned_sequences directory.

We can consider implementing this in an upcoming release. In the meantime, you can strip the prefix with sed. Another option is to download the fcs.py runner script and clean the original, uncleaned FASTA using the adaptor report, like so:

curl -LO https://github.com/ncbi/fcs/raw/main/dist/fcs.py

zcat ./inputdir/uncleaned.fa.gz | python3 ./fcs.py clean genome --action-report ./outputdir/fcs_adaptor_report.txt --output clean.fasta --contam-fasta-out contam.fasta
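For the sed route mentioned above, a minimal sketch might look like the following. It assumes the lcl| prefix appears only at the very start of FASTA header lines (right after the >). The function name strip_lcl_prefix and the sample records are made up for illustration and are not part of FCS.

```shell
# Hypothetical helper (not part of FCS): remove a leading "lcl|" from
# FASTA header lines. Sequence lines and headers without the prefix
# pass through unchanged. In basic sed regex, "|" is a literal character.
strip_lcl_prefix() {
  sed 's/^>lcl|/>/'
}

# Demo on two made-up records: one with the prefix, one without.
printf '>lcl|seq1 desc\nACGT\n>seq2\nTTGA\n' | strip_lcl_prefix
```

To apply it to real output, pipe the cleaned FASTA through the function (decompressing first if the file is gzipped) and redirect the result to a new file.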

etvedte avatar Feb 28 '24 17:02 etvedte

Is the fcs.py script in the singularity image?

fgvieira avatar Mar 03 '24 13:03 fgvieira

No, fcs.py is a wrapper that runs the executables inside Docker/Singularity containers. If you need to set up using Singularity, follow the instructions here to get the runner and the Singularity image, and to set the image environment variable: https://github.com/ncbi/fcs/wiki/FCS-GX#quickstart

etvedte avatar Mar 04 '24 13:03 etvedte

We have an updated wiki that demonstrates how to clean a genome with FCS-adaptor style output:

https://github.com/ncbi/fcs/wiki/FCS-adaptor-quickstart#clean-the-genome

etvedte avatar Jun 26 '24 14:06 etvedte