[FEATURE REQUEST]: remove prefix to sequence names
Is this a feature request for FCS-adaptor or FCS-GX? Yes
Describe the problem you'd like to be solved
At the moment, FCS adds an lcl| prefix to sequence names in its output.
Describe the solution you'd like
Remove this prefix.
Describe alternatives you've considered
Make it optional with a command line argument.
Hello,
The lcl| prefix is specific to FCS-adaptor: it appears on the sequences written to the cleaned_sequences directory.
We can consider implementing this in an upcoming release. In the meantime, you can strip the prefix with sed. Another option is to download the fcs.py runner script and clean the original, uncleaned FASTA with the adaptor report like so:
curl -LO https://github.com/ncbi/fcs/raw/main/dist/fcs.py
zcat ./inputdir/uncleaned.fa.gz | python3 ./fcs.py clean genome --action-report ./outputdir/fcs_adaptor_report.txt --output clean.fasta --contam-fasta-out contam.fasta
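For the sed route, a minimal sketch might look like the following. The file names and the two demo records are placeholders made up for illustration, and the expression assumes the prefix sits immediately after the > on each header line:

```shell
# Hypothetical demo input standing in for a file from cleaned_sequences.
tmpdir=$(mktemp -d)
printf '>lcl|seq1 test\nACGT\n>lcl|seq2\nGGCC\n' > "$tmpdir/cleaned.fa"

# Strip a leading "lcl|" from header lines only; sequence lines are left untouched.
# In sed's basic regex syntax an unescaped | is a literal character.
sed 's/^>lcl|/>/' "$tmpdir/cleaned.fa" > "$tmpdir/renamed.fa"

cat "$tmpdir/renamed.fa"
```

For a gzipped file, the same expression can be used in a pipeline, for example zcat cleaned.fa.gz | sed 's/^>lcl|/>/' | gzip > renamed.fa.gz.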
Is the fcs.py script in the Singularity image?
No, fcs.py is a wrapper that runs the executables inside Docker/Singularity containers. If you need to set up using Singularity, follow the instructions here to get the runner and the Singularity image and to set the image environment variable:
https://github.com/ncbi/fcs/wiki/FCS-GX#quickstart
We have an updated wiki that demonstrates how to clean a genome with FCS-adaptor style output:
https://github.com/ncbi/fcs/wiki/FCS-adaptor-quickstart#clean-the-genome