matam

Get single marker gene from read

Open Lcornet opened this issue 2 years ago • 0 comments

I have a FASTA file with a marker gene, and I would like to extract that gene from raw Illumina reads:

The marker is in a FASTA file containing only one sequence:

>LLX10@00074518
MAIEDNPYVFRFEGRLWVSEEPRETAAAQLRAQREWDRQNARLQHWWVAISVSAVAGVAV
TLYLGTSAGLAPAIYLVLLPIGFGAGAVLGALVNKRFFAPELQHGSLPPRPELAKLTRIP
SRVARAAPDNASARDLIDWSTRGFVD
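The record above appears to be an amino-acid sequence rather than a nucleotide one: it contains letters such as E, F, I, L, P and Q, which are not IUPAC nucleotide codes. The tools MATAM calls in the log below (vsearch and SortMeRNA) work on nucleotide sequences, so it may be worth checking the alphabet of a reference file before building a database. A minimal sketch, assuming the IUPAC nucleotide codes are an adequate test; `read_fasta` and `non_nucleotide_letters` are hypothetical helpers, not part of MATAM:

```python
# Hypothetical pre-flight check (not part of MATAM): report letters in each
# FASTA record that are not IUPAC nucleotide codes (e.g. amino acids).
IUPAC_NT = set("ACGTUNRYSWKMBDHV")


def read_fasta(lines):
    """Yield (header, sequence) pairs; the header is the text after '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def non_nucleotide_letters(seq):
    """Return the sorted letters of seq that are not IUPAC nucleotide codes."""
    return sorted(set(seq.upper()) - IUPAC_NT)


marker = """>LLX10@00074518
MAIEDNPYVFRFEGRLWVSEEPRETAAAQLRAQREWDRQNARLQHWWVAISVSAVAGVAV
TLYLGTSAGLAPAIYLVLLPIGFGAGAVLGALVNKRFFAPELQHGSLPPRPELAKLTRIP
SRVARAAPDNASARDLIDWSTRGFVD
"""

for header, seq in read_fasta(marker.splitlines()):
    bad = non_nucleotide_letters(seq)
    status = "nucleotide only" if not bad else "non-nucleotide letters " + "".join(bad)
    # prints: LLX10@00074518: 146 residues, non-nucleotide letters EFILPQ
    print(f"{header}: {len(seq)} residues, {status}")
```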

I tried to build a custom database with MATAM, but I get this error:

$ singularity exec --bind /scratch/ulg/bioec/lcornet/matam:/mnt matam.sif matam_db_preprocessing.py -i /mnt/marker.fasta -d /mnt/marker/ --cpu 1 --max_memory 10000 -v

################################# MATAM db pre-processing #################################

CMD: /opt/miniconda/opt/matam-1.6.0/scripts/matam_db_preprocessing.py --verbose --cpu 1 --max_memory 10000 --min_length 10 --max_consecutive_n 5 --clustering_id_threshold 0.95 --db_dir /mnt/marker --input_ref /mnt/marker.fasta

INFO - Starting ref db pre-processing
INFO - Extracting taxonomies from reference DB
INFO - Cleaning reference db
1 sequences were rejected
INFO - Starting ref db clustering
INFO - Clustering sequences @ 95 pct id
vsearch v2.15.2_linux_x86_64, 251.8GB RAM, 64 cores
https://github.com/torognes/vsearch

Reading file /mnt/marker/marker.cleaned.fasta 100%
0 nt in 0 seqs
minseqlength 32: 1 sequence discarded.
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 0 Singletons: 0
Traceback (most recent call last):
  File "/opt/miniconda/opt/matam-1.6.0/scripts/fasta_clean_name.py", line 62, in <module>
    sequence_id = header.split()[0]
IndexError: list index out of range
INFO - Renaming output files as MATAM db files
INFO - Indexing complete ref db

WARNING: no write permissions in directory /tmpscratch: No such file or directory
will try /tmp/.

Program:     SortMeRNA version 2.1b, 03/03/2016
Copyright:   2012-16 Bonsai Bioinformatics Research Group:
             LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
             2014-16 Knight Lab:
             Department of Pediatrics, UCSD, La Jolla
Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
             implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
             See the GNU Lesser General Public License for more details.
Contact:     Evguenia Kopylova
             Laurent Noé
             Hélène Touzet

Parameters summary:
    K-mer size: 19
    K-mer interval: 1
    Maximum positions to store per unique K-mer: 10000

Total number of databases to index: 1

Begin indexing file /mnt/marker/marker_NR95.complete.fasta under index name /mnt/marker/marker_NR95.complete:

ERROR: at least one of your reads is shorter than the seed length 19, please filter out all reads shorter than 19 to continue index construction.

Collecting sequence distribution statistics ..
INFO - Indexing clustered ref db
The input file is empty, an index was not built.

Output MATAM db: /mnt/marker/marker_NR95

matam_db_preprocessing.py terminated with some errors. Check the log for additional infos
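Two length thresholds appear in the log above: vsearch discards sequences shorter than `minseqlength 32`, and SortMeRNA refuses to index sequences shorter than its seed length of 19 (the CMD line shows MATAM running with `--min_length 10`, below both). The traceback also shows `header.split()[0]` failing, which in Python happens whenever the header string is empty or whitespace-only, because `"".split()` returns `[]`. A minimal sketch of a pre-filter, assuming the aim is just to hand the pipeline a reference file that passes those two checks; `read_fasta`, `filter_fasta` and `write_fasta` are hypothetical helpers, not MATAM functions, and the example records are made up:

```python
# Hypothetical pre-filter (not part of MATAM): keep only FASTA records with a
# non-empty header and a sequence of at least `min_len` residues. The default
# of 32 matches vsearch's "minseqlength 32" in the log and exceeds the
# SortMeRNA seed length of 19.


def read_fasta(lines):
    """Yield (header, sequence) pairs; the header is the text after '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(lines, min_len=32):
    """Split records into (kept, rejected) lists of (header, sequence)."""
    kept, rejected = [], []
    for header, seq in read_fasta(lines):
        # An empty header gives header.split() == [], which is what makes
        # header.split()[0] raise IndexError; reject such records up front.
        if not header.split() or len(seq) < min_len:
            rejected.append((header, seq))
        else:
            kept.append((header, seq))
    return kept, rejected


def write_fasta(records, width=60):
    """Format records as FASTA text, wrapping sequences at `width` columns."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


example = [
    ">seq1 long enough\n",
    "ACGTACGTACGTACGTACGTACGTACGTACGTACGT\n",
    ">\n",  # empty header
    "ACGTACGTACGTACGTACGTACGTACGTACGTACGT\n",
    ">seq3 too short\n",
    "ACGTACGTAC\n",
]
kept, rejected = filter_fasta(example)
print(write_fasta(kept), end="")
print(f"{len(rejected)} record(s) rejected")  # prints: 2 record(s) rejected
```

For a real file, `filter_fasta` can be given an open file handle directly, e.g. `with open("marker.fasta") as fh: kept, rejected = filter_fasta(fh)` (the filename is a placeholder). This only covers empty headers and short sequences; it does not change the sequence alphabet.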

Lcornet May 19 '22 18:05