matam
Get a single marker gene from reads
I have a FASTA file with a marker gene, and I would like to extract that gene from raw Illumina reads.
The FASTA file contains only one sequence:
>LLX10@00074518
MAIEDNPYVFRFEGRLWVSEEPRETAAAQLRAQREWDRQNARLQHWWVAISVSAVAGVAV
TLYLGTSAGLAPAIYLVLLPIGFGAGAVLGALVNKRFFAPELQHGSLPPRPELAKLTRIP
SRVARAAPDNASARDLIDWSTRGFVD
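The record above is an amino-acid sequence: it contains letters such as E, F, I, L, P and Q, which are not nucleotide codes. The tools that matam_db_preprocessing.py calls in the log below (vsearch and SortMeRNA) work on nucleotide sequences, so checking the alphabet of the input first may be useful. The following is a minimal Python sketch, not part of MATAM; the helper names `read_fasta` and `guess_alphabet` are hypothetical, and the letter set and the 90% cutoff are arbitrary heuristics chosen for illustration.

```python
import sys

# Heuristic assumption: these letters are treated as "nucleotide-like".
NUCLEOTIDE_LETTERS = set("ACGTUN")
# Heuristic assumption: >= 90% nucleotide-like letters means "nucleotide".
MIN_NUCLEOTIDE_FRACTION = 0.9


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Remove spaces so space-separated sequence blocks still join correctly.
            chunks.append(line.replace(" ", ""))
    if header is not None:
        yield header, "".join(chunks)


def guess_alphabet(sequence):
    """Return 'nucleotide', 'protein' or 'empty' from the share of A/C/G/T/U/N letters."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        return "empty"
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "nucleotide" if fraction >= MIN_NUCLEOTIDE_FRACTION else "protein"


if __name__ == "__main__":
    # Hypothetical usage: python check_alphabet.py marker.fasta [other.fasta ...]
    for path in sys.argv[1:]:
        with open(path) as handle:
            for header, sequence in read_fasta(handle):
                print(f"{path}\t{header}\t{len(sequence)}\t{guess_alphabet(sequence)}")
```

Run on marker.fasta (the script name is hypothetical), it would print one tab-separated line per record with the header, the sequence length and the guessed alphabet.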
I tried to build a custom database with MATAM, but I get this error:
$ singularity exec --bind /scratch/ulg/bioec/lcornet/matam:/mnt matam.sif matam_db_preprocessing.py -i /mnt/marker.fasta -d /mnt/marker/ --cpu 1 --max_memory 10000 -v
################################# MATAM db pre-processing #################################
CMD: /opt/miniconda/opt/matam-1.6.0/scripts/matam_db_preprocessing.py --verbose --cpu 1 --max_memory 10000 --min_length 10 --max_consecutive_n 5 --clustering_id_threshold 0.95 --db_dir /mnt/marker --input_ref /mnt/marker.fasta
INFO - Starting ref db pre-processing
INFO - Extracting taxonomies from reference DB
INFO - Cleaning reference db
1 sequences were rejected
INFO - Starting ref db clustering
INFO - Clustering sequences @ 95 pct id
vsearch v2.15.2_linux_x86_64, 251.8GB RAM, 64 cores
https://github.com/torognes/vsearch
Reading file /mnt/marker/marker.cleaned.fasta 100%
0 nt in 0 seqs
minseqlength 32: 1 sequence discarded.
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 0
Singletons: 0
Traceback (most recent call last):
File "/opt/miniconda/opt/matam-1.6.0/scripts/fasta_clean_name.py", line 62, in <module>
WARNING: no write permissions in directory /tmpscratch: No such file or directory will try /tmp/.
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, Laurent Noé, Hélène Touzet
Parameters summary:
K-mer size: 19
K-mer interval: 1
Maximum positions to store per unique K-mer: 10000
Total number of databases to index: 1
Begin indexing file /mnt/marker/marker_NR95.complete.fasta under index name /mnt/marker/marker_NR95.complete:
ERROR: at least one of your reads is shorter than the seed length 19, please filter out all reads shorter than 19 to continue index construction.
Collecting sequence distribution statistics ..
INFO - Indexing clustered ref db
The input file is empty, an index was not built.
Output MATAM db: /mnt/marker/marker_NR95
matam_db_preprocessing.py terminated with some errors. Check the log for additional infos
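The log reports that the cleaning step rejected 1 sequence, that vsearch then read "0 nt in 0 seqs" from marker.cleaned.fasta, and that SortMeRNA stopped because a sequence was shorter than its seed length of 19. One way to see where the marker is lost is to count records and lengths in each FASTA file along the pipeline. Below is a minimal Python sketch of such a check; the function names are hypothetical, and the two thresholds (32 and 19) are copied from the vsearch and SortMeRNA messages above rather than read from MATAM's configuration.

```python
import sys

# Thresholds copied from the log above (assumption: they are the relevant ones):
# vsearch printed "minseqlength 32" and SortMeRNA uses a seed length of 19.
VSEARCH_MIN_LENGTH = 32
SORTMERNA_SEED_LENGTH = 19


def sequence_lengths(lines):
    """Return [(header, length), ...] for every record in an iterable of FASTA lines."""
    records, header, length = [], None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, length))
            header, length = line[1:], 0
        elif line:
            length += len(line.replace(" ", ""))
    if header is not None:
        records.append((header, length))
    return records


def summarize(lines):
    """Count records, total residues and records below each length threshold."""
    lengths = [n for _, n in sequence_lengths(lines)]
    return {
        "records": len(lengths),
        "total_length": sum(lengths),
        "shorter_than_vsearch_min": sum(n < VSEARCH_MIN_LENGTH for n in lengths),
        "shorter_than_sortmerna_seed": sum(n < SORTMERNA_SEED_LENGTH for n in lengths),
    }


if __name__ == "__main__":
    # Hypothetical usage inside the container (files are under /mnt):
    #   python fasta_summary.py /mnt/marker.fasta \
    #       /mnt/marker/marker.cleaned.fasta /mnt/marker/marker_NR95.complete.fasta
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, summarize(handle))
```

Comparing the output for the input file and the two intermediate files would show at which step the record count drops to zero or a record becomes empty or too short. A header with no sequence is counted as a record of length 0.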