srst2
custom databases - manual intervention required to complete code for sequence reads
Hi there,
I am using SRST2 with a custom database to search for a small variable gene region (~320 bp including flanking sequence) in a set of Campylobacter sp. genomes. I built a small database of unique sequences from a much larger sequence dataset, following the provided instructions. This database has 100 sequences, which cluster to 5 sequences at c = 0.9 within cdhit-est. I have made the sequence names as simple as possible in case that was the issue. My problem is that the run does not complete without manual intervention: after the line "<mpileup> Set max per-file depth to 8000" I have to press Ctrl-C, as shown below (I have changed the input file names, but everything else is exactly as run):
testOfFlankingBla$ time python2 ~/software/srst2/scripts/srst2.py --input_pe ../flaA_singleTest/SRRxxxxx_1.fastq.gz ../flaA_singleTest/SRRxxxxx_2.fastq.gz --output SRRxxxxx --gene_db ../flankingBlaBit_cdhit.fasta --log
1968887 reads; of these:
1968887 (100.00%) were paired; of these:
1968800 (100.00%) aligned concordantly 0 times
9 (0.00%) aligned concordantly exactly 1 time
78 (0.00%) aligned concordantly >1 times
----
1968800 pairs aligned concordantly 0 times; of these:
0 (0.00%) aligned discordantly 1 time
----
1968800 pairs aligned 0 times concordantly or discordantly; of these:
3937600 mates make up the pairs; of these:
3937577 (100.00%) aligned 0 times
4 (0.00%) aligned exactly 1 time
19 (0.00%) aligned >1 times
0.01% overall alignment rate
[samopen] SAM header is present: 100 sequences.
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
sh: 1: OXC8243__27943: not found
sh: 1: OXC8243__00001: not found
^Csh: 1: NCTC11168__48: not found
sh: 1: NCTC11168__00008: not found
^Csh: 1: ARI2590__39380: not found
sh: 1: ARI2590__00095: not found
^Csh: 1: 8096__00098: not found
sh: 1: 8096__24271: not found
^C
real 14m28.381s
user 1m5.051s
sys 0m3.154s
I let this run continue for about 14 minutes to see whether it was a timing issue (it wasn't); the run reaches the "<mpileup> Set max per-file depth to 8000" line after about 90 seconds. Automating this over a folder of Illumina PE reads is therefore currently not possible. I do get output, including a table of hits. Do you have any idea why this is happening and how to solve it?
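In case it helps narrow things down, here is a minimal sketch of a check that could be run on the database headers. It rests on two assumptions that I have not confirmed: that the "sh: 1: ...: not found" messages come from part of a sequence name reaching a shell as a command, and that SRST2 expects each header ID to have four fields separated by double underscores, as I understand the README to describe. The function name `check_headers` and the character list are hypothetical, chosen only for illustration.

```python
# Assumption: characters a POSIX shell may treat specially if a header
# ends up unquoted on a command line. This list is illustrative only.
SHELL_SPECIAL = set("|;&<>()$`\\\"'*?[]#~!{}")


def check_headers(fasta_lines):
    """Return (sequence_id, problems) for FASTA headers that look risky.

    Only the first whitespace-delimited token of each header is checked,
    since anything after the first space is free-text annotation.
    """
    flagged = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        tokens = line[1:].split()
        seq_id = tokens[0] if tokens else ""
        problems = []
        if any(ch in SHELL_SPECIAL for ch in seq_id):
            problems.append("shell-special character")
        # Assumption: four '__'-separated fields are expected in the ID.
        if len(seq_id.split("__")) != 4:
            problems.append("not four '__'-separated fields")
        if problems:
            flagged.append((seq_id, problems))
    return flagged
```

To use it on the real database, pass an open file handle, for example `check_headers(open("../flankingBlaBit_cdhit.fasta"))`, and inspect any headers it returns. An empty list would suggest the header format is not the cause, at least under the assumptions above.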
Thanks in advance,
Patrick