mlst_check
get_sequence_type only writes N for some FASTA files
From RT 640278: I am running get_sequence_type on a list of assemblies. It calls the STs for all of them, but for some FASTA files it writes only N as the allele output, both in mlst_results.genomic.csv and in concatenated_alleles.fa. I have tried renaming the files, changing the FASTA headers (removing commas), etc., but I see no pattern that would explain why sequences are written for some assemblies and not for others. Could you please take a look and see what is going on? For example, if I run
bsub -M1000 -R 'select[mem>1000] rusage[mem=1000]' -o log_test.o -e log_test.e 'get_sequence_type -s "Escherichia coli 2" -c GCA_001266335.1_400929_genomic.fna'
then it does not report any errors either; it just says 'completed successfully'. I am sure the genomes are E. coli.
Same here. I ran get_sequence_type for 1108 Helicobacter pylori assemblies, and 530 of them produced concatenated FASTA files consisting entirely of N. I inspected the assembly information of the strains and found that the error occurs when the contig count is low (<20) or the assembly contains few Ns (<1000). For example, all 25 complete genome assemblies failed. Eight of them are in the attachment.
complete_hpylori.zip
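For anyone who wants to tally affected samples the same way, here is a minimal sketch that lists FASTA records whose sequence consists only of N. It assumes the concatenated allele output is ordinary FASTA text; the function name `all_n_records` and the sample headers are made up for illustration and are not part of mlst_check.

```python
def all_n_records(fasta_text: str) -> list[str]:
    """Return headers of FASTA records whose sequence is made up only of N."""
    records: dict[str, list[str]] = {}
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Upper-case so that 'n' and 'N' are treated the same.
            records[header].append(line.upper())

    flagged = []
    for name, chunks in records.items():
        seq = "".join(chunks)
        # Flag only non-empty sequences that contain nothing but N.
        if seq and set(seq) <= {"N"}:
            flagged.append(name)
    return flagged


# Hypothetical input: one all-N record and one record with real bases.
example = ">sample_a\nNNNN\nNNNN\n>sample_b\nACGTNN\n"
print(all_n_records(example))  # ['sample_a']
```

To check a real file, read it into a string first (for example with `pathlib.Path("concatenated_alleles.fa").read_text()`) and pass the result to the function. Records that share a header are not handled separately in this sketch.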
I have the same error.