Understanding output

Open Ge0rges opened this issue 1 year ago • 1 comments

Hello,

I ran the tool on a FASTA file representing a genome which contains many different contigs. I used python Promotech-master/promotech.py -g -i results/ -o results/ to generate the final results.

In the results folder, I looked at genome_predictions.csv. In the chrome column, every row contains a list of all the contigs separated by |, and the sequence is quite long.

Does the sequence represent the entire promoter, meaning that the start codon begins at the next nucleotide? And how can I identify which contig an entry actually belongs to?
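One way to answer the contig question independently of the tool is to search each contig for the predicted sequence on both strands. The following is a minimal sketch, not part of Promotech: the helper names `read_fasta`, `reverse_complement`, and `locate` are hypothetical, and it assumes the predicted sequence is reported as plain nucleotides that can be matched exactly.

```python
def read_fasta(lines):
    """Parse FASTA lines into {contig_name: sequence} (name = first word of header)."""
    contigs = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            contigs[name] = []
        elif name is not None:
            contigs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in contigs.items()}


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def locate(seq, contigs):
    """Return (contig, 0-based start, strand) for every exact match of seq."""
    hits = []
    seq = seq.upper()
    rc = reverse_complement(seq)
    for name, contig in contigs.items():
        # "+" means seq occurs as written; "-" means its reverse complement does.
        for strand, query in (("+", seq), ("-", rc)):
            pos = contig.find(query)
            while pos != -1:
                hits.append((name, pos, strand))
                pos = contig.find(query, pos + 1)
    return hits


# Toy example with made-up contigs (not real data).
fasta = [">contig_1 example", "ACGTTTGA", "CCA", ">contig_2", "GGGTCAAACG"]
contigs = read_fasta(fasta)
hits = locate("TTTGAC", contigs)
```

For a real file, `read_fasta(open(path))` would work the same way, since the function only iterates over lines. An empty result for a given prediction means the sequence does not occur inside any single contig on either strand.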

Thanks

Oct 31 '24 02:10 Ge0rges

I think I figured out that the contigs get concatenated so that a sliding window can run over them (though for a MAG that doesn't really make sense). However, many of the predicted forward-strand sequences don't exist anywhere in the FASTA file.
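If the contigs really are joined end to end, a window position on the joined sequence can be mapped back to a contig by cumulative offsets, and windows that straddle a contig boundary would explain sequences that appear in no single contig. The sketch below assumes plain concatenation in FASTA order with no separator, which I have not confirmed from the Promotech source; `contig_offsets` and `map_window` are hypothetical names, and the window length of 40 in the example is only illustrative.

```python
from bisect import bisect_right
from itertools import accumulate


def contig_offsets(lengths):
    """Start offset of each contig within the concatenated sequence."""
    return [0] + list(accumulate(lengths))[:-1]


def map_window(start, length, names, lengths):
    """Map a window on the concatenated sequence back to its contig.

    start is 0-based on the concatenated sequence. Returns
    (contig_name, local_start), or None if the window crosses a contig
    boundary or runs past the end (i.e. it exists in no single contig).
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    offsets = contig_offsets(lengths)
    idx = bisect_right(offsets, start) - 1  # last contig starting at or before start
    local = start - offsets[idx]
    if local + length > lengths[idx]:
        return None
    return names[idx], local


# Toy example: three made-up contigs of 100, 50 and 200 nt.
names = ["c1", "c2", "c3"]
lengths = [100, 50, 200]
inside = map_window(10, 40, names, lengths)      # fully inside c1
straddling = map_window(80, 40, names, lengths)  # runs from c1 into c2
```

Under this assumption, any prediction for which `map_window` returns `None` is an artifact of the concatenation rather than a sequence present in the assembly.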

The highest-scoring forward-strand sequence that I could find in the FASTA scored 91% and had a start codon about 50 nucleotides downstream, which was a little unexpected. Is this consistent with the results you have seen?
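The distance check described above can be reproduced with a few lines. This is a rough sketch under stated assumptions: `distance_to_next_start` is a hypothetical helper, it treats any ATG, GTG, or TTG triplet downstream of the window as a candidate start codon without checking reading frame or gene annotation, and it only handles the forward strand.

```python
START_CODONS = ("ATG", "GTG", "TTG")  # common bacterial start codons


def distance_to_next_start(contig, window_end, codons=START_CODONS):
    """Nucleotides between the end of a predicted window and the nearest
    downstream candidate start codon on the forward strand.

    window_end is 0-based and exclusive (the index just past the window).
    Returns None if no candidate triplet occurs downstream.
    """
    contig = contig.upper()
    positions = [contig.find(codon, window_end) for codon in codons]
    positions = [p for p in positions if p != -1]
    if not positions:
        return None
    return min(positions) - window_end


# Toy example: the window ends at index 4 and an ATG starts at index 8.
toy = "AAAACCCCATGAAA"
gap = distance_to_next_start(toy, 4)
```

Because a bare triplet match is not proof of a real start codon, comparing against an annotation file (if one exists for the genome) would give a more reliable distance.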

Oct 31 '24 06:10 Ge0rges