Understanding output
Hello,
I ran the tool on a FASTA file of a genome that contains many contigs, using python Promotech-master/promotech.py -g -i results/ -o results/ to generate the final results.
In the results folder I looked at genome_predictions.csv. In the chrome column I get a list of every contig separated by |, and the sequence itself is quite long.
Does the sequence represent the entire promoter, meaning the next nucleotide is the first base of the start codon?
How can I identify which contig the entry actually belongs to?
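One way to answer this without relying on the chrome column is to search every FASTA record for the predicted sequence on both strands. The sketch below does an exact-match search; the function names are hypothetical helpers, not part of Promotech, and it assumes contig IDs are the first whitespace-delimited token of each header line.

```python
from typing import Dict, List, Optional, Tuple

# Complement table for A/C/G/T/N; any other character is left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(path: str) -> Dict[str, str]:
    """Parse a FASTA file into {contig_id: uppercase sequence}."""
    contigs: Dict[str, List[str]] = {}
    current: Optional[str] = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Assumption: the contig ID is the first token after '>'.
                current = line[1:].split()[0]
                contigs[current] = []
            elif current is not None:
                contigs[current].append(line.upper())
    return {name: "".join(parts) for name, parts in contigs.items()}


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_sequence(contigs: Dict[str, str], query: str) -> List[Tuple[str, str, int]]:
    """Return (contig_id, strand, 0-based start on the forward strand)
    for every exact occurrence of query or its reverse complement."""
    query = query.upper()
    hits = []
    for name, seq in contigs.items():
        for strand, pattern in (("+", query), ("-", reverse_complement(query))):
            start = seq.find(pattern)
            while start != -1:
                hits.append((name, strand, start))
                start = seq.find(pattern, start + 1)
    return hits
```

Usage would be something like find_sequence(read_fasta("genome.fasta"), predicted_seq), where the file name is a placeholder. An empty result means the sequence does not occur as-is in any single record.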
Thanks
I think I've figured out that the contigs are concatenated so that a sliding window can run over them (though for a MAG that doesn't really make sense). However, many of the forward-strand sequences don't exist in the FASTA file.
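If the contigs really are concatenated, a window that straddles the boundary between two contigs would contain bases from both and would not match any single FASTA record, which could explain some of the missing sequences. The sketch below maps a window on the concatenated sequence back to a contig. It assumes, without having checked Promotech's code, that contigs are joined in FASTA order with no separator and that coordinates are 0-based and end-exclusive; contig_offsets and locate are hypothetical helpers.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Optional, Tuple


def contig_offsets(lengths: List[int]) -> List[int]:
    """Start coordinate of each contig in the concatenated sequence
    (assumes no separator between contigs)."""
    return [0] + list(accumulate(lengths))[:-1]


def locate(offsets: List[int], lengths: List[int], names: List[str],
           start: int, end: int) -> Optional[Tuple[str, int, int]]:
    """Map a 0-based, end-exclusive window [start, end) on the concatenated
    sequence back to (contig, local_start, local_end).

    Returns None when the window crosses a contig junction, i.e. it mixes
    bases from two contigs and cannot occur in any single FASTA record.
    """
    # Index of the last contig whose start offset is <= start.
    i = bisect_right(offsets, start) - 1
    local_start = start - offsets[i]
    local_end = end - offsets[i]
    if local_end > lengths[i]:
        return None
    return names[i], local_start, local_end
```

Counting how many predictions return None would give a rough idea of how many of the unmatched sequences are junction artefacts rather than something else.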
The highest-scoring forward-strand sequence I could find in the FASTA, with a score of 91%, had a start codon about 50 nucleotides away, which I found a little unexpected. Is this consistent with what you've seen?
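To check the promoter-to-start-codon distance across many predictions instead of one at a time, a small scan like the following could help. It is a sketch for forward-strand hits only; the choice of ATG/GTG/TTG as start codons and the 300 bp search limit are assumptions to adjust, and distance_to_start_codon is a hypothetical helper.

```python
from typing import Iterable, Optional

# ATG is the canonical start codon; GTG and TTG are common alternatives
# in bacteria. Adjust this set for the organism being analysed.
START_CODONS = ("ATG", "GTG", "TTG")


def distance_to_start_codon(contig_seq: str, promoter_end: int,
                            codons: Iterable[str] = START_CODONS,
                            max_distance: int = 300) -> Optional[int]:
    """Bases between the end of a forward-strand promoter (0-based,
    end-exclusive) and the first downstream start codon.

    Returns 0 if a start codon begins immediately after the promoter,
    or None if none starts within max_distance bases.
    """
    seq = contig_seq.upper()
    codon_set = set(codons)
    # Stop where fewer than 3 bases remain or the search limit is reached.
    limit = min(len(seq) - 2, promoter_end + max_distance + 1)
    for pos in range(promoter_end, limit):
        if seq[pos:pos + 3] in codon_set:
            return pos - promoter_end
    return None
```

Note that this finds the nearest start-codon triplet, not necessarily the annotated translation start, so comparing against a gene annotation would be more reliable.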