EMBLmyGFF3
EMBLmyGFF3 copied to clipboard
Gene sorting compared to fasta
Describe the bug When running EMBLmyGFF3 on a fasta file with a specific order of the genes, it re-orders the genes in the flat file output in what seems to be alphanumerical order based on the gene names. Input gene order is not retained.
General (please complete the following information):
- EMBLmyGFF3 2.2
- EMBLmyGFF3 installation/use: Manual
- OS: macOS and Windows 11
To Reproduce Any fasta+gff Any input parameters
Expected behavior To guarantee the gene order in the flat file output to be in the same order as in the fasta input
Screenshots None
Additional context None
This behavior is related to biopython. Could you tell what version of biopython you use? see https://github.com/NBISweden/EMBLmyGFF3/issues/70
Seems to be 1.80
Chiming in here. We noted the issue of contigs not being sorted according to the input order of the assembly file. The culprit here is not biopython, but rather BCBio.GFF. In EMBLmyGFF.py you have the following. At line line 1365:
seq_dict = SeqIO.to_dict( SeqIO.parse(infasta, "fasta") )
I have verified that the input order is preserved.
Later on, at line 1484:
for record in GFF.parse(infile, base_dict=seq_dict):
is where the input order is scrambled. GFF.parse
calls a function GFF.parse_in_parts
which looks as follows:
def parse_in_parts(self, gff_files, base_dict=None, limit_info=None,
target_lines=None):
"""Parse a region of a GFF file specified, returning info as generated.
target_lines -- The number of lines in the file which should be used
for each partial parse. This should be determined based on available
memory.
"""
for results in self.parse_simple(gff_files, limit_info, target_lines):
if base_dict is None:
cur_dict = dict()
else:
cur_dict = copy.deepcopy(base_dict)
cur_dict = self._results_to_features(cur_dict, results)
all_ids = list(cur_dict.keys())
all_ids.sort()
for cur_id in all_ids:
yield cur_dict[cur_id]
By commenting out the line all_ids.sort
I can preserve input order.
Excellent, thank you for the feedback @percyfal !