First sequence in unassigned.list is formatted incorrectly for get_seqs_from_list.py

Open LionOfComarre opened this issue 3 years ago • 0 comments

Because the tutorial command that generates the unassigned.list file simply splits the last line of prefix.partition.txt on commas, the first entry retains the "#unassigned:" text before the sequence name and is therefore not included in unassigned.fasta.
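One possible workaround is to strip the "#unassigned:" label before splitting. The sketch below is not part of Cogent. It assumes, based only on the behaviour described above, that the last non-empty line of the partition file has the form "#unassigned:" followed by comma-separated sequence names. The function names and the example sequence names are hypothetical.

```python
PREFIX = "#unassigned:"  # label described in this issue; exact format is an assumption


def parse_unassigned_line(line):
    """Return the sequence IDs from one '#unassigned:' line, without the label."""
    line = line.strip()
    if line.startswith(PREFIX):
        line = line[len(PREFIX):]
    # Split on commas and drop empty entries (e.g. when nothing is unassigned).
    return [name.strip() for name in line.split(",") if name.strip()]


def write_unassigned_list(partition_path, list_path):
    """Write one unassigned ID per line, read from the last non-empty line."""
    with open(partition_path) as handle:
        lines = [line for line in handle if line.strip()]
    ids = parse_unassigned_line(lines[-1]) if lines else []
    with open(list_path, "w") as out:
        for name in ids:
            out.write(name + "\n")
    return ids
```

For example, `write_unassigned_list("prefix.partition.txt", "unassigned.list")` would produce a list whose first entry is a bare sequence name. This sketch only covers the single-line layout described here and does not attempt to handle the final.partition.txt produced by the large dataset instructions.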

As has been mentioned elsewhere, the command also works incorrectly with the final.partition.txt produced by the large dataset instructions (issue "Getting unassigned sequence IDs", #80). As a result, many unassigned sequences were left out of my pseudogenome when I previously ran Cogent on a large dataset, which is probably why some of my sequences did not map correctly to the resulting pseudogenome. Luckily, I was still able to retrieve them using get_bad_or_unmapped_hq.py.

LionOfComarre avatar Mar 17 '21 18:03 LionOfComarre