DiaNN

Malformed library TSV files when the gene ID is at the end of the FASTA description

Open • SeereouslyDrewNichols opened this issue 6 months ago • 3 comments

There is an issue with DIANN when the --gen-spec-lib flag is used to generate a library and the gene ID is the last field of a FASTA description. Concretely, if the FASTA description of an entry looks like the one below, where GN=XXXX is the last field, DIANN picks up the line break as well as the GN=XXXX value and includes both in the generated library. The result is a malformed TSV file with a line break in the Genes column (see the attached picture). Would it be possible to patch v1.8.1 and later versions with a fix?

sp|Q29536-2|KPYR_CANLF Isoform L-type of Pyruvate kinase PKLR OS=Canis lupus familiaris OX=9615 GN=PKLR
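For illustration only: one way a line break can leak into a field is a parser that takes everything after `GN=` up to the end of the line without first stripping the line ending. The Python sketch below is not DIANN's code (its parser is not shown in this issue). `extract_gene_naive` and `extract_gene` are hypothetical helpers that contrast that naive approach with one that strips `\r\n` and stops at the first whitespace, applied to the header above.

```python
import re

# Hypothetical helpers for illustration; neither is DIANN's actual parser.
GN_PATTERN = re.compile(r"\bGN=(\S+)")


def extract_gene_naive(header_line: str) -> str:
    # Takes everything after "GN=" to the end of the line,
    # including any trailing "\n" or "\r\n".
    _, _, rest = header_line.partition("GN=")
    return rest


def extract_gene(header_line: str) -> str:
    """Return the GN= value from a FASTA header, or "" if there is none."""
    # Strip the line ending first, then take only the non-whitespace token.
    header = header_line.rstrip("\r\n")
    match = GN_PATTERN.search(header)
    return match.group(1) if match else ""


header = (
    "sp|Q29536-2|KPYR_CANLF Isoform L-type of Pyruvate kinase PKLR "
    "OS=Canis lupus familiaris OX=9615 GN=PKLR\r\n"
)
print(repr(extract_gene_naive(header)))  # 'PKLR\r\n'
print(repr(extract_gene(header)))        # 'PKLR'
```

Matching `\S+` also keeps the gene name clean when other fields (for example `PE=` or `SV=`) follow `GN=` on the same line.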

[Attached screenshot: generated library TSV with a line break in the Genes column]
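A quick way to spot this symptom in a generated library is to compare each line's tab-separated field count with the header's: a stray line break inside the Genes column splits one row into two short ones. This is a hypothetical Python sketch, assuming a plain tab-separated file with no quoted fields; the column names in the example are made up and are not DIANN's library columns.

```python
def find_malformed_rows(tsv_text: str) -> list[int]:
    """Return 1-based line numbers whose field count differs from the header's."""
    # splitlines() breaks on "\n", "\r\n" and a lone "\r", so a stray
    # carriage return inside a field is also caught.
    lines = tsv_text.splitlines()
    if not lines:
        return []
    expected = lines[0].count("\t") + 1
    return [
        number
        for number, line in enumerate(lines[1:], start=2)
        if line.count("\t") + 1 != expected
    ]


# Hypothetical three-column library where the Genes value "PKLR" carried a line break.
sample = "Protein\tGenes\tCharge\nQ29536-2\tPKLR\n\t2\nP12345\tABC\t3\n"
print(find_malformed_rows(sample))  # [2, 3]
```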

SeereouslyDrewNichols, Jul 26 '24 18:07