DiaNN
Malformed library tsv files when the gene ID is at the end of the fasta description
There is an issue with DIANN when using the --gen-spec-lib flag to generate a library and the gene ID is at the end of a fasta description. Concretely, if the description for an entry in the fasta file looks like this, where GN=XXXX is the last field, DIANN picks up the line break along with the gene name and includes it in the generated library. The result is a malformed tsv file with a line break inside the Genes column (see attached picture). Would it be possible to patch v1.8.1 and above with a fix?
sp|Q29536-2|KPYR_CANLF Isoform L-type of Pyruvate kinase PKLR OS=Canis lupus familiaris OX=9615 GN=PKLR
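To illustrate the behavior I would expect, here is a minimal Python sketch. It is not DIANN's code and I have not looked at how DIANN parses headers; `extract_gene` is a hypothetical helper, and the trailing `\r\n` on the example header is an assumption about what the line ending might look like. The point is only that the gene name should stop at the first whitespace character, so a trailing line break never ends up in the Genes column.

```python
import re

# Hypothetical helper, not part of DIANN: pull the gene name out of a
# UniProt-style FASTA description. \S+ matches non-whitespace only, so a
# trailing "\r" or "\n" after GN=XXXX is never captured.
_GN_PATTERN = re.compile(r"\bGN=(\S+)")


def extract_gene(description: str) -> str:
    """Return the GN= value without any line-break characters, or "" if absent."""
    match = _GN_PATTERN.search(description)
    return match.group(1) if match else ""


# The header from this report, with an assumed Windows-style line ending.
header = (
    "sp|Q29536-2|KPYR_CANLF Isoform L-type of Pyruvate kinase PKLR "
    "OS=Canis lupus familiaris OX=9615 GN=PKLR\r\n"
)
print(repr(extract_gene(header)))
```

With this kind of parsing the result is the same whether GN= is the last field or is followed by other fields such as PE= or SV=.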