Incorrect headers for Nuc -> Protein

Open ghost opened this issue 5 years ago • 7 comments

Hello,

I recently ran the nucleotide assembler with this program:

plass nuclassemble reads_1.fastq.gz assembly_testnu.fas tmp

Output:

>1541_chr1_0_114757654_114757803_7891_JFMU01000067.1
AGCTGGAATTTCTAAAAAAGATATTAATGGCTTTATGATAAGAAAACTAAAGAATATTGAAATAA

However, when I try to use

plass assemble reads_1.fastq.gz assembly_testpep.fas tmp

the original headers are not there, and I see this string instead:

>0 2+146 3 RLAFNSRKAMDNVTLTLELPPNAELTPFPGRQTISWTVDLKQGDNVLALPINVLFPGSGKLVAHLDDGTRRKTFSTAIPGNTEPSS*

Any ideas? Thank you

ghost avatar May 14 '19 15:05 ghost

Ah, interesting. nuclassemble currently returns the header of the sequence that got extended, while the protein assembler (assemble) returns the header information from the extracted ORF. We will think about how to make the header information more consistent.

martin-steinegger avatar May 23 '19 14:05 martin-steinegger

Hi Martin, any news on this issue?

Thanks, Antonio

genomewalker avatar Sep 12 '19 06:09 genomewalker

@genomewalker we will discuss this issue tomorrow and will update it here once we have a solution.

martin-steinegger avatar Sep 18 '19 20:09 martin-steinegger

@genomewalker we agreed on a new header format: <uniq ID> len:<len> cycle:<0|1>. Each header will contain the uniq ID and the len field; the cycle field is optional (for nucleotide sequences).

This is now implemented (8a7d224), so both assembler workflows now report more consistent header information.

We are also planning to extend the header by an additional coverage field at some point, but this is not done yet.

<uniq ID> len:<len> cycle:<0|1> cov:<cov>
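
For downstream scripts, the header layout described above could be parsed roughly as follows. This is only a minimal sketch in Python, not part of plass: the names `HEADER_RE`, `PlassHeader` and `parse_header` are hypothetical, the fields are assumed to be whitespace-separated and in the order shown, and `cov` is treated as optional and kept as a plain string because its exact format has not been specified.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout (from the format in this thread, not from the plass source):
#   <uniq ID> len:<len> [cycle:<0|1>] [cov:<cov>]
HEADER_RE = re.compile(
    r"^>?(?P<uid>\S+)\s+len:(?P<len>\d+)"
    r"(?:\s+cycle:(?P<cycle>[01]))?"
    r"(?:\s+cov:(?P<cov>\S+))?\s*$"
)


class PlassHeader(NamedTuple):
    uid: str
    length: int
    cycle: Optional[int]  # None when the field is absent
    cov: Optional[str]    # kept as text; the cov format is an assumption


def parse_header(line: str) -> Optional[PlassHeader]:
    """Return the parsed fields, or None if the line does not fit the layout."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    cycle = m.group("cycle")
    return PlassHeader(
        uid=m.group("uid"),
        length=int(m.group("len")),
        cycle=int(cycle) if cycle is not None else None,
        cov=m.group("cov"),
    )
```

Returning None for non-matching lines lets a script recognise output in the older header style (for example `>0 2+146 3`) instead of failing on it.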

AnnSeidel avatar Nov 22 '19 19:11 AnnSeidel

Thank you very much @AnnSeidel! The coverage field will be very useful when added!

Regarding the nucleotide assembly, is it still considered experimental?

genomewalker avatar Nov 23 '19 16:11 genomewalker

We have done significant further development on the nucleotide assembly in the last few months, but I would still consider it experimental. We are working on it.

AnnSeidel avatar Nov 27 '19 14:11 AnnSeidel

Sorry if this is the wrong place to ask, but I am very curious: has the request to include coverage information in the header been implemented? I haven't seen it in my testing, but I would love to use it in subsequent steps!

ewaltari avatar Aug 25 '23 01:08 ewaltari