NOVOPlasty

Test data for heteroplasmy

Open stephanflemming opened this issue 4 years ago • 5 comments

Hi,

I am working on the integration of NOVOPlasty into Galaxy. Where can I find the files used in the heteroplasmy test runs in this repository, e.g. those referenced in config_ERR1395547.txt?

Forward reads         = Filtered_reads_ERR1395547_R1.fastq
Reverse reads         = Filtered_reads_ERR1395547_R2.fastq 

Thank you, Stephan

stephanflemming avatar May 26 '20 00:05 stephanflemming

Hi,

Those files are not available on my GitHub. The original files are on ENA or NCBI; I could write down how to reproduce them. I will also upload a new version 4.0 with significant improvements.

Greets,

Nicolas

ndierckx avatar May 26 '20 21:05 ndierckx
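
Editor's note: as a starting point for locating the original reads on ENA, here is a minimal sketch that only builds a query URL for the run accession mentioned above. The ENA Portal API `filereport` endpoint and its parameter names are written from memory and should be treated as assumptions to check against the ENA documentation. The helper name `ena_fastq_report_url` is hypothetical, and the sketch does not reproduce the filtering step that produced the `Filtered_reads_*` files.

```python
from urllib.parse import urlencode

# Assumed ENA Portal API endpoint; verify against the ENA documentation.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def ena_fastq_report_url(run_accession: str) -> str:
    """Build a URL that asks ENA for the FASTQ locations of one run."""
    params = {
        "accession": run_accession,  # e.g. the run named in the config file
        "result": "read_run",
        "fields": "fastq_ftp",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


url = ena_fastq_report_url("ERR1395547")
print(url)
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should list the FASTQ files for the run, if the endpoint behaves as assumed.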

Hi,

I uploaded 4.0. Are you interested in a more elaborate manual for the heteroplasmy analysis of that sample? I could add it to the wiki.

ndierckx avatar Jun 03 '20 00:06 ndierckx

Great, I will have a look! Thank you.

stephanflemming avatar Jun 13 '20 09:06 stephanflemming

Hi, I have some questions and a couple of remarks :-)

In the wiki, an output file Contigs_project.txt is mentioned. As far as I can see, Contigs_1_project.fasta is created instead. Does the "1" represent some kind of counter, so that files with higher numbers could be produced? The same goes for Circularized_assembly_project.fasta / Circularized_assembly_1_project.fasta and Uncircularized_assemblies_1_project.fasta (which is not mentioned in the wiki, btw).

I couldn't produce the result files Merged_contigs_project.txt, Option_nr_project.txt, Possible_NUMTs_project.vcf, Possible_NUMTs_assemblies_project.fasta and Linkage_table_NUMTs_project.txt. Can you recommend a dataset for this? I just want to see if the wrapper works.

It seems that Platform: SE has an influence on the applied value of Insert size. How are these two parameters connected?

seed_input doesn't accept fasta.gz files, while chloroplast, forward, reverse, combined and reference allow that.

Setting a value for Heteroplasmy doesn't have an effect, MAF needs to be set instead. The description is a bit confusing here.

Just for clarification, when using Heteroplasmy mode are Assembly results (contigs, contigs tmp, ...) produced?

Since a lot of result files can be created, which of them should be shown by default? Does it make sense, in your opinion, to hide some by default?

There are two typos in the README: "beeen", " size is know".

Assembled_reads_Result_R2.fasta and Assembled_reads_Result_R1.fasta are not mentioned in the wiki

Thank you! Stephan

stephanflemming avatar Aug 05 '20 16:08 stephanflemming
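
Editor's note: for a wrapper that has to pick up outputs such as Contigs_1_project.fasta without knowing how many numbered files exist, a minimal sketch could look like the following. It assumes the "1" is a counter and that higher numbers may appear, which the thread does not confirm. The function name `find_numbered_outputs` and its parameters are hypothetical and not part of NOVOPlasty.

```python
import re
from pathlib import Path


def find_numbered_outputs(outdir, prefix, project, suffix=".fasta"):
    """Return {counter: path} for files named <prefix>_<N>_<project><suffix>."""
    pattern = re.compile(
        rf"^{re.escape(prefix)}_(\d+)_{re.escape(project)}{re.escape(suffix)}$"
    )
    found = {}
    for path in Path(outdir).iterdir():
        match = pattern.match(path.name)
        if match:
            found[int(match.group(1))] = path
    # Sort by the numeric counter so that 10 comes after 2.
    return dict(sorted(found.items()))
```

For example, `find_numbered_outputs("results", "Contigs", "project")` would collect Contigs_1_project.fasta and any higher-numbered siblings in a hypothetical `results` directory, while ignoring files that lack the counter.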

Hi,

Sorry, I was on holiday and didn't have the time afterwards. Thanks for the remarks, I will try to fix some of the issues as soon as possible.

  • I will have a look to see if I can clean up the output files a bit, because they are indeed not that clear (some are redundant). I will do it this week.

  • Merged_contigs_project.txt, Option_nr_project.txt: I will send you a config setting that will produce these files.

  • Possible_NUMTs_project.vcf, Possible_NUMTs_assemblies_project.fasta and Linkage_table_NUMTs_project.txt: For these files I will have to take a closer look, because they are only produced when NUMT sequences are assembled.

  • Platform: SE : The insert size has no importance when this setting is used, but it is automatically set to 2*read_length

  • seed_input indeed doesn't accept it. I could add it, but I don't think anyone would use it (seeds should be short sequences, <1 kbp).

  • "Setting a value for Heteroplasmy doesn't have an effect": Indeed, I need to change this in the wiki.

  • "Just for clarification, when using Heteroplasmy mode are Assembly results (contigs, contigs tmp, ...) produced?" No, only assemblies around the detected SNPs (and not around indels; maybe I will update this).

Greets,

Nicolas

ndierckx avatar Aug 27 '20 01:08 ndierckx
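
Editor's note: since seed_input does not accept gzipped FASTA, a wrapper could decompress the seed before handing it to NOVOPlasty. The following is a minimal sketch of that workaround using only the Python standard library. The helper name `prepare_seed` is hypothetical, and it assumes a `.gz` suffix reliably marks a gzip-compressed file.

```python
import gzip
import shutil
from pathlib import Path


def prepare_seed(seed_path, workdir):
    """Return a path to an uncompressed seed file, decompressing if needed."""
    seed_path = Path(seed_path)
    if seed_path.suffix != ".gz":
        # Already plain FASTA: pass it through unchanged.
        return seed_path
    # "seed.fasta.gz" -> "<workdir>/seed.fasta"
    target = Path(workdir) / seed_path.stem
    with gzip.open(seed_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

The returned path can then be written into the config file as the seed input. Because seeds are short (<1 kbp, as noted above), the extra copy costs next to nothing.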