NOVOPlasty
Test data for heteroplasmy
Hi,
I am working on the integration of NOVOPlasty into Galaxy. Where can I find the files used in the heteroplasmy test runs in this repository, e.g. those referenced in config_ERR1395547.txt:
Forward reads = Filtered_reads_ERR1395547_R1.fastq
Reverse reads = Filtered_reads_ERR1395547_R2.fastq
Thank you, Stephan
Hi,
Those files are not available on my GitHub. The original files are on ENA or NCBI; I could write down how to reproduce them. I will also upload a new version 4.0 with significant improvements.
Greets,
Nicolas
Hi,
I uploaded 4.0. Are you interested in a more detailed manual for the heteroplasmy analysis of that sample? I could add it to the wiki.
Great, I will have a look! Thank you.
Hi, I have some questions and a couple of remarks :-)
In the wiki, an output file Contigs_project.txt is mentioned. As far as I can see, Contigs_1_project.fasta is created. Does the "1" represent some kind of counter, so that files with higher numbers could be produced? Same for Circularized_assembly_project.fasta / Circularized_assembly_1_project.fasta and Uncircularized_assemblies_1_project.fasta (which is not mentioned in the wiki, btw.).
I couldn't produce the result files Merged_contigs_project.txt, Option_nr_project.txt, Possible_NUMTs_project.vcf, Possible_NUMTs_assemblies_project.fasta and Linkage_table_NUMTs_project.txt. Can you recommend a dataset for this? I just want to see if the wrapper works.
It seems that Platform: SE has an influence on the applied value of Insert size. How are these two parameters connected?
seed_input doesn't accept fasta.gz files, while chloroplast, forward, reverse, combined and reference allow that.
Setting a value for Heteroplasmy doesn't have an effect; MAF needs to be set instead. The description is a bit confusing here.
Just for clarification: when using Heteroplasmy mode, are Assembly results (contigs, contigs tmp, ...) produced?
Since a lot of result files can be created, which of them should be shown by default? Does it make sense in your opinion to hide some by default?
There are two typos in the README: "beeen", " size is know".
Assembled_reads_Result_R2.fasta and Assembled_reads_Result_R1.fasta are not mentioned in the wiki.
Thank you! Stephan
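Regarding the seed_input remark above: one possible wrapper-side workaround is to decompress a gzipped seed before handing it to NOVOPlasty. This is only a minimal sketch; the helper name `ensure_plain_fasta` is hypothetical and not part of NOVOPlasty or Galaxy.

```python
import gzip
import shutil
from pathlib import Path


def ensure_plain_fasta(seed_path: str) -> str:
    """Return a path to an uncompressed version of the seed file.

    Hypothetical wrapper-side helper (an assumption, not a NOVOPlasty API).
    """
    src = Path(seed_path)
    if src.suffix != ".gz":
        # Already uncompressed, pass it through unchanged.
        return str(src)
    # seed.fasta.gz -> seed.fasta, written next to the original file.
    dst = src.with_suffix("")
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return str(dst)
```

The returned path could then be used as the seed_input value. Since seeds are usually short, the extra copy should be cheap.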
Hi,
Sorry, I was on holiday and didn't have the time afterwards. Thanks for the remarks, I will try to fix some of the issues as soon as possible.
- I will have a look at whether I can clean up the output files a bit, because they are indeed not that clear (some are redundant). I will do it this week.
- Merged_contigs_project.txt, Option_nr_project.txt: I will send you a config setting that produces these files. Possible_NUMTs_project.vcf, Possible_NUMTs_assemblies_project.fasta and Linkage_table_NUMTs_project.txt: I will have to take a closer look at these, because they are only produced when NUMT sequences are assembled.
- Platform: SE: The insert size has no importance when this setting is used, but it is automatically set to 2*read_length.
- seed_input indeed doesn't accept fasta.gz. I could add it, but I don't think anyone would use it (seeds should be short sequences, <1 kbp).
- "Setting a value for Heteroplasmy doesn't have an effect,": Indeed, I need to change this in the wiki.
- "Just for clarification, when using Heteroplasmy mode are Assembly results (contigs, contigs tmp, ...) produced?" No, only assemblies around the detected SNPs are produced (not around indels either; I may update this).
Greets,
Nicolas
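For the wrapper, the Platform: SE and Heteroplasmy/MAF answers in this thread could be mirrored roughly as follows. This is a sketch under stated assumptions: the function `resolve_settings` and its parameter names are hypothetical, and the logic only restates what is said above (insert size becomes 2*read_length for SE; a Heteroplasmy value alone has no effect without MAF).

```python
def resolve_settings(platform, read_length, insert_size=None,
                     heteroplasmy=None, maf=None):
    """Return (settings, warnings) as a wrapper might display them.

    Hypothetical helper; names are assumptions, not NOVOPlasty options.
    """
    warnings = []
    if platform == "SE":
        # Per the thread: insert size is ignored for SE and is
        # automatically set to 2 * read_length.
        insert_size = 2 * read_length
    if heteroplasmy is not None and maf is None:
        # Per the thread: only MAF has an effect.
        warnings.append("Heteroplasmy is set but MAF is not; set MAF instead.")
    return {"insert_size": insert_size, "maf": maf}, warnings


settings, warnings = resolve_settings("SE", read_length=150, insert_size=500)
print(settings["insert_size"])  # 300
```

Surfacing the overridden insert size in the wrapper's output would make it clearer to users why their own value was not applied.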