paleomix
documentation for paleomix phylo pipeline
Dear Mikkel,
After successfully using your fantastic paleomix bam_pipeline to generate BAM files of my genomes, I am now trying to use the phylo pipeline. However, I am struggling to get it to run because I am unsure how to set up the makefile.yaml properly (e.g. I want a genome-wide analysis rather than one restricted to specific regions, so I don't know what to write for Prefix). Do you perhaps have more extensive documentation on the phylo pipeline (the readthedocs page says it is under construction), or a reference makefile.yaml for the phylo pipeline?
Many thanks in advance!
Oscar
Dear Oscar,
I apologize for the state of the documentation for the phylogenetic pipeline. I do intend to remedy this, along with reworking the pipeline to make it easier to use overall.
To answer your question, there is currently no way to run the phylogenetic pipeline without a set of target regions. So if you want to analyse the whole genome, you simply need to create a BED file that covers the whole genome. A simple way to do this is to generate the BED file from the FASTA index file (.fai), like so:
$ awk '{print $1, 0, $2}' OFS='\t' rCRS.fasta.fai > whole_genome.bed
You either need to place this file in data/regions/whole_genome.bed in the folder where you want to run the pipeline, or specify a different folder with --regions-root.
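If you do not have a .fai file yet, samtools faidx rCRS.fasta will create one. Alternatively, the sequence lengths can be computed straight from the FASTA file. Below is a minimal sketch of that approach. It is not part of paleomix, and the function name fasta_to_bed is hypothetical. It writes one BED line per sequence, starting at 0 (BED starts are 0-based) and ending at the sequence length (BED ends are exclusive), so its output should match the awk command above applied to the .fai.

```shell
# Hypothetical helper (not part of paleomix): print a BED line covering
# each sequence in a FASTA file, without needing a .fai index.
fasta_to_bed() {
    local fasta="$1"
    awk 'BEGIN { OFS = "\t" }
         /^>/ {
             # Print the previous sequence before starting a new one
             if (name != "") print name, 0, len
             # Names stop at the first whitespace, as in a .fai index
             name = substr($1, 2)
             len = 0
             next
         }
         {
             # Count residues, ignoring stray whitespace and carriage returns
             gsub(/[ \t\r]/, "")
             len += length($0)
         }
         END { if (name != "") print name, 0, len }' "$fasta"
}
```

For example, fasta_to_bed rCRS.fasta > data/regions/whole_genome.bed writes the BED file directly into the default regions folder.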
The Prefix is the name of your genome/FASTA file, but without the .fasta extension. In the above example, that would be rCRS. The FASTA file should be placed in ./data/prefixes/rCRS.fasta. This folder can be changed with --prefix-root.
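The default layout described above can be created with a few commands. This is a sketch assuming the default folders (i.e. neither --prefix-root nor --regions-root is passed); setup_phylo_layout is a hypothetical helper, not part of paleomix. It also prints the resulting Prefix name, which is only correct if the FASTA file ends in .fasta.

```shell
# Hypothetical helper: copy a FASTA and a BED file into the default
# folders (data/prefixes and data/regions) below a project folder,
# then print the Prefix name to use in the makefile.
setup_phylo_layout() {
    local project_dir="$1"   # folder in which the pipeline will be run
    local fasta="$2"         # must end in .fasta, e.g. rCRS.fasta
    local bed="$3"           # e.g. whole_genome.bed

    mkdir -p "$project_dir/data/prefixes" "$project_dir/data/regions" || return 1
    cp "$fasta" "$project_dir/data/prefixes/" || return 1
    cp "$bed" "$project_dir/data/regions/" || return 1

    # The Prefix is the FASTA file name without the .fasta extension
    basename "$fasta" .fasta
}
```

For example, setup_phylo_layout . rCRS.fasta whole_genome.bed sets up the current folder and prints rCRS.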
If we ignore the other options, then the makefile would look like this:
RegionsOfInterest:
  whole_genome:
    Prefix: rCRS
This is basically the same as the example project included with the pipeline. Looking at the example project might give you a better idea of how the pipeline should be set up.
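As a rough sanity check, the minimal makefile can also be written by a small script that refuses to do so if the files it names are missing. This sketch assumes the default folders and, as the example suggests, that each name under RegionsOfInterest must match a BED file data/regions/NAME.bed and that Prefix must match data/prefixes/PREFIX.fasta. The function write_minimal_makefile is hypothetical, and the output name makefile.yaml is just the name used in the question; the other options are left out here, as in the example above.

```shell
# Hypothetical helper: write the minimal makefile, but only if the BED
# file and FASTA file it refers to exist in the default folders.
write_minimal_makefile() {
    local project_dir="$1"   # folder containing data/prefixes and data/regions
    local region="$2"        # e.g. whole_genome (assumed to match the BED name)
    local prefix="$3"        # e.g. rCRS (FASTA name without .fasta)

    if [ ! -f "$project_dir/data/regions/$region.bed" ]; then
        echo "missing $project_dir/data/regions/$region.bed" >&2
        return 1
    fi
    if [ ! -f "$project_dir/data/prefixes/$prefix.fasta" ]; then
        echo "missing $project_dir/data/prefixes/$prefix.fasta" >&2
        return 1
    fi

    # Other options are deliberately omitted, as in the example makefile
    cat > "$project_dir/makefile.yaml" <<EOF
RegionsOfInterest:
  $region:
    Prefix: $prefix
EOF
}
```

For example, write_minimal_makefile . whole_genome rCRS writes ./makefile.yaml after checking both files.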
Best regards, Mikkel