where is the log file

Open pavlo888 opened this issue 2 years ago • 3 comments

Dear,

First off, great package!!!!

Now, I managed to successfully run the pipeline with the following command:

phylophlan -i Selected_genomes_pangenome_20200522 -o output-228-apr2022 -d uniref90_At --trim greedy --not_variant_threshold 0.99 --remove_fragmentary_entries --fragmentary_threshold 0.67 --min_num_entries 135 -t a -f isolates_config.cfg --diversity low --force_nucleotides --nproc 2 --verbose 2>&1 | tee logs/phylophlan__output-228-apr2022.log

However, now I cannot find the log file. Could you please tell me where I can find it? I would like to extract some information about the pipeline, such as how big the concatemers used in the MSAs are.

Thanks!

Cheers, Pablo

pavlo888, May 25 '22 01:05

Hello Pablo,

Apologies, but I'm not sure which log you're referring to. You already passed --verbose and saved the output from PhyloPhlAn to the log file logs/phylophlan__output-228-apr2022.log. Within the output folder (output-228-apr2022) you'll find a tmp folder that contains all the intermediate steps, so maybe that's what you need if you would like to compute some stats about the individual MSAs?

Many thanks, Francesco

fasnicar, May 26 '22 07:05

Hi @fasnicar

Thanks a lot for your reply! I understand a bit better now. I thought that the log would be a single file with information from the run, but now it is clearer to me.

Indeed, I saw the different files from the intermediate steps.

Three follow-up questions would be:

i) The number of files in the "markers" folder represents the total number of markers detected?

ii) In the concatenated.aln.reduced file, the first line says "227 2217968", does that mean that there are 227 genomes and 2 217 968 bp in total for the complete alignment? That would mean that each genome would have 2217968/227=9770.78 bp?

iii) In the info.refined.tre file, the "Alignment patterns: 1041319" information, what does it mean exactly?

Thanks a lot for your help!!!!

Cheers, Pablo

pavlo888, May 28 '22 01:05

Hi Pablo,

To answer your questions:

i) The number of files in the "markers" folder represents the total number of markers detected?

Yes, that would be the number of markers detected. However, this might not be the same as the number of markers used for building the tree. If you specified a trimming approach (as you did in your command with the param --trim greedy), then some markers might be discarded during the trimming phase. The trimming steps are done in folders whose names start with trim_, so you should check the latest of them to get the actual number of markers used in the tree.
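To compare the two counts, something like the following could work. This is only a sketch: `count_files` and `latest_trim_folder` are hypothetical helper names, the folders are assumed to sit inside the tmp folder of the output directory, and "latest" is assumed to mean the most recently modified trim_ folder, which may not hold for every run.

```python
from pathlib import Path


def count_files(folder):
    """Count the regular files directly inside `folder`."""
    return sum(1 for p in Path(folder).iterdir() if p.is_file())


def latest_trim_folder(tmp_dir):
    """Return the most recently modified sub-folder named trim_*, or None.

    Assumption: modification time identifies the last trimming step.
    """
    candidates = [p for p in Path(tmp_dir).iterdir()
                  if p.is_dir() and p.name.startswith("trim_")]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

For example, `count_files("output-228-apr2022/tmp/markers")` would give the number of detected markers, and `count_files(latest_trim_folder("output-228-apr2022/tmp"))` the number of files left after the last trimming step, provided the folder layout matches these assumptions.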

ii) In the concatenated.aln.reduced file, the first line says "227 2217968", does that mean that there are 227 genomes and 2 217 968 bp in total for the complete alignment? That would mean that each genome would have 2217968/227=9770.78 bp?

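The concatenated.aln.reduced file is produced by RAxML when identical entries are detected. In its first line, 227 is the number of genomes and 2217968 is the MSA length, meaning that each of the 227 genomes has that many aligned positions. It is the length of every row of the alignment, not a total to be divided by 227.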
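A minimal sketch for reading those two numbers, assuming the file starts with a PHYLIP-style header line such as "227 2217968" (`read_phylip_header` is a hypothetical helper name, not part of PhyloPhlAn or RAxML):

```python
def read_phylip_header(path):
    """Return (number_of_sequences, alignment_length) from the first line."""
    with open(path) as handle:
        # The header holds two whitespace-separated integers.
        n_seqs, n_sites = handle.readline().split()[:2]
    return int(n_seqs), int(n_sites)
```

With the header quoted above, this would return `(227, 2217968)`: 227 rows, each 2217968 positions long.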

iii) In the info.refined.tre file, the "Alignment patterns: 1041319" information, what does it mean exactly?

Those are the unique patterns that RAxML found in the MSA, that is, the distinct alignment columns, with identical columns counted once. Most of the time this number is similar to the alignment length; in your case it is about half (1041319 of 2217968), meaning that many patterns are repeated. You can learn more about these aspects by referring to the RAxML manual.
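To illustrate the idea of distinct column patterns, here is a small sketch on a toy alignment. `count_column_patterns` is a hypothetical helper, and the number RAxML reports may not match it exactly, for example depending on how gaps and ambiguous characters are treated. It also keeps every distinct column in memory, so it is meant for small examples, not for a 2-million-column alignment.

```python
def count_column_patterns(sequences):
    """Count the distinct columns in a list of equal-length aligned sequences."""
    lengths = {len(s) for s in sequences}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*sequences) yields one tuple per alignment column.
    return len(set(zip(*sequences)))


toy_alignment = ["ACGTA", "ACGTA", "ACCTA"]
print(count_column_patterns(toy_alignment))  # 4: the first and last columns are identical
```

Here the alignment length is 5 but there are only 4 patterns, which is the same kind of difference as between 2217968 and 1041319.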

I hope this helps.

Many thanks, Francesco

fasnicar, May 31 '22 09:05