OrthoFinder
Question_outputs_Pangenome
Hello.
My name is Cristopher Yerena. I ran OrthoFinder on my samples and now have a directory with many output files. My goal is a pangenome analysis. I have read many articles from groups that used OrthoFinder.
I would like to know which output files I can use to start the pangenome analysis, so that I can identify the core, accessory and unique genes in my samples.
Thanks
Hi Cristopher,
I would suggest that you start with the N0.tsv file (in the Phylogenetic_Hierarchical_Orthogroups folder).
It has a row for each orthogroup and a column for each sample.
A good starting point might be to look at orthogroups that have at least one gene in every sample you have analysed. Genes in these orthogroups might be considered core genes, as they have orthologs in all of the samples you are analysing.
Depending on how many samples you have, you might also want to include orthogroups that are present in almost all samples (maybe >90%).
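As a rough illustration of that filtering, here is a minimal Python sketch. It assumes N0.tsv is tab-separated with three leading metadata columns (HOG, OG, Gene Tree Parent Clade) followed by one column per sample, and that an empty cell means the sample has no gene in that orthogroup. The function name, the sample names and the core/accessory/unique cut-offs are illustrative choices, not part of OrthoFinder.

```python
import csv
import io

# Assumed N0.tsv layout: three metadata columns, then one column per sample.
META_COLUMNS = ("HOG", "OG", "Gene Tree Parent Clade")


def classify_orthogroups(handle, core_fraction=1.0):
    """Label each row 'core', 'accessory' or 'unique' by sample presence."""
    reader = csv.DictReader(handle, delimiter="\t")
    samples = [c for c in reader.fieldnames if c not in META_COLUMNS]
    labels = {}
    for row in reader:
        # A sample counts as present when its cell lists at least one gene.
        present = sum(1 for s in samples if (row[s] or "").strip())
        if present == 0:
            continue  # skip rows with no genes in any sample
        if present >= core_fraction * len(samples):
            labels[row["HOG"]] = "core"
        elif present == 1:
            labels[row["HOG"]] = "unique"
        else:
            labels[row["HOG"]] = "accessory"
    return labels


# Tiny made-up table standing in for a real N0.tsv (hypothetical gene names).
demo = io.StringIO(
    "HOG\tOG\tGene Tree Parent Clade\tsampleA\tsampleB\tsampleC\n"
    "N0.HOG0000000\tOG0000000\tn0\tgA1, gA2\tgB1\tgC1\n"
    "N0.HOG0000001\tOG0000001\tn0\tgA3\tgB2\t\n"
    "N0.HOG0000002\tOG0000002\tn0\t\t\tgC2\n"
)
labels = classify_orthogroups(demo)
print(labels)
```

For a real run, the same function could be given open("N0.tsv", newline="") instead of the demo table, and core_fraction=0.9 would relax the core definition to orthogroups present in at least 90% of samples.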
Hope this helps!
Laurie
Hi Laurie,
Thanks for the answer. Before you close the issue, I have another question.
I have checked the other files that list the orthogroups. I would like to know whether the numbering of the orthogroups (e.g. OG0000000, OG0000001, OG0000002) is related to the "conservation" of the genes, or whether it is merely a number used to order the rows.
Thanks
Hi Cristopher,
There is no meaning to the numbering of the orthogroups. I think they are just ordered so that the largest orthogroups are at the top.
Laurie
Hello Laurie
Thanks for the advice.
I am trying to analyze my results from the Species_Tree directory. My question is:
For one dataset, these files were generated in the Species_Tree directory: SpeciesTree_rooted_node_labels.txt, SpeciesTree_rooted.tbi and SpeciesTree_rooted.txt. For another dataset, these were produced instead: Potential_Rooted_Species_Trees/, SpeciesTree_rooted_node_labels.txt and SpeciesTree_rooted.txt.
Could you help me explain these differences?
Thanks
Hi Cristopher,
The STRIDE algorithm finds the root of the species tree using gene duplication events. Sometimes there are not enough duplication events to exclude all possible alternative roots, so multiple options for the root of the tree remain. I would recommend looking at the species tree and identifying the root yourself from your knowledge of the species. You should be able to find published information on the phylogeny of your species online. Once you have determined the correct root, you can run the final step of OrthoFinder again using the '-ft' option and the '-s' option to provide the correctly rooted species tree. This will recompute the orthologs and HOGs using this information; there will likely be a small number of changes as a result.
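As a small sketch of what that rerun could look like, the Python snippet below only assembles the command line. It assumes the orthofinder executable is on your PATH, that '-ft' is given the directory of the previous results, and that '-s' is given your rooted species tree file. The directory and file names are hypothetical placeholders, and the helper function is not part of OrthoFinder.

```python
import shlex


def build_rerun_command(previous_results_dir, rooted_tree_file):
    """Build the argument list for restarting from existing gene trees."""
    return ["orthofinder", "-ft", previous_results_dir, "-s", rooted_tree_file]


# Hypothetical paths: replace with your own results folder and tree file.
cmd = build_rerun_command("Results_dir", "my_rooted_species_tree.txt")
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be passed to subprocess.run(cmd, check=True) to launch the run from Python.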
Best wishes,
David