clinker
some issues for -mo --matrix_out parameter
I would like to understand the specific implications of the results obtained through the '-mo' parameter. The CSV file exported using the '-mo' parameter uses the full gene name as both the horizontal and vertical axes. Therefore, I assume it represents the similarity results for entire genes. However, it seems that the similarity results from the '-o' parameter, which exports alignments, may not have a direct correlation with the results obtained through the '-mo' parameter.
Hi @Yuki0902,
The -mo parameter gives you the similarity scores between the input clusters (i.e. calculated from all gene alignments between two clusters). This is the matrix used in the clustering step, which determines the display order of the clusters in the output. The names are taken from the input cluster files. For instance, running clinker on the files in the examples folder gives the following:
,A. alliaceus CBS 536.65,A. burnettii MST-FP2249,A. mulundensis DSM 5745,A. versicolor CBS 583.65,P. vexata CBS 129021
A. alliaceus CBS 536.65,0.0,0.0,0.22350406073456497,0.3042042558254481,0.6034166451441612
A. burnettii MST-FP2249,0.0,0.0,0.23137351943160522,0.3408531051603084,0.6204295216720592
A. mulundensis DSM 5745,0.22350406073456497,0.23137351943160522,0.0,0.35355579358610445,0.6219765052472899
A. versicolor CBS 583.65,0.3042042558254481,0.3408531051603084,0.35355579358610445,0.0,0.6141102008013652
P. vexata CBS 129021,0.6034166451441612,0.6204295216720592,0.6219765052472899,0.6141102008013652,0.0
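To make the matrix easier to inspect, here is a minimal sketch that parses it with the Python standard library and lists every cluster pair, sorted by value. It is not part of clinker: `MATRIX_CSV` and `load_matrix` are names made up for this illustration, and the text is simply pasted from the output above. Note that the diagonal is 0.0, so how the values should be read (as similarity or as distance) is worth checking before drawing conclusions from them.

```python
import csv
import io

# Matrix text pasted from the -mo output shown above (illustrative constant).
MATRIX_CSV = """\
,A. alliaceus CBS 536.65,A. burnettii MST-FP2249,A. mulundensis DSM 5745,A. versicolor CBS 583.65,P. vexata CBS 129021
A. alliaceus CBS 536.65,0.0,0.0,0.22350406073456497,0.3042042558254481,0.6034166451441612
A. burnettii MST-FP2249,0.0,0.0,0.23137351943160522,0.3408531051603084,0.6204295216720592
A. mulundensis DSM 5745,0.22350406073456497,0.23137351943160522,0.0,0.35355579358610445,0.6219765052472899
A. versicolor CBS 583.65,0.3042042558254481,0.3408531051603084,0.35355579358610445,0.0,0.6141102008013652
P. vexata CBS 129021,0.6034166451441612,0.6204295216720592,0.6219765052472899,0.6141102008013652,0.0
"""


def load_matrix(text):
    """Hypothetical helper: return (names, {row_name: {col_name: value}})."""
    rows = list(csv.reader(io.StringIO(text)))
    names = rows[0][1:]  # the first header cell is empty
    matrix = {
        row[0]: {name: float(value) for name, value in zip(names, row[1:])}
        for row in rows[1:]
    }
    return names, matrix


names, matrix = load_matrix(MATRIX_CSV)

# The matrix is pairwise, so it should be symmetric.
is_symmetric = all(matrix[a][b] == matrix[b][a] for a in names for b in names)

# Every unordered pair of clusters, sorted from smallest to largest value.
pairs = sorted(
    (matrix[a][b], a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
)

print("symmetric:", is_symmetric)
for value, a, b in pairs:
    print(f"{value:.3f}  {a}  vs  {b}")
```

The first pair printed is the one with the smallest value and the last is the one with the largest, which gives a quick view of which clusters the matrix places closest together.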
Thank you for your response, which has given me more insight into this parameter :) However, I still have some doubts about the values produced by the -mo parameter. After running clinker on the gbk files in the 'examples' folder, I noticed that the similarity between A. burnettii MST-FP2249 and A. alliaceus CBS 536.65 is zero in the -mo output. On the other hand, the results from the -p parameter indicate that the similarity between A. burnettii MST-FP2249 and A. alliaceus CBS 536.65 for the different functional genes is mostly above 0.7. I'm unsure how to interpret the difference between these two sets of results.