
Some issues with the -mo/--matrix_out parameter

Open Yuki0902 opened this issue 1 year ago • 2 comments

I would like to understand what the results produced by the '-mo' parameter actually mean. The CSV file exported with '-mo' uses full names as both the row and column labels, so I assumed it represents similarity results for entire genes. However, the similarity results from the '-o' parameter, which exports alignments, do not seem to correlate directly with the results from '-mo'.

Yuki0902 avatar Oct 18 '23 14:10 Yuki0902

Hi @Yuki0902,

The -mo (--matrix_out) parameter gives you the similarity scores between the input clusters (i.e. calculated from all gene alignments between two clusters). This is the matrix used in the clustering step, which determines the display order of the clusters in the output. The names are taken from the input cluster files. For instance, running clinker on the files in the examples folder gives the following:

,A. alliaceus CBS 536.65,A. burnettii MST-FP2249,A. mulundensis DSM 5745,A. versicolor CBS 583.65,P. vexata CBS 129021
A. alliaceus CBS 536.65,0.0,0.0,0.22350406073456497,0.3042042558254481,0.6034166451441612
A. burnettii MST-FP2249,0.0,0.0,0.23137351943160522,0.3408531051603084,0.6204295216720592
A. mulundensis DSM 5745,0.22350406073456497,0.23137351943160522,0.0,0.35355579358610445,0.6219765052472899
A. versicolor CBS 583.65,0.3042042558254481,0.3408531051603084,0.35355579358610445,0.0,0.6141102008013652
P. vexata CBS 129021,0.6034166451441612,0.6204295216720592,0.6219765052472899,0.6141102008013652,0.0
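For anyone who wants to inspect this file programmatically, here is a minimal sketch using only the Python standard library. It embeds the matrix text shown above so it runs as-is; in practice you would read whichever file path you passed to -mo. The helper names (`load_matrix`, `is_symmetric`, `pairs_sorted`) are hypothetical and not part of clinker, and the sketch only checks properties visible in the output (symmetry, zero diagonal) without assuming anything about how clinker computes the values.

```python
import csv
import io

# Matrix text copied from the reply above. In practice this would be read
# from the file written by -mo/--matrix_out.
MATRIX_CSV = """\
,A. alliaceus CBS 536.65,A. burnettii MST-FP2249,A. mulundensis DSM 5745,A. versicolor CBS 583.65,P. vexata CBS 129021
A. alliaceus CBS 536.65,0.0,0.0,0.22350406073456497,0.3042042558254481,0.6034166451441612
A. burnettii MST-FP2249,0.0,0.0,0.23137351943160522,0.3408531051603084,0.6204295216720592
A. mulundensis DSM 5745,0.22350406073456497,0.23137351943160522,0.0,0.35355579358610445,0.6219765052472899
A. versicolor CBS 583.65,0.3042042558254481,0.3408531051603084,0.35355579358610445,0.0,0.6141102008013652
P. vexata CBS 129021,0.6034166451441612,0.6204295216720592,0.6219765052472899,0.6141102008013652,0.0
"""


def load_matrix(text):
    """Parse the CSV text into (names, {row_name: {col_name: value}})."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    names = rows[0][1:]  # the first header cell is empty
    matrix = {
        row[0]: dict(zip(names, (float(v) for v in row[1:])))
        for row in rows[1:]
    }
    return names, matrix


def is_symmetric(names, matrix):
    """True if the value for (a, b) equals the value for (b, a) everywhere."""
    return all(matrix[a][b] == matrix[b][a] for a in names for b in names)


def pairs_sorted(names, matrix):
    """Return each unordered cluster pair once, sorted by matrix value."""
    pairs = [
        (matrix[a][b], a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    return sorted(pairs)


names, matrix = load_matrix(MATRIX_CSV)
print("symmetric:", is_symmetric(names, matrix))
print("zero diagonal:", all(matrix[n][n] == 0.0 for n in names))
for value, a, b in pairs_sorted(names, matrix):
    print(f"{value:.4f}  {a}  <->  {b}")
```

Listing the pairs in sorted order makes it easy to see which clusters the matrix places closest together and which furthest apart.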

gamcil avatar Oct 25 '23 02:10 gamcil

Thank you for your response, which has given me more insight into this parameter :) However, I still have some doubts about the values produced by the -mo parameter. After running clinker on the gbk files in the 'examples' folder, I noticed that the similarity between A. burnettii MST-FP2249 and A. alliaceus CBS 536.65 is zero in the -mo output. On the other hand, the results from the -p parameter indicate that the similarity between A. burnettii MST-FP2249 and A. alliaceus CBS 536.65 is mostly above 0.7 across the different functional genes. I'm unsure how to interpret the difference between these two sets of results.

Yuki0902 avatar Oct 30 '23 13:10 Yuki0902