clust
Drastically different results between version 1.8.10 and version 1.12.0
I reanalyzed some data that I originally analyzed with version 1.8.10, and I cannot reproduce those results with version 1.12.0. In 1.12.0 I see a warning message at the 80% point of step 3 of the analysis, but it doesn't seem relevant to my issue or to clust: "FutureWarning: 'n_jobs' was deprecated in version 0.23 and will be removed in 0.25."
Do you have a sense of what might be leading to the different results? My implementation of clust is straightforward:
clust input_file -o output_dir
Thanks, Elizabeth
I just noticed similar behavior between version 1.10 and version 1.12.
The exact same input data produce totally different results.
Version 1.10 produces 6 clusters from 299 proteins while version 1.12 only finds 1 cluster.
I would really appreciate any input!
Let me know if you require any additional information.
Best, Miguel
Result summary, version 1.10
PS C:\Users\migue\OneDrive\Documentos\R_Projects\Support\Patrick_Pigs_data_analysis> clust .\clust_data\
/===========================================================================\
| Clust |
| (Optimised consensus clustering of multiple heterogenous datasets) |
| Python package version 1.10.10 (2019) Basel Abu-Jamous |
+---------------------------------------------------------------------------+
| Analysis started at: Friday 04 December 2020 (09:46:55) |
| 1. Reading dataset(s) |
| 2. Data pre-processing |
C:\Program Files\WPy64-3770\python-3.7.7.amd64\lib\site-packages\clust\scripts\preprocess_data.py:19: RuntimeWarning: invalid value encountered in greater
I = np.bitwise_and(~isnan(X), X>0)
| - Automatic normalisation mode (default in v1.7.0+). |
| Clust automatically normalises your dataset(s). |
| To switch it off, use the `-n 0` option (not recommended). |
| Check https://github.com/BaselAbujamous/clust for details. |
| - Flat expression profiles filtered out (default in v1.7.0+). |
| To switch it off, use the --no-fil-flat option (not recommended). |
| Check https://github.com/BaselAbujamous/clust for details. | |
| C:\Users\migue\OneDrive\Documentos\R_Projects\Support\Patrick_Pigs_data_a |
| nalysis/Results_04_Dec_20 |
+---------------------------------------------------------------------------+
| Analysis finished at: Friday 04 December 2020 (09:46:59) |
| Total time consumed: 0 hours, 0 minutes, and 4 seconds |
| |
\===========================================================================/
/===========================================================================\
| RESULTS SUMMARY |
+---------------------------------------------------------------------------+
| Clust received 1 dataset with 299 unique genes. After filtering, 299 |
| genes made it to the clustering step. Clust generated 6 clusters of |
| genes, which in total include 205 genes. The smallest cluster includes |
| 13 genes, the largest cluster includes 67 genes, and the average cluster |
| size is 34 genes. |
================================/
Result summary on the same data, Clust version 1.12:
PS C:\Users\migue\OneDrive\Documentos\R_Projects\Support\Patrick_Pigs_data_analysis> clust .\clust_data\
/===========================================================================\
| Clust |
| (Optimised consensus clustering of multiple heterogenous datasets) |
| Python package version 1.12.0 (2019) Basel Abu-Jamous |
+---------------------------------------------------------------------------+
| Analysis started at: Friday 04 December 2020 (10:15:55) |
| 1. Reading dataset(s) |
| 2. Data pre-processing |
C:\Users\migue\AppData\Roaming\Python\Python37\site-packages\clust\scripts\preprocess_data.py:19: RuntimeWarning: invalid value encountered in greater
I = np.bitwise_and(~isnan(X), X>0)
| - Automatic normalisation mode (default in v1.7.0+). |
| Clust automatically normalises your dataset(s). |
| To switch it off, use the `-n 0` option (not recommended). |
| Check https://github.com/BaselAbujamous/clust for details. |
| - Flat expression profiles filtered out (default in v1.7.0+). |
| To switch it off, use the --no-fil-flat option (not recommended). |
| Check https://github.com/BaselAbujamous/clust for details. |
| 3. Seed clusters production (the Bi-CoPaM method) |
| C:\Users\migue\OneDrive\Documentos\R_Projects\Support\Patrick_Pigs_data_a |
| nalysis/Results_04_Dec_20_1 |
+---------------------------------------------------------------------------+
| Analysis finished at: Friday 04 December 2020 (10:15:59) |
| Total time consumed: 0 hours, 0 minutes, and 4 seconds |
| |
\===========================================================================/
/===========================================================================\
| RESULTS SUMMARY |
+---------------------------------------------------------------------------+
| Clust received 1 dataset with 299 unique genes. After filtering, 299 |
| genes made it to the clustering step. Clust generated 1 clusters of |
| genes, which in total include 67 genes. The smallest cluster includes 67 |
| genes, the largest cluster includes 67 genes, and the average cluster |
| size is 67 genes. |
\===========================================================================/
Just chiming in that I'm experiencing similar replication issues between version 1.10 and 1.12: a drastic reduction in the number of clusters.
I also had a similar issue: v1.12 finds only a single cluster in two different datasets.
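For anyone who wants to quantify how far apart two runs are before reporting, here is a minimal sketch (not part of clust) that matches each cluster from an older run to its best-overlapping cluster in a newer run using the Jaccard index. The function name `best_jaccard_matches` and the toy gene IDs are hypothetical; loading the real cluster memberships from each run's output directory is left out, since the exact file layout is not shown in this thread.

```python
from typing import Dict, Optional, Set, Tuple


def best_jaccard_matches(
    old: Dict[str, Set[str]], new: Dict[str, Set[str]]
) -> Dict[str, Tuple[Optional[str], float]]:
    """For each cluster in `old`, return the best-overlapping cluster in `new`.

    The overlap score is the Jaccard index: |A & B| / |A | B|.
    A cluster with no overlap at all maps to (None, 0.0).
    """
    matches: Dict[str, Tuple[Optional[str], float]] = {}
    for old_name, old_genes in old.items():
        best_name: Optional[str] = None
        best_score = 0.0
        for new_name, new_genes in new.items():
            union = old_genes | new_genes
            score = len(old_genes & new_genes) / len(union) if union else 0.0
            if score > best_score:
                best_name, best_score = new_name, score
        matches[old_name] = (best_name, best_score)
    return matches


# Hypothetical toy memberships standing in for two runs on the same input.
old_run = {"C0": {"g1", "g2", "g3"}, "C1": {"g4", "g5"}}
new_run = {"C0": {"g1", "g2", "g3", "g4"}}

result = best_jaccard_matches(old_run, new_run)
for name, (match, score) in result.items():
    print(f"{name} -> {match} (Jaccard {score:.2f})")
```

A score near 1 means the older cluster survived largely intact in the newer run; scores near 0 mean its genes were dropped or scattered, which would be consistent with the reports above of several clusters collapsing to one.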