tslearn
tslearn copied to clipboard
Parallelized time series k means barycenter update procedure
First of all I want to thank you for the amazing job that you have done creating this python package which is awesome and very useful. I have used it in my final thesis for a master degree in which I had to cluster some type of players based on their behaviors. The problem in using Time series k-means wiht my dataset derived from the fact that the actual implementation performs parallel computations only in distance matrices (DTW in my case),thus i re-wrote the code related to the MM algorithm and to the barycenter update proceudre in the following way: the barceynter update procedure creates a process for each cluster of the algorithm (equal to the value of k) and then each process creates a pool of threads (15 in my implementation), that are used to perform the barycenter update procedure in a parallel and efficient way.
I forgot to mention that also some redundant calls to the functions to_time_series_dataset() were removed because they caused memory allocation problems and they were basically not useful
Hi @enricofallacara , I'm not the maintainer for tslearn but I thought I'd share some comments since this would be useful for me.
- tests are failing since you're importing
psutil
which is not in therequirements.txt
, however you do not appear to usepsutil
. -
n_jobs
for joblib is hard-coded to 15, would probably be better have it as a keyword argument
Hi @enricofallacara , I'm not the maintainer for tslearn but I thought I'd share some comments since this would be useful for me.
- tests are failing since you're importing
psutil
which is not in therequirements.txt
, however you do not appear to usepsutil
.n_jobs
for joblib is hard-coded to 15, would probably be better have it as a keyword argument
You are completly right, sorry for the psutil mistake. I have used it for some stuffs and i forgot to remove it. Also, I know that I had hard-coded the number of threads, but I was in a hurry and I loaded my implementation. Tomorrow I will fix this stuffs, thanks!
Hi @enricofallacara
Your PR is now in conflict with the master branch since we have refactored the code. I can take care of the merge to resolve the conflicts if you want, after which I could do a review of your code.
Hello,
No problem with that. Sorry, I didn't have time to fix the code due to work problems, I really appreciate your help.
Il giorno mar 5 gen 2021 alle ore 11:00 Romain Tavenard < [email protected]> ha scritto:
Hi @enricofallacara https://github.com/enricofallacara
Your PR is now in conflict with the master branch since we have refactored the code. I can take care of the merge to resolve the conflicts if you want, after which I could do a review of your code.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tslearn-team/tslearn/pull/321#issuecomment-754535055, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKOARI3KO75OW4YPXDB54JTSYLPNTANCNFSM4U4M42ZQ .
I have left a few comments. Maybe the main point would be to assess the improvement (in terms of running times) that these changes offer on some benchmark.
Also, tests seem to fail at the moment (let me know if it's due to a bug in my merge).