tslearn icon indicating copy to clipboard operation
tslearn copied to clipboard

Parallelized time series k means barycenter update procedure

Open enricofallacara opened this issue 4 years ago • 7 comments

First of all I want to thank you for the amazing job that you have done creating this python package which is awesome and very useful. I have used it in my final thesis for a master degree in which I had to cluster some type of players based on their behaviors. The problem in using Time series k-means wiht my dataset derived from the fact that the actual implementation performs parallel computations only in distance matrices (DTW in my case),thus i re-wrote the code related to the MM algorithm and to the barycenter update proceudre in the following way: the barceynter update procedure creates a process for each cluster of the algorithm (equal to the value of k) and then each process creates a pool of threads (15 in my implementation), that are used to perform the barycenter update procedure in a parallel and efficient way.

enricofallacara avatar Dec 15 '20 14:12 enricofallacara

I forgot to mention that also some redundant calls to the functions to_time_series_dataset() were removed because they caused memory allocation problems and they were basically not useful

enricofallacara avatar Dec 15 '20 14:12 enricofallacara

Hi @enricofallacara , I'm not the maintainer for tslearn but I thought I'd share some comments since this would be useful for me.

  1. tests are failing since you're importing psutil which is not in the requirements.txt, however you do not appear to use psutil.
  2. n_jobs for joblib is hard-coded to 15, would probably be better have it as a keyword argument

kushalkolar avatar Dec 15 '20 15:12 kushalkolar

Hi @enricofallacara , I'm not the maintainer for tslearn but I thought I'd share some comments since this would be useful for me.

  1. tests are failing since you're importing psutil which is not in the requirements.txt, however you do not appear to use psutil.
  2. n_jobs for joblib is hard-coded to 15, would probably be better have it as a keyword argument

You are completly right, sorry for the psutil mistake. I have used it for some stuffs and i forgot to remove it. Also, I know that I had hard-coded the number of threads, but I was in a hurry and I loaded my implementation. Tomorrow I will fix this stuffs, thanks!

enricofallacara avatar Dec 15 '20 20:12 enricofallacara

Hi @enricofallacara

Your PR is now in conflict with the master branch since we have refactored the code. I can take care of the merge to resolve the conflicts if you want, after which I could do a review of your code.

rtavenar avatar Jan 05 '21 10:01 rtavenar

Hello,

No problem with that. Sorry, I didn't have time to fix the code due to work problems, I really appreciate your help.

Il giorno mar 5 gen 2021 alle ore 11:00 Romain Tavenard < [email protected]> ha scritto:

Hi @enricofallacara https://github.com/enricofallacara

Your PR is now in conflict with the master branch since we have refactored the code. I can take care of the merge to resolve the conflicts if you want, after which I could do a review of your code.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tslearn-team/tslearn/pull/321#issuecomment-754535055, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKOARI3KO75OW4YPXDB54JTSYLPNTANCNFSM4U4M42ZQ .

enricofallacara avatar Jan 05 '21 10:01 enricofallacara

I have left a few comments. Maybe the main point would be to assess the improvement (in terms of running times) that these changes offer on some benchmark.

rtavenar avatar Jan 05 '21 11:01 rtavenar

Also, tests seem to fail at the moment (let me know if it's due to a bug in my merge).

rtavenar avatar Jan 05 '21 11:01 rtavenar