simpsom
net.project() function is slow
Hey @fcomitani team,
I use your library for clustering, but there's one function that takes an extremely long time. My code looks like this:
import numpy as np
import simpsom as sps

x_train = df_kmeans.drop(columns=['LTV', 'sqrt_LTV'])   # features only (df_kmeans is a pandas DataFrame)
net = sps.somNet(30, 30, x_train.values, PBC=True)      # 30x30 map with periodic boundary conditions
net.train(0.1, 20000)                                    # learning rate 0.1, 20000 epochs
prj = np.array(net.project(x_train.values))              # project every row onto the trained map
The projection line (prj = np.array(net.project(x_train.values))) takes around 6-7 hours for roughly 7 million rows. Can you help me figure out how to speed it up? My current system is an AWS instance with 32 GB RAM and a 4-core CPU.
Hi @ashsharma96,
thank you for bringing this up. Yes, I am unfortunately aware of the performance issues of the current implementation. There isn't much you can do at the moment to improve this, other than subsetting your input dataset and running the projection in parallel batches yourself.
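Something along these lines could work as a stopgap. This is only a rough, untested sketch: it assumes the trained net and x_train from your snippet are still in scope, that net can be shipped to worker processes (via pickling, or fork on Linux), and the chunk and worker counts (64 and 4) are arbitrary choices you'd want to tune:

import numpy as np
from functools import partial
from multiprocessing import Pool

def project_chunk(chunk, som):
    # project one batch of rows onto the trained map
    return np.array(som.project(chunk))

if __name__ == '__main__':
    chunks = np.array_split(x_train.values, 64)            # ~64 batches of roughly 100k rows each
    with Pool(processes=4) as pool:                         # one worker per CPU core
        parts = pool.map(partial(project_chunk, som=net), chunks)
    prj = np.concatenate(parts)

Each worker projects its own batch independently, so on a 4-core machine you'd expect at best a ~4x speedup.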
We are currently working on a new implementation which will considerably improve the overall performance of the package (on both GPU and CPU). We are aiming to publish it in the coming weeks. Hopefully that will help with your issue.
@fcomitani Thank you so much for answering my query. I'll be looking forward to the new implementation. Please let us know when the update arrives.
@fcomitani Hope you are doing well. Any updates on this one? Regards
Hi @ashsharma96, apologies for the radio silence.
Yes, we've been working on a new (hopefully more efficient) version, v3.0.0. We don't have a date for the official release yet; it should be out in the next couple of months, as we still have a few kinks to iron out.
If you feel like trying it, you can pull it from the main branch and give it a spin. If you decide to do so, please make sure to report any bugs or issues you encounter, and be aware that part of the API has changed. You can find the new docs here.
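For reference, assuming the GitHub repository is fcomitani/simpsom, installing straight from the main branch should look something like:

pip install git+https://github.com/fcomitani/simpsom.git@main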
Feedback is always appreciated! Thanks.
The beta release is out.