graspologic icon indicating copy to clipboard operation
graspologic copied to clipboard

Improve scaling on `simulations.sbm`

Open loftusa opened this issue 4 years ago • 5 comments

It's currently taking me ~20 seconds to make an SBM with 1500 nodes with

simulations.sbm[500, 500, 500], B, return_labels=True

where B is a block probability matrix.
This scales poorly, especially for experiments in which we need to generate lots of large SBMs.
Speeding up this simulation to scale better would be super nice. Looks like the issue is in the loop here.

I wonder if it's possible to vectorize to speed this loop up? or possibly do some calculations differently, or even (if other options don't work) parralelize?

another option is to use cython here, but not sure how worth it that is... sklearn uses the crap out of cython to speed up their stuff though, and I'm sure it'd be useful in other places besides just this function? thoughts?

EDIT: Now it's going way faster, so maybe the problem was just on my machine. Maybe close this issue?

loftusa avatar Feb 25 '21 21:02 loftusa

a few thoughts:

  • networkx has an SBM sampler, have never tried to compare them
  • if the SBM is sparse, we aren't leveraging that at all here. I have never looked into it much but i know of at least one paper all about it https://www.jmlr.org/papers/volume19/17-128/17-128.pdf

bdpedigo avatar Feb 25 '21 21:02 bdpedigo

@loftusa does this issue needs work?

rajpratyush avatar Mar 16 '21 05:03 rajpratyush

@rajpratyush Maybe - I think the issue was primarily just with my machine at the time, but I'm sure we could still find a way to speed simulation generation up. I'll leave it to @bdpedigo to decide if it's worth pursuing. #734 might be a fun project for you too, but again, not sure at the moment whether or not it's a priority -- @bdpedigo thoughts?

loftusa avatar Mar 17 '21 14:03 loftusa

@loftusa I was going through the file that you mentioned few days back and my first impression was it is evidently will be slow in execution because of too many for loops. It definitely needs to optimized.

rajpratyush avatar Mar 17 '21 15:03 rajpratyush

@rajpratyush Maybe - I think the issue was primarily just with my machine at the time, but I'm sure we could still find a way to speed simulation generation up. I'll leave it to @bdpedigo to decide if it's worth pursuing. #734 might be a fun project for you too, but again, not sure at the moment whether or not it's a priority -- @bdpedigo thoughts?

If someone can generate convincing evidence that another sampling method is faster and the results are still accurate, then I'm happy to have it, though I can't commit to reviewing anything right now.

bdpedigo avatar Mar 17 '21 21:03 bdpedigo