pygraphistry

[FEA] More intuitive scale factor

Open lmeyerov opened this issue 2 years ago • 2 comments

Is your feature request related to a problem? Please describe.

I repeatedly struggled to get umap(scale=..., ) to return more than 0 edges, and when digging in, saw that the weight filter is based on:

https://github.com/graphistry/pygraphistry/blob/47f92905d3b65f3b01ac10b7cab4448133de0816/graphistry/feature_utils.py#L776

That's 1-sided and in the opposite direction from what I expected, so increasing the scale factor actually decreased an edge's chance of passing!

I'd end up passing fractional values (0.1, etc.) to get closer.
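To make the problem concrete, here is a minimal sketch. It assumes the filter keeps weights at or above mean + scale * std, which is inferred from the description above rather than copied from feature_utils.py. Plain Python lists stand in for the wdf DataFrame, and the weights and the name one_sided_filter are made up for illustration.

```python
from statistics import mean, stdev

# Hypothetical edge weights, not taken from any real dataset.
weights = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def one_sided_filter(ws, scale):
    """Keep weights at least `scale` sample std devs above the mean.

    Assumed form of the current filter, for illustration only.
    """
    mu, sigma = mean(ws), stdev(ws)
    return [w for w in ws if w >= mu + scale * sigma]


# A larger scale raises the threshold, so fewer edges survive.
for scale in (0.1, 1.0, 2.0):
    print(scale, len(one_sided_filter(weights, scale)))
```

With these weights, scale=2.0 already leaves no edges, which matches the "0 edges" experience above.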

Describe the solution you'd like

Idea 1: Go two-sided: keep edges with weights within scale * std of the mean, not just those at least scale * std above it

wdf2 = wdf[wdf[config.WEIGHT] >= mean - scale * std]

I think that's the typical case, so umap(1) would mean edges no more than 1 std below the mean, or stronger, instead of only edges more than 1 std above the mean in strength
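A rough sketch of how the Idea 1 expression would behave, using plain Python lists in place of the wdf DataFrame. The weights and the name lower_bound_filter are hypothetical; only the comparison `>= mean - scale * std` comes from the proposal above.

```python
from statistics import mean, stdev

# Hypothetical edge weights, not taken from any real dataset.
weights = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def lower_bound_filter(ws, scale):
    """Keep weights no more than `scale` sample std devs below the mean."""
    mu, sigma = mean(ws), stdev(ws)
    return [w for w in ws if w >= mu - scale * sigma]


# A larger scale lowers the threshold, so more edges survive.
for scale in (0.1, 1.0, 2.0):
    print(scale, len(lower_bound_filter(weights, scale)))
```

Here a larger scale is more permissive, so a first guess like scale=1 keeps most edges instead of almost none.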

Idea 2:

Flipping it around, there might be a sense of the number of intended edges, or even a supervised edge set, to make this a bit more automatic & declarative, vs playing with magic constants (from the user's perspective)
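One possible reading of "number of intended edges", sketched in plain Python: the caller asks for k edges and gets the k heaviest, with no threshold constant involved. The function name top_k_edges and the (src, dst, weight) tuple layout are assumptions for illustration, not part of pygraphistry.

```python
import heapq

# Hypothetical edges as (src, dst, weight) tuples.
edges = [
    ("a", "b", 0.9),
    ("a", "c", 0.2),
    ("b", "c", 0.7),
    ("c", "d", 0.4),
]


def top_k_edges(es, k):
    """Return the k highest-weight edges, heaviest first."""
    return heapq.nlargest(k, es, key=lambda e: e[2])


print(top_k_edges(edges, 2))
```

The appeal is that k is meaningful without knowing the weight distribution. A supervised edge set would need a different mechanism and is not covered by this sketch.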

Describe alternatives you've considered

User passes in negative values :)

More docs here

cc + @silkspace

lmeyerov avatar Feb 22 '22 03:02 lmeyerov

I've added better pruning of implicit edges and a filter_edges function that allows pruning after the initial g.umap(..) call. The new logic means that scale=0 returns only the highest-weighted edges, and larger values include more edges.

silkspace avatar Mar 03 '22 23:03 silkspace

As part of closing this out, I think we should make sure the docs are good + include extreme/useful cases as part of the intro or a standalone ipynb. I still don't find this intuitive as I have to keep remembering and re-deciphering the code!

lmeyerov avatar Mar 26 '22 19:03 lmeyerov