pygraphistry
[FEA] More intuitive scale factor
Is your feature request related to a problem? Please describe.
I repeatedly struggled to get umap(scale=...) to return more than 0 edges, and when digging in, saw that the weight filter is based on:
https://github.com/graphistry/pygraphistry/blob/47f92905d3b65f3b01ac10b7cab4448133de0816/graphistry/feature_utils.py#L776
That's one-sided and in the opposite direction from what I expected, so increasing the scale factor actually decreased the chance of an edge passing!
I'd end up sending fractional values (0.1, etc.) to get closer.
Describe the solution you'd like
Idea 1: Go two-sided -- keep edges with weights within scale * std of the mean, not just scale * std greater:
wdf2 = wdf[wdf[config.WEIGHT] >= mean - scale * std]
I think that's the typical case, so umap(1) would keep edges no more than 1 std below the mean, or stronger, instead of only edges at least 1 std above the mean.
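To make the difference concrete, here is a minimal sketch on toy data. It assumes the current filter keeps weights at or above mean + scale * std (my reading of the linked line, not a copy of it), and it uses a plain "weight" column as a stand-in for config.WEIGHT. The function names are hypothetical.

```python
import pandas as pd

WEIGHT = "weight"  # stand-in for config.WEIGHT (assumption)


def filter_current(wdf: pd.DataFrame, scale: float) -> pd.DataFrame:
    """Assumed current behavior: keep edges at least scale * std ABOVE the mean."""
    mean, std = wdf[WEIGHT].mean(), wdf[WEIGHT].std()
    return wdf[wdf[WEIGHT] >= mean + scale * std]


def filter_proposed(wdf: pd.DataFrame, scale: float) -> pd.DataFrame:
    """Idea 1: keep edges no more than scale * std BELOW the mean, or stronger."""
    mean, std = wdf[WEIGHT].mean(), wdf[WEIGHT].std()
    return wdf[wdf[WEIGHT] >= mean - scale * std]


# Toy edge list with weights 0.125, 0.25, ..., 0.875 (mean is exactly 0.5)
wdf = pd.DataFrame({
    "src": list(range(7)),
    "dst": list(range(1, 8)),
    WEIGHT: [i / 8 for i in range(1, 8)],
})

# Count surviving edges for a few scale values under each rule
current_counts = [len(filter_current(wdf, s)) for s in (0, 1, 2)]
proposed_counts = [len(filter_proposed(wdf, s)) for s in (0, 1, 2)]
print("current: ", current_counts)
print("proposed:", proposed_counts)
```

On this toy data the assumed current rule returns fewer edges as scale grows (down to none at scale=2), while the proposed rule returns more, which is the direction I expected.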
Idea 2:
Flipping it around, there might be a sense of number of intended edges, or even a supervised edge set, to make this a bit more automatic & declarative, vs playing with magic constants (from the user's perspective)
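One way the "number of intended edges" version might look, as a rough sketch only: the user states how many edges they want and the strongest ones are kept. keep_top_edges and the "weight" column are hypothetical names for illustration, not part of the pygraphistry API.

```python
import pandas as pd

WEIGHT = "weight"  # stand-in for config.WEIGHT (assumption)


def keep_top_edges(wdf: pd.DataFrame, n_edges: int) -> pd.DataFrame:
    """Hypothetical helper: keep the n_edges highest-weighted edges."""
    return wdf.nlargest(n_edges, WEIGHT)


# Same style of toy edge list: weights 0.125, 0.25, ..., 0.875
wdf = pd.DataFrame({
    "src": list(range(7)),
    "dst": list(range(1, 8)),
    WEIGHT: [i / 8 for i in range(1, 8)],
})

top3 = keep_top_edges(wdf, 3)
print(top3)
```

The user then reasons about an edge budget directly instead of guessing which multiple of std yields a usable graph.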
Describe alternatives you've considered
User passes in negative values :)
More docs here
cc + @silkspace
I've added better pruning of implicit edges and a filter_edges function that allows pruning after the initial g.umap(..) call. The new logic means that scale=0 returns the highest weighted edges, and larger values include more edges.
As part of closing this out, I think we should make sure the docs are good + include extreme/useful cases as part of the intro or a standalone ipynb. I still don't find this intuitive as I have to keep remembering and re-deciphering the code!