nxviz icon indicating copy to clipboard operation
nxviz copied to clipboard

Uniform types in set_nodecolors

Open leotrs opened this issue 9 years ago • 12 comments

Here is an excerpt the docstring provided for BasePlot.set_nodecolors:

"""
If `nodecolors` is a `string`, all nodes will carry that color.
If `nodecolors` is a `list` or `tuple`, then nodes will be coloured in
order by the list or tuple elements.
If `nodecolors` is a `dict`, then the keys have to be all present in
the nodelist.
By default, node color is blue.
"""

Then, the code is filled with asserts according to the mentioned rules.

This approach is fine, but I'm not a huge fan of overloading a function and then forking its behavior based on its arguments. I have some experience with Mathematica, where this practice is used to the extent that built-in functions change behavior not only on the type of their arguments but also on their shape. Debugging them is not a very fun exercise.


Here is what hypothetical calls to set_nodecolors would look like:

plot = BasePlot(nodes=range(6))

# string: all the same
plot.set_nodecolors('blue')

# iterable: must preserve order
plot.set_nodecolors(['blue', 'blue', 'blue', 'blue', 'black', 'blue'])

# dict: control over each value
plot.set_nodecolors({0: 'blue', 1: 'blue', 2: 'blue',
                     3: 'blue', 4: 'black', 5: 'blue'})

What I propose is to homogenize the calls and always using dicts:

nodes = range(6)
plot = BasePlot(nodes)

# dict: setting all nodes to the same value is still easy
plot.set_nodecolors({node: 'blue' for node in nodes})

# no iterables!
# user doesn't worry about correct ordering

# dict: control over each value
plot.set_nodecolors({0: 'blue', 1: 'blue', 2: 'blue',
                     3: 'blue', 4: 'black', 5: 'blue'})

The main benefits of this approach is that the code is much shorter and less error-prone, as there are far fewer type-checks and forks that need to be performed at each call (do we only check the type of the argument, or also the length? What about the validity of each color string?) Also, the user doesn't need to worry about the order of iterables being the same as the nodecolors object they are passing.

Right now, even if only one string is being passed, the method ultimately performs self.nodecolors = [nodecolors] * len(self.nodes), which means passing a dict of size equal to the nodelist even in the simplest case will not incur in more memory cost.

I know everything is yet in early stages, but I thought standardizing function signatures is an important design decision to be made from the start.

leotrs avatar Jul 30 '16 00:07 leotrs

Is there an interest in this? Maybe it's too early in development to think about this issues.

leotrs avatar Aug 14 '16 13:08 leotrs

@jonchar is our resident matplotlib guru - what's your thoughts?

ericmjl avatar Aug 14 '16 23:08 ericmjl

by the way, @leotrs, now that I've found some time to look carefully at what you're proposing, it's a great idea. would you like to start a PR + tests for this?

ericmjl avatar Aug 15 '16 23:08 ericmjl

@ericmjl This came up in the course of reviewing the code to prepare a PR.

Do we want to keep the set_* functions, or do we want (more pythonic) properties?

leotrs avatar Aug 16 '16 22:08 leotrs

If you can get it done within the PR and the examples run correctly, then I've no problems with changing to properties.

ericmjl avatar Aug 17 '16 00:08 ericmjl

Just a few thoughts:

  1. It is standard in the matplotlib API to see plotting functions that take kwargs of different types to set attributes of the plot. See the the c argument in plt.scatter for an example of this and the source for how it's handled. Under the hood (in draw_nodes in circos.py) we manually add patches to the axes object but I was thinking perhaps we could use plt.scatter instead to leverage this functionality.
  2. I think having set_nodecolors and set_edgecolors behave in this fashion (with their corresponding kwargs) would keep things streamlined with the matplotlib API. This would also keep a low barrier to customization if you already know matplotlib. If we can offload handling these parameters to existing matplotlib functions as well, that would be advantageous.
  3. The set_* methods are currently called internally when the plot is created. I'm not sure of a use case where they would need to be called after the plot is already created a-la BasePlot.set_*()
  4. Since matplotlib is our "backend" so-to-speak, any valid matplotlib color would be valid in nxviz.

jonchar avatar Aug 17 '16 03:08 jonchar

  1. If we do follow matplotlib standards, we have to think about how this allows or forbids us to follow the proposed declarative design (#2). Having said that, I agree it's a good idea.

  2. I propose to follow matplotlib purposefully, especially so that we can

    offload handling these parameters to existing matplotlib functions as well

    That means we don't have to make the type checks ourselves.

  3. Whether or not the set_* methods are called after plot creation, having getters and setters is generally frowned upon in python. That's what properties are for. Not trying to be confrontational, just making an observation.

leotrs avatar Aug 17 '16 13:08 leotrs

great thoughts, guys!

on plt.scatter @jonchar: I think using plt.scatter is a cool idea too. In other words, all we have to do is compute the (x, y) coordinates for each point, and pass in the xs and ys, rather than iterate and add patches individually. That will definitely make the underlying code cleaner, and will allow us to separate the logic that draws to screen from the logic that computes x-y coordinates.

on set_* methods I'm conflicted on whether or not to remove them explicitly, now that I've heard arguments both ways. On one hand, right now, we are using matplotlib as the backend. But a declarative API will mean that we should be able to swap in bokeh, plotly etc. Let's see what your PR looks like, @leotrs.

on type checking As for type checking... it's the same argument. Right now we're using matplotlib, so it makes sense to have matplotlib do the type checking. I added those assertion statements in the hopes that the error messages would be more informative for the end-user than a generic matplotlib one would be. I guess at this point it's a bit more of a front-end design choice - are we designing for users familiar with matplotlib at this point?

ericmjl avatar Aug 17 '16 13:08 ericmjl

@ericmjl: I can put together a PR for using plt.scatter to see what that would look like.

@leotrs: You're right, properties are the way to go. The existing set_* methods are leftover from the initial implementation.

jonchar avatar Aug 17 '16 14:08 jonchar

@ericmjl Just wanted to check-in. I still plan on implementing this, hopefully soon. As I told you, I just moved to Boston and started grad school, and I'm still adjusting. But a couple of hours should clear up soon-ish, and I'll get back to work on this!

leotrs avatar Sep 23 '16 00:09 leotrs

That'd be awesome, @leotrs! Hope you're doing well right now at NEU. Let's see how both of our next set of updates shape up and see how we can integrate them!

ericmjl avatar Sep 23 '16 00:09 ericmjl

@jonchar @leotrs FYI I merged some API changes into master. Not the best practice right now, as I should have done a dev branch first, but it'll do for now. In case you guys are still working on PRs, don't forget to pull in the latest updates!

ericmjl avatar Jan 05 '17 20:01 ericmjl