xgi
Rules for default return values
When computing the density of a Hypergraph, there are many possibilities. For example, one could compute the density of all edges, of edges of a particular order, of edges up to a particular order, or the density of the incidence matrix. The current default is to compute the density of all edges. What should be the default?
During today's all-team call, @acuschwarze pointed out that returning a single number may not be the most natural thing to do, since one of the main reasons researchers use higher-order structures is precisely that they allow most quantities to be computed at different edge orders or sizes. So perhaps the default should be to return a collection of values, say, a list containing the densities at each individual order.
Note that the particular example of density is one that involves more than one function (namely `xgi.density` and `xgi.incidence_density`). Another example is the degree of a single node, which can also be computed for individual edge orders (e.g. by `H.nodes.degree(order=d)`), as well as many other NodeStats one could imagine.
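To make the "one number vs many numbers" question concrete, here is a minimal pure-Python sketch. It is not XGI's implementation: the function names, the edge-list representation, and the normalizations (edges of size s divided by C(n, s), and all edges divided by 2^n - 1 possible nonempty node subsets, with no repeated edges) are assumptions made for illustration only.

```python
from collections import Counter
from math import comb


def density_by_order(num_nodes, edges):
    """Hypothetical helper: return {order: density}, with order = edge size - 1.

    Assumes the density at a given order is the number of edges of that
    size divided by the number of possible node subsets of that size.
    """
    size_counts = Counter(len(e) for e in edges)
    return {
        size - 1: count / comb(num_nodes, size)
        for size, count in sorted(size_counts.items())
    }


def aggregate_density(num_nodes, edges):
    """Hypothetical helper: a single number over all orders.

    Assumes every nonempty subset of nodes is a possible edge.
    """
    return len(edges) / (2**num_nodes - 1)


# Toy hypergraph on 4 nodes: two pairwise edges, one triangle, one 4-node edge.
edges = [{0, 1}, {1, 2}, {0, 1, 2}, {0, 1, 2, 3}]
per_order = density_by_order(4, edges)   # one value per order
single = aggregate_density(4, edges)     # one value overall
```

In this toy case the aggregate value hides the fact that the order-3 density is 1.0 while the order-1 density is only 2/6, which is the kind of information a single default number would not convey.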
Please use this issue to share your general views on this point. Note the discussion is not so much about what should be the default return type (e.g. a list vs an array) as it is about what the default return should be (e.g. one number vs many numbers).
Some questions to stir a debate:
- What would a user expect the default behavior to be? What would be the most useful and/or the least unexpected?
- Should we use the same or different rules for the return values of functions that compute a quantity over an entire network (e.g. density) vs those that compute a quantity for each node (e.g. degree)?
- How does this affect efficiency? (It may be the case that computing a quantity (say density) at one order may help us compute the same quantity at a different order. This means that returning the per-order density can be done more efficiently than simply executing the same density function once per order.)
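As a rough illustration of the efficiency point, the sketch below computes per-order node degrees in a single pass over the edges, and compares that with calling a single-order function once per order. The function names and the plain list-of-sets edge representation are hypothetical and are not part of XGI.

```python
from collections import defaultdict


def degree_single_order(edges, order):
    """Degree of every node, counting only edges of the given order."""
    deg = defaultdict(int)
    for e in edges:
        if len(e) - 1 == order:
            for node in e:
                deg[node] += 1
    return dict(deg)


def degree_all_orders(edges):
    """Degrees at every order, computed in one pass over the edges."""
    deg = defaultdict(lambda: defaultdict(int))
    for e in edges:
        order = len(e) - 1
        for node in e:
            deg[order][node] += 1
    return {d: dict(nodes) for d, nodes in deg.items()}


edges = [{0, 1}, {1, 2}, {0, 1, 2}, {0, 1, 2, 3}]

# One pass over the edges, regardless of how many orders are present.
one_pass = degree_all_orders(edges)

# Same result, but the edge list is scanned once per order.
orders = sorted({len(e) - 1 for e in edges})
repeated = {d: degree_single_order(edges, d) for d in orders}
```

Both approaches give the same dictionaries here; the difference is only in how many times the edge list is traversed. Nodes with zero degree at a given order are simply absent in this sketch.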
Note the question "What should the default return value be?" has popped up recently in different parts of the codebase:
- #117 (more about the return type tho)
- #79 (about edge members vs edge ids)
I thought briefly about this.
I think I would have the function return a value for a single order, but without a default order, so the user would be forced to specify, say, `order=2`. They could also specify `order=None` or "all" or whatever to get an aggregate quantity (e.g. for density) over all orders, but it would not be the default.
- why no default orders? I agree with Alice that often enough the aggregate quantity might not be the most intuitive, and be confusing if output by default. I see no good default value for order.
- why no dict with all orders? First for efficiency. Often, the time to compute this will be linear in the number of orders (e.g. degree). I'm not fully sure about this one yet, because in some cases a dict will be useful. Having a sister function that outputs a dict clearly wouldn't be optimal. But having an argument in each function to optionally output a dict does not seem to be either? Do we expect people to always want to look at all orders?
I see your point RE "no good default value" and tend to agree.
Perhaps we can make it so that all of the relevant functions accept a single parameter like
- `order=2` returns a single value for the specified order,
- `order="agg"` returns a single value that aggregates all orders,
- `order="all"` returns a dictionary with one value for each order.
This parameter would be required and have no default value so we force the user to explicitly declare what they want to get.
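A minimal sketch of what such a required parameter could look like, using node degree as the example. The function `degree`, its signature, and the list-of-sets edge representation are hypothetical; this is only meant to show that making `order` keyword-only with no default forces an explicit choice.

```python
def degree(edges, node, *, order):
    """Hypothetical signature: `order` is keyword-only and has no default."""
    if order == "agg":
        # One value aggregating all orders: every edge containing the node.
        return sum(1 for e in edges if node in e)
    if order == "all":
        # One value per order present in the edge list.
        present = sorted({len(e) - 1 for e in edges})
        return {d: degree(edges, node, order=d) for d in present}
    if isinstance(order, int) and not isinstance(order, bool):
        # One value for the requested order only.
        return sum(1 for e in edges if node in e and len(e) - 1 == order)
    raise ValueError(f"order must be an int, 'agg', or 'all'; got {order!r}")


edges = [{0, 1}, {1, 2}, {0, 1, 2}, {0, 1, 2, 3}]

single = degree(edges, 0, order=1)         # edges of order 1 containing node 0
aggregate = degree(edges, 0, order="agg")  # all edges containing node 0
by_order = degree(edges, 0, order="all")   # dict keyed by order
```

Calling `degree(edges, 0)` without `order` raises a `TypeError`, so the user cannot get an aggregate or per-order result by accident.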
Something like that seems good yes.
I'm just thinking it might look a bit funny implementation-wise with functions looking like:
    def measure(order):
        if isinstance(order, int):
            # compute single order
            ...
        elif order == "all":
            for d in orders:  # orders: the orders present in the hypergraph
                # compute single order d, and store
                ...
where the code computing the single order could almost be a separate function to avoid redundancy.
This can be easily fixed with a decorator. Though the "agg" option would require some more thinking...
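One way the decorator idea could be sketched is below. The decorator name `per_order`, the example measure `num_edges`, and the list-of-sets edge representation are all hypothetical. The "agg" case is deliberately left unsupported here, since how to aggregate depends on the measure.

```python
from functools import wraps


def per_order(func):
    """Hypothetical decorator: let a single-order function accept order="all"."""

    @wraps(func)
    def wrapper(edges, *, order):
        if order == "all":
            # Call the single-order function once per order that is present.
            present = sorted({len(e) - 1 for e in edges})
            return {d: func(edges, order=d) for d in present}
        if isinstance(order, int) and not isinstance(order, bool):
            return func(edges, order=order)
        raise ValueError(f"order must be an int or 'all'; got {order!r}")

    return wrapper


@per_order
def num_edges(edges, *, order):
    """Only the single-order logic lives here."""
    return sum(1 for e in edges if len(e) - 1 == order)


edges = [{0, 1}, {1, 2}, {0, 1, 2}, {0, 1, 2, 3}]

one = num_edges(edges, order=1)
every = num_edges(edges, order="all")
```

This keeps each measure's body free of dispatch logic, at the cost of recomputing from scratch for every order, so it does not address the efficiency concern raised earlier in the thread.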