teneto icon indicating copy to clipboard operation
teneto copied to clipboard

Error in tutorial : inconsistent TemporalNetwork.network type?

Open maximelucas opened this issue 4 years ago • 4 comments

Copy pasting the first few lines of the tutorial https://teneto.readthedocs.io/en/latest/tutorial/networkrepresentation.html#temporalnetwork-object yields an error at tnet.network.head() because tnet created with tnet.generatenetwork('rand_binomial',size=(5,3), prob=0.5) is a Numpy ndarray and not a Pandas DataFrame.

I haven't tried other ways of creating tnet, so I'm not sure if tnet.network is consistently a numpy array or not. If so, the tutorial should be updated, otherwise it should be made so that tnet.network is always the same type: either numpy array or pandas DataFrame.

maximelucas avatar Mar 06 '20 18:03 maximelucas

Nice catch! This is just an outdated tutorial. Thanks for flagging this.

The issue here is that modifications were made to only use pandas arrays when the network/connections are sparse (I think if less than 25% of the connections are non-zero). This random network fills that criteria so it makes it a numpy array.

So I am creating a little to do list for myself here:

  • [ ] Update the tutorial.
  • [ ] Add code in tutorial to tests that are performed so this doesn't happen again

wiheto avatar Mar 06 '20 18:03 wiheto

That makes sense! Setting prob=0.2 yields a Pandas DataFrame. Thanks for the explanation.

I'm guessing you made that choice to optimise the code in some way. I'm just thinking, if I write some general code, and then run it on some different datasets, some might be a bit more or a bit less sparse than your threshold. Then, the same code might run or not run, or just yield different results, because of this. Does this not have the potential to create some trouble? Of course I guess I could just check the data and adapt the code every time.

maximelucas avatar Mar 06 '20 20:03 maximelucas

Does this not have the potential to create some trouble?

It is a good point.

  1. All teneto functions are agnostic to whether TemporalNetwork.network is a dataframe or numpy array (and most functions can take HDF dataframes as well (some parts of this are still being optimized though)).

  2. If the user want's to always have dataframe there is an argument when defining TemporalNetwork: forcesparse, which, if True, will make sure the TemporalNetwork.network be a dataframe. I have planned to make this "force_df" and "force_array" but that has been low down on the priority list (or representation=['auto','df','array'])

This behaviour is the an unfortunate trade-off I've had to make for wanting to be quick on small datasets and be able to cope with very large datasets. Trying to find one representation that is (1) familiar for a user to interact with (np or pd), (2) efficient for quick processing, (3) efficient for large networks, is something I hope to implement at one point. But for now, this was the best solution I could think of..

wiheto avatar Mar 06 '20 21:03 wiheto

I sensed it would be something like this, it makes sense, thanks. Your point 1. should make it quite safe. In any case I don't have a better solution either, I jut got surprised because I even prob=0.3 yielded sometimes sparse and sometimes dense nets!

maximelucas avatar Mar 06 '20 21:03 maximelucas