flow_stability
Facilitate analysis of a sequence of temporal networks
The user interface should extend the one implemented in #53
We want to be able to process large temporal networks, in particular networks that extend over a considerable amount of time (relative to the average interaction duration). One way to deal with such networks is to split them up into a sequence of shorter networks and perform a flow stability analysis on each temporal network in the sequence.
Approach
The interface should be structured similarly to the FlowStability class (see #53). We can define a FlowStabilitySequence class that inherits from FlowStability. FlowStability, in turn, can inherit from a base class that requires only the arguments specific to a single element in the sequence, i.e. those needed for a simple analysis. The sequence in a FlowStabilitySequence instance can then hold a list of base-class instances.
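As a rough sketch of this hierarchy (all class names, attributes, and signatures below are placeholders for illustration, not the actual flow_stability API):

```python
# Hypothetical sketch of the proposed class hierarchy; all names and
# signatures are placeholders, not the actual flow_stability API.

class FlowStabilityBase:
    """Requires only the arguments specific to one element of a sequence."""
    def __init__(self, t_start, t_stop):
        self.t_start = t_start
        self.t_stop = t_stop

    def run(self):
        # Placeholder for a single, simple flow stability analysis.
        return {"t_start": self.t_start, "t_stop": self.t_stop}


class FlowStability(FlowStabilityBase):
    """Full single-network analysis, extending the base class."""


class FlowStabilitySequence(FlowStability):
    """Analysis over a sequence of shorter temporal networks."""
    def __init__(self, slice_bounds):
        # One base-class instance per consecutive pair of slice boundaries.
        self.sequence = [FlowStabilityBase(a, b)
                        for a, b in zip(slice_bounds[:-1], slice_bounds[1:])]

    def run_all(self):
        return [element.run() for element in self.sequence]
```

With this layout, FlowStabilitySequence([0, 10, 20]) would hold two base-class elements covering [0, 10] and [10, 20].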
Clarify
- What additional parameters are required to specify a FlowStabilitySequence?
- How should we address the processing of each element in the sequence? Do we want to include a parallelization step with multiprocessing and/or adhere to an architecture that is well suited for HPC clusters?
Looking at the current scripts, I would say that these are the "meta-parameters":
from run_laplacians_transmats.py:
optional.add_argument("--ncpu", default=4, type=int,
help="Size of the multiprocessing pool.")
optional.add_argument("--num_slices", default=50, type=int,
help="number of slices that will be used to parallelize and save the results")
optional.add_argument("--slice_length", default=None, type=float,
help="Length of a single slice. Used to set the number of slices for parallelization instead of num_slices. If provided, will have priority over num_slices.")
optional.add_argument("--t0", default=None, type=float,
help="Time at which to start the analysis. Default is the starting time of the first event.")
optional.add_argument("--tend", default=None, type=float,
help="Time at which to stop the analysis. Default is the ending time of the last event.")
optional.add_argument("--verbose", action="store_true")
optional.add_argument("--batch_num", default=0, type=int,
help="Batch number, if the work is split into several batches (to distribute over several computers).")
optional.add_argument("--total_num_batches", default=1, type=int)
optional.add_argument("--time_slices_from_net_file", action="store_true",
help="Uses the time slices saved with the TemporalNetwork file, in `net.time_slices_bounds`.")
optional.add_argument("--intervals_to_skip", default=[], type=int, nargs="+",
help="List of intervals to skip, given as 'int1 int2 ...'.")
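Regarding num_slices versus slice_length, the priority rule could be implemented along these lines (a sketch only; the function name and the handling of a possibly shorter last slice are assumptions, not the actual script's code):

```python
def slice_bounds(t0, tend, num_slices=50, slice_length=None):
    """Compute time-slice boundaries between t0 and tend.

    If slice_length is provided it takes priority over num_slices
    (mirroring the CLI semantics), and the last slice may then be
    shorter than slice_length.
    """
    if slice_length is not None:
        bounds = []
        t = t0
        while t < tend:
            bounds.append(t)
            t += slice_length
        bounds.append(tend)
        return bounds
    # Otherwise split [t0, tend] into num_slices equal slices.
    step = (tend - t0) / num_slices
    return [t0 + i * step for i in range(num_slices + 1)]
```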
from run_cov_integrals.py:
optional.add_argument("--num_points", default=50, type=int,
help="Number of steps of the grid over which the integral results will be saved.")
optional.add_argument("--int_length",default=None, type=int,
help="Length of a single grid interval. Used to set the number of intervals instead of num_points")
optional.add_argument("--int_list", default=[], type=int, nargs="+",
help="List of intervals used for the integral. Used instead of num_points or int_length.")
optional.add_argument("--time_direction", default="both", type=str,
help="can be 'forward','backward' or 'both'. Default is 'both'.")
optional.add_argument("--only_from_start_and_finish", action="store_true",
help="Instead of computing every combination of start and finish, compute all integrals forward from start and backward from finish.")
optional.add_argument("--only_from_start", action="store_true",
help="Instead of computing every combination of start and finish, compute all integrals forward from start.")
optional.add_argument("--only_from_finish", action="store_true",
help="Instead of computing every combination of start and finish, compute all integrals backward from finish.")
optional.add_argument("--only_one_interval", action="store_true",
help="Instead of computing every combination of start and finish, compute from every start but only for one interval.")
optional.add_argument("--print_mem_usage", action="store_true",
help="print memory usage.")
optional.add_argument("--print_interval", default=100, type=int,
help="Controls how often memory usage is printed.")
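To make the effect of the only_from_* flags concrete, here is a hypothetical enumeration of the (start, finish) interval-index pairs (not the actual script logic), showing how --only_from_start_and_finish reduces the quadratic number of pairs to a linear one:

```python
def integral_pairs(num_points, only_from_start_and_finish=False):
    """Enumerate (start, finish) grid-index pairs for the integrals.

    By default every combination with start < finish is computed,
    i.e. O(n^2) pairs; with the flag, only integrals forward from
    the start and backward from the finish are kept, i.e. O(n) pairs.
    Illustrative sketch only, not the actual run_cov_integrals.py code.
    """
    if only_from_start_and_finish:
        forward = [(0, j) for j in range(1, num_points + 1)]
        backward = [(i, num_points) for i in range(1, num_points)]
        return forward + backward
    return [(i, j)
            for i in range(num_points + 1)
            for j in range(i + 1, num_points + 1)]
```

For a grid of 4 intervals this shrinks the work from 10 pairs to 7.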
from run_clusterings.py:
optional.add_argument("--nproc_files", default=4, type=int,
help="Number of processes over which to split files to work on.")
optional.add_argument("--nproc_clustering", default=1, type=int,
help="Number of processes over which to split clustering iterations.")
optional.add_argument("--init_p1", action="store_true",
help="For non-homogeneous initial distribution, must be used with --direction.")
optional.add_argument("--direction", default="forward",
help="'forward' or 'backward', used with --init_p1.")
However, the scripts offer a lot of flexibility, probably too much, so we may remove some functionality, for example by forcing --only_from_start_and_finish.
I think the use case will be more multiprocessing on a machine with many cpus rather than using HPC clusters. So, let's focus on this for the moment.
Makes sense. This covers a broader range of use cases in my opinion. Also, with multi-CPU-enabled code we can still benefit from an HPC cluster for embarrassingly parallel splits, configuring single jobs to run on multiple CPUs.
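A minimal sketch of what the multiprocessing step could look like (the helper names and per-element payload are made up for illustration; --ncpu maps to the pool size):

```python
from multiprocessing import Pool


def run_element(bounds):
    # Hypothetical worker: in practice this would run a single
    # flow stability analysis on one time slice of the sequence.
    t_start, t_stop = bounds
    return {"t_start": t_start, "t_stop": t_stop}


def run_sequence(slice_bounds, ncpu=4):
    """Process the elements of the sequence in a multiprocessing pool."""
    elements = list(zip(slice_bounds[:-1], slice_bounds[1:]))
    with Pool(processes=ncpu) as pool:
        return pool.map(run_element, elements)
```

Since the slices are independent, this is embarrassingly parallel, and the same worker could later be reused for batch-level splitting across cluster jobs.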