Typical graphs to test on?

gdalle opened this issue · 3 comments

Are there any standard sets of benchmark graphs on the web that we could use for testing? We would want them to be

  • diverse: cover several structures and densities
  • scalable: go from small to large
  • easy to generate or download
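
For instance, a diverse and scalable suite could be generated directly with Graphs.jl's built-in generators. A sketch; the specific generators, sizes, and the helper name benchmark_suite below are illustrative, not a fixed proposal:

```julia
using Graphs

# A small suite spanning several structures and densities,
# parameterized by the number of vertices n so it scales.
function benchmark_suite(n::Int)
    k = isqrt(n)
    return Dict(
        "path"           => path_graph(n),                 # sparse, maximal diameter
        "grid"           => Graphs.grid([k, k]),           # geometric structure
        "erdos_renyi"    => erdos_renyi(n, 4 / n),         # random, mean degree ≈ 4
        "barabasi"       => barabasi_albert(n, 4),         # heavy-tailed degrees
        "watts_strogatz" => watts_strogatz(n, 6, 0.1),     # small-world
        "complete"       => complete_graph(min(n, 1_000)), # dense extreme (capped)
    )
end

suite = benchmark_suite(10_000)
```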

gdalle avatar Nov 21 '24 11:11 gdalle

I have wanted to write some guidelines on benchmarking for a very long time but have not gotten around to it. Let me write down some of my unstructured thoughts here:

  • Graphs for benchmarking need to depend on the problem:
    • The size should be adapted to the problem. An NP-hard problem should probably be run on graphs with only around 100 vertices, while typical traversal-based algorithms should run on graphs with on the order of 10^5 to 10^7 vertices.
    • The structure needs to be adapted to the problem. For example, it does not make sense to benchmark an algorithm that computes the transitive closure on a strongly connected graph, since its transitive closure is trivially the complete graph.
    • Parameters that might depend on the problem are
      • Density (i.e. the number of edges)
      • Degree and diameter for traversal based algorithms
      • The number of strong/weak components
      • Regularity and symmetry for problems related to isomorphism
    • Random graphs, especially Erdős–Rényi graphs, can be used for benchmarking but should not be the only ones used, as their properties tend to be very concentrated as the number of vertices increases.
    • Good graphs often come from geometry or social networks.
    • Some graphs ideal for benchmarking can be found in https://github.com/JuliaGraphs/SNAPDatasets.jl (see the loading sketch after this list).
    • Small graphs with some specific properties can be found at https://houseofgraphs.org/. One can also generate them with the geng program that ships with nauty; someone has also created an online version of it.
  • If someone runs benchmarks for a PR, they should post the code as a comment on that PR so that reviewers can judge whether the benchmark makes sense.
  • It is also important to specify the environment used to run that benchmark:
    • Which -O flag is used to start Julia. The default is -O2, but we might want to run benchmarks with -O3 (the environment-logging snippet after this list captures this).
    • For parallel problems, the number of physical/logical cores and how many threads Julia is allowed to use.
    • If you run the benchmark on a very weak laptop, please also specify how much RAM you have.
  • The key to good benchmarks is to reduce the amount of noise (variance).
    • Use BenchmarkTools.jl or Chairmarks.jl to run the problem multiple times.
    • Report not only the mean but also the minimum/median/maximum time; the minimum and median in particular tend to have much less variance than the mean and maximum (see the statistics sketch after this list).
    • If your problem makes use of caching, we either need a way to clear the cache manually, or we should report the times of both the first and the second run. In that case we need to be sure we are not measuring compilation time (see the cache-timing sketch after this list).
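
Regarding the SNAP datasets mentioned above, loading one is a one-liner. A minimal sketch, assuming :facebook_combined, which is one of the datasets bundled with SNAPDatasets.jl:

```julia
using Graphs, SNAPDatasets

# Load a real social network shipped with SNAPDatasets.jl.
g = loadsnap(:facebook_combined)
@show nv(g) ne(g) is_connected(g)
```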
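
On recording the environment: most of it can be logged from within the benchmark script itself, so it ends up next to the results. A sketch (Base.JLOptions is an internal API, so treat this as an assumption about current Julia versions):

```julia
# Log the benchmark environment alongside the results.
println("Julia ", VERSION,
        " | opt level: -O", Base.JLOptions().opt_level,
        " | threads: ", Threads.nthreads(),
        " | RAM: ", round(Sys.total_memory() / 2^30; digits = 1), " GiB")
```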
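
For the statistics, all four estimators can be read from a single BenchmarkTools.jl trial. A sketch on an arbitrary traversal (gdistances computes BFS distances; the graph size here is illustrative):

```julia
using Graphs, BenchmarkTools, Statistics

g = erdos_renyi(100_000, 1e-4)    # sparse random graph, ~500_000 expected edges
t = @benchmark gdistances($g, 1)  # BFS distances from vertex 1

# Report minimum/median/mean/maximum rather than a single number.
for stat in (minimum, median, mean, maximum)
    println(rpad(string(stat), 8), stat(t))
end
```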
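
And on caching: one way to separate compilation from cache effects is to warm up on a tiny instance first, then time the first (cold-cache) and second (warm-cache) runs on the real instance. A sketch with a toy memoizing wrapper; cached_pagerank is hypothetical and stands in for any cache-using algorithm:

```julia
using Graphs

# Toy memoizing wrapper (hypothetical); stands in for any algorithm
# that keeps an internal cache.
const CACHE = IdDict{Any,Any}()
cached_pagerank(g) = get!(() -> pagerank(g), CACHE, g)

g_small = erdos_renyi(100, 0.1)
g_big   = erdos_renyi(100_000, 1e-4)

cached_pagerank(g_small)                    # warm-up: pay compilation here
t_first  = @elapsed cached_pagerank(g_big)  # first run: cold cache
t_second = @elapsed cached_pagerank(g_big)  # second run: warm cache
println("first: ", t_first, " s | second: ", t_second, " s")
```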

simonschoelly avatar Nov 24 '24 17:11 simonschoelly

Some here? https://github.com/GunnarFarneback/FeedbackArcSets.jl/pull/2

jd-foster avatar Nov 26 '24 07:11 jd-foster

I was thinking of mentoring a GSoC project on graph storage, downloads and benchmark sets. How would that sound?

gdalle avatar Nov 26 '24 08:11 gdalle