igraph icon indicating copy to clipboard operation
igraph copied to clipboard

Make sure that passing in NaN weights do not cause a crash

Open szhorvat opened this issue 5 years ago • 19 comments

Passing NaNs as weights will cause some functions to crash, see e.g. https://github.com/igraph/rigraph/issues/415

Functions which can't handle NaNs should check the weight vector for NaNs, and report an error.

Note that some functions may only access a small part of the weight vector in some cases. Ideally, they should not loop over the entire weight vector just to check for NaNs then.


This issue probably affects the R interface the most, as NAs, which are represented as a kind of NaN, are common in R. A stray NA in a weight vector in R may cause a crash. An alternative resolution is to check for NaNs / NAs in the high level interface before passing the values to the C core.


List of functions that are likely affected (please amend this list as necessary):

  • [x] several shortest path functions
  • [x] betweenness
  • [x] closeness
  • [x] random walks
  • [ ] igraph_constraint (does not check for NaN, should it?)
  • [x] igraph_diversity
  • [ ] eigenvector-based functions
    • [ ] igraph_eigenvector_centrality (doesn't check for NaN)
    • [ ] igraph_hub_and_authority_scores (doesn't check for NaN)
    • [ ] ...
  • [x] PageRank
    • [x] ARPACK
    • [x] PRPACK
  • [ ] flow-based functions
    • [ ] igraph_maxflow (does not crash on NaN)
    • [ ] igraph_maxflow_value
    • [ ] igraph_st_mincut (does not crash on NaN)
    • [ ] igraph_st_mincut_value
    • [ ] igraph_mincut_value
    • [ ] igraph_mincut
    • [ ] igraph_even_tarjan_reduction
    • [ ] igraph_residual_graph
    • [ ] igraph_reverse_residual_graph
    • [ ] igraph_all_st_mincuts
    • [ ] igraph_gomory_hu_tree
  • [ ] community detection
    • [ ] igraph_community_leiden (does not check NaN, unclear if it needs to)
    • [x] igraph_community_fastgreedy
    • [x] igraph_community_infomap
    • [x] igraph_community_label_propagation
    • [ ] igraph_community_leading_eigenvector
    • [x] igraph_community_multilevel
    • [x] igraph_community_optimal_modularity
    • [ ] igraph_community_spinglass
    • [x] igraph_community_walktrap
  • [ ] layout functions — only the below ones use weights
    • [ ] igraph_layout_fruchterman_reingold(_3d) (does not check NaN, unclear if it needs to)
    • [ ] igraph_layout_kamada_kawai(_3d) (does not check NaN, unclear if it needs to)
    • [ ] igraph_layout_drl(_3d) (does not check NaN, unclear if it needs to)
    • [x] igraph_layout_umap(_3d)
  • [ ] Shortest paths (paths)
    • [x] Dijkstra
    • [x] Dijkstra (all shortest paths)
    • [x] Bellman-Ford (does not currently check for NaN)
    • [x] A*
  • [x] Distances (i.e. path lengths)
    • [x] Dijkstra
    • [x] Bellman-Ford
    • [x] Johnson
    • [x] Floyd-Warshall
  • [x] Widest paths (paths)
  • [x] Widest paths (path widths)
  • [x] Other shortest path related
    • [x] igraph_average_path_length_dijkstra
    • [x] igraph_local_efficiency
    • [x] igraph_global_efficiency
    • [x] igraph_diameter_dijkstra
    • [x] igraph_get_k_shortest_paths (though calling igraph_get_shortest_path_dijkstra)
  • [x] minimum spanning tree
  • [x] feedback arc set (does not check for NaN, but needs to; consider if +-Inf is a problem)

szhorvat avatar Dec 01 '20 19:12 szhorvat

This issue is a bit vague in the sense that it's hard to tell whether the goal outlined in the title has been achieved or not (cf. the halting problem). Maybe we should come up with an explicit list of functions to check/test (the ones that we think might be affected by this problem) so we know that the issue can be closed when we have gone through that list and confirmed that all of these functions work correctly.

ntamas avatar Dec 17 '20 09:12 ntamas

Yes, the intention was to put a list with checkboxes in the top post. Part of the task is to identify affected functions.

szhorvat avatar Dec 17 '20 09:12 szhorvat

Shortest path functions in principle work OK with NaN values. That is, they do not crash. However, they are not consistent in their output and handling of NAN. I use a simple graph (0 - 1 - 2 - 3 - 0) where the last edge has weight NAN and the others have weight 1.

The results are as follows:

igraph_shortest_paths_bellman_ford
0 1 2 3
1 0 1 2
2 1 0 1
3 2 1 0

It is debatable whether this answer is correct. If the edge 2-3 would also be NAN igraph_shortest_paths_bellman_ford would return infinity, and says it could not be reached.

igraph_shortest_paths_dijkstra
0 1 NAN NAN
1 0 1 2
NAN 1 0 1
NAN 2 1 0
igraph_shortest_paths_johnson
0 1 NAN NAN
1 0 1 2
NAN 1 0 1
NAN 2 1 0

These two answers are correct, for the NAN entries here we cannot determine whether the alternative path would be shorter or not. For the provided distances, the alternative path is known to be longer. I would propose that we also need to output NAN in the indicated locations for igraph_shortest_paths_bellman_ford.

When getting (all) shortest paths to all other vertices from 0 the results are as follows.

igraph_get_shortest_paths_dijkstra
0
0 1
0 3 2
0 3
igraph_get_all_shortest_paths_dijkstra
0
0 1
0 3 2
0 3

Again, this is wrong, the shortest path to node 2 and 3 are unknown (as correctly indicated by the NAN in igraph_shortest_paths_dijkstra and igraph_shortest_paths_johnson). I think this should be corrected.

This issue is only for tracking whether a function crashes. Should we open a separate issue for ensuring that results are in fact correct when passing in NaN weights?

vtraag avatar Dec 18 '20 23:12 vtraag

OK, I now identified some problems. igraph_vector_min (and presumably some other functions) will return nan. Hence, checks on positivity for example do not work correctly when nan's are involved.

There is then a question on what counts as valid input. Should we allow positive weights with occasional nan? Or should we not allow nan at all? As you can see from the results above, the algorithms in principle work "correctly" (i.e. they do not crash) with nan. Perhaps there are some use cases where it is convenient to work with nan for missing data, and that you can still calculate some shortest paths (if they are correct). From this point of view, it might make most sense to allow nan values.

vtraag avatar Jan 12 '21 13:01 vtraag

Please don't make changes to vector_min or vector_min's usages until the harmonic centrality PR is merged.

szhorvat avatar Jan 12 '21 13:01 szhorvat

No, I think that making changes to igraph_vector_min would not be the right direction. There are some other ways of checking the negative entries though, perhaps doing it manually. I am not sure yet of the best approach. For now the question is, do we want to allow nan values or not?

vtraag avatar Jan 12 '21 13:01 vtraag

@vtraag Please update the checklist above now that #1617 is merged. Thanks!

ntamas avatar Jan 21 '21 12:01 ntamas

Note to self: some other vector functions should also be checked still. Most notably igraph_vector_(which)_minmax, but perhaps also some others that should work correctly for NaN input still.

vtraag avatar Jan 23 '21 17:01 vtraag

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 26 '21 10:03 stale[bot]

@vtraag Would you have time to make the list in the original post granular (one box for each function) to have a clear picture of what's missing? Many more cases have been fixed since this issue was open (I fixed several), but unfortunately we did not keep track.

szhorvat avatar Mar 31 '21 09:03 szhorvat

Yes, the intention was to put a list with checkboxes in the top post. Part of the task is to identify affected functions.

quoting the reply, comment

EugeniyaP avatar Jan 30 '23 20:01 EugeniyaP

@EugeniyaP Was this comment made in error, or did you mean to ask something about this issue?

szhorvat avatar Jan 30 '23 20:01 szhorvat

hi @szhorvat , I am exploring the possibilites of commenting on Github, there are three possibilities over all:

  1. leave a comment in the end of the thread
  2. leave the comment quoting someone's reply
  3. leave a comment mentioning to whom you're addressing it - using @ symbol

There is no possibility to create a direct link to the comment in the thread, correct?

EugeniyaP avatar Jan 30 '23 20:01 EugeniyaP

There is no possibility to create a direct link to the comment in the thread, correct?

Yes, there is. Use the ... menu at the top-right of any comment. Example: https://github.com/igraph/igraph/issues/1560#issuecomment-1409264666

there are three possibilities over all:

You can include anything you like in a comment, using Markdown notation. Here are the docs on that: https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax Quotation is not a special feature, it's just one of the markups possible in Markdown.


Let's not overload existing issues with unrelated discussion though. If you'd like to experiment, you can create a repository for this purpose under your own account. You will have full control over it and you will be able to delete comments, or even the entire repository, when you're done.

szhorvat avatar Jan 30 '23 20:01 szhorvat

thank you, @szhorvat ! can I message you directly anywhere?

EugeniyaP avatar Jan 30 '23 20:01 EugeniyaP

Yes, if you use @ followed by my username, I will get a notification (unless I turn that off explicitly). If you need help, feel free to try this and ping me from the sandbox repository you create for the purpose of experimentation.

szhorvat avatar Jan 30 '23 20:01 szhorvat

thank you, @szhorvat!

EugeniyaP avatar Jan 30 '23 20:01 EugeniyaP

This issue is only for tracking whether a function crashes. Should we open a separate issue for ensuring that results are in fact correct when passing in NaN weights?

IMO it's easier to track everything together.

I made the list at the top a bit more granular, but it's still not complete.

szhorvat avatar Feb 05 '23 16:02 szhorvat

@vtraag igraph_community_leiden() does not check for NaN, should it?

szhorvat avatar Apr 17 '24 17:04 szhorvat