
Overview documentation for current functions

Open torfason opened this issue 4 years ago • 9 comments

The igraph documentation is generally quite detailed for each function, once the correct function is found. Over time, as function names have been made more consistent, the large number of available functions (around 800, I believe) has made finding the right one a bit challenging.

It seems that there is an underlying rationale for how the "canonical" name for a given function is constructed. For graph constructors, for example, I have come across three main types, sample_*, make_* and graph_from_* (see a reprex below). However, I have not been able to locate documentation outlining these categories (or categories for other types of functions). I wonder if such documentation exists.

If it does, this feature request would be to make it more prominent through the help or on the web site. If not, could it be created? It would not have to be long, basically a list of prefixes, the role that functions with a given prefix play, and perhaps a list of the functions with that prefix.

    library(igraph, warn.conflicts=FALSE)
    ls("package:igraph") %>% stringr::str_subset("^sample_.|^make_.|^graph_from_.")
    #>  [1] "graph_from_adj_list"          "graph_from_adjacency_matrix" 
    #>  [3] "graph_from_atlas"             "graph_from_data_frame"       
    #>  [5] "graph_from_edgelist"          "graph_from_graphdb"          
    #>  [7] "graph_from_graphnel"          "graph_from_incidence_matrix" 
    #>  [9] "graph_from_isomorphism_class" "graph_from_lcf"              
    #> [11] "graph_from_literal"           "make_bipartite_graph"        
    #> [13] "make_chordal_ring"            "make_clusters"               
    #> [15] "make_de_bruijn_graph"         "make_directed_graph"         
    #> [17] "make_ego_graph"               "make_empty_graph"            
    #> [19] "make_full_bipartite_graph"    "make_full_citation_graph"    
    #> [21] "make_full_graph"              "make_graph"                  
    #> [23] "make_kautz_graph"             "make_lattice"                
    #> [25] "make_line_graph"              "make_ring"                   
    #> [27] "make_star"                    "make_tree"                   
    #> [29] "make_undirected_graph"        "sample_asym_pref"            
    #> [31] "sample_bipartite"             "sample_cit_cit_types"        
    #> [33] "sample_cit_types"             "sample_correlated_gnp"       
    #> [35] "sample_correlated_gnp_pair"   "sample_degseq"               
    #> [37] "sample_dirichlet"             "sample_dot_product"          
    #> [39] "sample_fitness"               "sample_fitness_pl"           
    #> [41] "sample_forestfire"            "sample_gnm"                  
    #> [43] "sample_gnp"                   "sample_grg"                  
    #> [45] "sample_growing"               "sample_hierarchical_sbm"     
    #> [47] "sample_hrg"                   "sample_islands"              
    #> [49] "sample_k_regular"             "sample_last_cit"             
    #> [51] "sample_motifs"                "sample_pa"                   
    #> [53] "sample_pa_age"                "sample_pref"                 
    #> [55] "sample_sbm"                   "sample_seq"                  
    #> [57] "sample_smallworld"            "sample_sphere_surface"       
    #> [59] "sample_sphere_volume"         "sample_traits"               
    #> [61] "sample_traits_callaway"
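
For anyone who wants a per-prefix overview rather than one long vector, here is a minimal sketch in base R. The `prefixes` vector and the `by_prefix` name are illustrative choices, not anything igraph defines, and the list of prefixes is not meant to be exhaustive.

```r
library(igraph, warn.conflicts = FALSE)

# All names exported by the attached igraph package.
fns <- ls("package:igraph")

# Prefixes discussed in this issue (illustrative, not exhaustive).
prefixes <- c("make_", "sample_", "graph_from_")

# For each prefix, keep the names that start with it and have at least
# one further character, mirroring the regular expression in the reprex.
by_prefix <- lapply(prefixes, function(p) {
  fns[startsWith(fns, p) & nchar(fns) > nchar(p)]
})
names(by_prefix) <- prefixes

# Number of functions found for each prefix.
lengths(by_prefix)
```

The exact counts depend on the installed igraph version, so none are shown here.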

torfason avatar Sep 17 '20 11:09 torfason

Of course this would be very useful, but we are short of hands, especially when it comes to the R interface of igraph. Contributions would be most welcome!

szhorvat avatar Mar 18 '22 13:03 szhorvat

OK, I'll bite :-)

If I were to help out with a contribution, I could prepare a pull request to the igraph package documentation (https://github.com/igraph/rigraph/blob/dev/R/igraph-package.R), which seems to double as a vignette on the web site (https://igraph.org/r/html/latest/aaa-igraph-package.html).

However, in order for me to do so, I would need some input as to what the actual logic is. As I said in the original question, "It seems that there is an underlying rationale for how the "canonical" name for a given function is constructed", but I personally don't fully understand that rationale.

If I could be pointed, even in informal language, to what the naming conventions are based on (for starters the make_, graph_from_ and sample_ and other prefixes if such exist), I could try to expand on that in "documentation-ready" language to prepare a PR.

torfason avatar Mar 18 '22 14:03 torfason

Also, if such an overview were to be created, that would imply that there are "correct and up-to-date" naming conventions for such functions as well as "incorrect or out-of-date" conventions. If that were the case, should out-of-date-named functions be deprecated?

torfason avatar Mar 18 '22 14:03 torfason

I don't really work with R/igraph much (I mainly contribute to the C core and the Mathematica interface), so I can't give a definitive answer. Perhaps @ntamas @vtraag can chime in.

Regarding make_ and sample_, take a look here:

https://igraph.org/r/doc/make_.html https://igraph.org/r/doc/sample_.html

make_xxx() usually comes with xxx() that can be used with make_(). The same goes for sample.
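
A small illustration of this pairing, assuming igraph is installed. It only uses functions named in this thread or on the two linked help pages (`gnp()` is the specification counterpart of `sample_gnp()`).

```r
library(igraph, warn.conflicts = FALSE)

# Direct constructor.
g1 <- make_ring(10)

# The same graph built from a specification passed to make_().
g2 <- make_(ring(10))

# Compare the two results.
identical_graphs(g1, g2)

# The same pattern on the sampling side: gnp() pairs with sample_gnp().
g3 <- sample_(gnp(10, 0.5))
```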

I think the design that shares the xxx() part between make_ and sample_ is not ideal because the meaning may not be the same for the two. For example, in 1.3.0 we will have sample_tree(), which samples labelled trees randomly. In contrast, make_tree() creates a k-ary tree. tree() works only with make_, not with sample_. @ntamas, I would suggest deprecating make_tree() soon and replacing it with make_kary_tree(), as we did in the C core for 0.10. How to transition tree() from make_-compatible to sample_-compatible, I am not sure.
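
To make the difference concrete, a short sketch, assuming igraph 1.3.0 or later so that sample_tree() is available:

```r
library(igraph, warn.conflicts = FALSE)

# Deterministic: a k-ary tree (here binary) on 7 vertices, the same every time.
t1 <- make_tree(7, children = 2, mode = "undirected")

# Random: a labelled tree on 7 vertices, generally different on each call.
t2 <- sample_tree(7)

# Both are trees on 7 vertices, so both have 6 edges, but their shapes differ.
ecount(t1)
ecount(t2)
degree(t1)
degree(t2)
```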

graph_from_ does not have such a separate prefix function. I could tell you that graph_from_ seems to convert the graph from various representations to an igraph object, but this is merely an observation on my part, which you can make as well.

@gaborcsardi Let us know if you have anything to add.

szhorvat avatar Mar 18 '22 14:03 szhorvat

It makes sense to me to have a page that lists what is available for make_, what is available for sample_, and what works with both (if any—someone should check). I am not sufficiently familiar with the general R documentation conventions to be able to tell in what format this information should be conveyed (in the make_ page? separately?). Your input on this would be welcome @torfason

For answers to your other questions, I'd wait for input from others.

szhorvat avatar Mar 18 '22 14:03 szhorvat

Unfortunately I'm also a bit lost regarding naming conventions as I only took over the maintenance of the R-igraph package but I wasn't the one who came up with these conventions. The key to the naming scheme is most likely in R/make.R in the source tree. The idea is that you have make_() for creating graphs in a deterministic way, and sample_() for sampling graphs from some ensemble. Both of these functions take a specification of what to construct and a list of optional modifiers; e.g.:

    make_(ring(10), with_vertex_(color = "red", name = LETTERS[1:10]))

This one creates a ring graph with 10 vertices (that's the ring(10) part), and it applies the with_vertex_(...) modifier to it, which then adds some vertex attributes. make_ring() and similar functions seem like syntactic sugar for the most common case (single specification, no modifiers) but what actually happens behind the scenes is that make_(...) dispatches the graph creation to these functions and then applies the modifiers to the results.
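
For comparison, here is the modifier form next to the equivalent step-by-step form. This is a sketch assuming igraph is installed; `g` and `h` are just illustrative variable names.

```r
library(igraph, warn.conflicts = FALSE)

# Specification plus modifier, resolved by make_().
g <- make_(ring(10), with_vertex_(color = "red", name = LETTERS[1:10]))

# The same result built step by step with the plain constructor.
h <- make_ring(10)
V(h)$color <- "red"
V(h)$name <- LETTERS[1:10]

# Inspect the attributes that the modifier added.
V(g)$name
V(g)$color
```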

So, the rule is that the first argument to make_() and sample_() must be a function that returns a constructor specification, and the remaining arguments must be constructor modifiers. Look for constructor_spec() and constructor_modifier() in the source code. For instance, ring(...) is defined as follows:

    ring <- function(...) constructor_spec(make_ring, ...)

so basically ring(10) creates a constructor specification that instructs make_ to call make_ring(10). ring(10) does not construct the graph itself; it is the job of make_() to do this. I believe this is meant to support magrittr-style pipes where the graph is constructed only at the end of the pipeline:
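
The deferred-construction idea can be sketched without igraph at all. Everything below is a hypothetical toy (`toy_spec`, `toy_make_ring`, `toy_ring` and `toy_make_` are made-up names, not igraph functions); it only mimics the pattern described above and is not the actual implementation in R/make.R.

```r
# Toy specification object: records a function and its arguments.
toy_spec <- function(fun, ...) {
  structure(list(fun = fun, args = list(...)), class = "toy_spec")
}

# Toy "constructor": returns the edge list of a ring on n vertices.
toy_make_ring <- function(n) {
  cbind(from = seq_len(n), to = c(seq_len(n)[-1], 1L))
}

# The specification function only records the call; it builds nothing.
toy_ring <- function(...) toy_spec(toy_make_ring, ...)

# The dispatcher runs the recorded call, then applies modifiers in order.
toy_make_ <- function(spec, ...) {
  stopifnot(inherits(spec, "toy_spec"))
  result <- do.call(spec$fun, spec$args)
  for (modifier in list(...)) result <- modifier(result)
  result
}

spec <- toy_ring(5)        # nothing is constructed yet
edges <- toy_make_(spec)   # construction happens here

# A modifier is just a function applied to the constructed result.
reversed <- toy_make_(toy_ring(5), function(e) e[, c("to", "from")])
```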

    pref_matrix <- cbind(c(0.8, 0.1), c(0.1, 0.7))
    g <- pref_matrix %>% sbm(n=20, block.sizes=c(10, 10)) %>% sample_

The graph_from_() functions seem to be the odd ones out. For me it seems like they are meant for cases when the structure of the graph is "hardcoded" in some format (edge list, adjacency list etc), unlike most calls to make_() and sample_() where the edges are not stored directly.
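
As an example of this "structure already stored in some format" case, here is a minimal sketch, assuming igraph is installed. The edge data is invented for illustration.

```r
library(igraph, warn.conflicts = FALSE)

# Edges given explicitly as a two-column matrix of vertex names.
el <- matrix(c("a", "b",
               "b", "c",
               "c", "a"), ncol = 2, byrow = TRUE)
g1 <- graph_from_edgelist(el, directed = TRUE)

# Edges given as a data frame; extra columns become edge attributes.
df <- data.frame(from = c("a", "b", "c"),
                 to = c("b", "c", "a"),
                 weight = c(1, 2, 3))
g2 <- graph_from_data_frame(df, directed = TRUE)

# The weight column is carried over as an edge attribute.
E(g2)$weight
```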

ntamas avatar Mar 28 '22 13:03 ntamas

Great overview of the implementation, and important for heavy users. But I wonder whether that procedure is relevant to most users. In my opinion, the make_() and sample_() functions feel a lot like magic, and when it comes to documentation they are best treated as an internal implementation detail. One could call make_ring() syntactic sugar (I think Linus Torvalds called this plumbing vs. porcelain in the git context).

So for any overview document, I would argue for not covering them at all, simply covering all the end-user variations (call them make_*() to distinguish from the underlying make_()). For one thing, make_ring() has decent findability using autocomplete, whereas ring() as a graph construction function does not.

So, if the emphasis in the documentation were on the "porcelain", then the organizing principle that you have "make_() for creating graphs in a deterministic way, and sample_() for sampling graphs from some ensemble" makes sense.

Remaining are the graph_from_*() functions. I think a way to think of them is that they are used to construct graphs from data, so you are not really "making" the graphs because they already exist (you are only putting them into igraph format). Most of the functions seem to follow this convention, although the distinction is not clean. For example, graph_from_isomorphism_class() seems much more like a make_*() constructor than a graph_from_*() constructor, and make_graph() seems to be a meta-constructor, similar to make_() and sample_().
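
A short sketch of the two borderline cases just mentioned, assuming igraph is installed. The particular arguments are arbitrary examples.

```r
library(igraph, warn.conflicts = FALSE)

# make_graph() as a "meta-constructor": one function, several input styles.
g_named <- make_graph("Petersen")                                   # a notable graph by name
g_edges <- make_graph(c(1, 2, 2, 3, 3, 1), directed = FALSE)        # an explicit edge vector

# graph_from_isomorphism_class() takes no data at all, only a size and an
# index into the list of isomorphism classes, much like a make_*() function.
g_iso <- graph_from_isomorphism_class(3, 1, directed = FALSE)

vcount(g_named)
ecount(g_named)
```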

And this would seem to cover all scenarios in which one constructs an igraph object. So I would then argue for aiming to deprecate all functions not following that convention.

As for where to place such an overview, I think make_() and sample_() are hard to find. The people who need the help are not likely to be messing around with those help pages. Perhaps the package doc, or a vignette.

I also see that some of the constructors use grouping to link to other constructors in the "See also" section. For example, graph_from_edgelist() has the (probably autogenerated) list: "Other determimistic constructors: graph_from_atlas(), graph_from_literal(), make_chordal_ring(), make_empty_graph(), make_full_citation_graph(), make_full_graph(), make_graph(), make_lattice(), make_ring(), make_star(), make_tree()". However, graph_from_data_frame() is not included in this list, and its own help page does not include such a list.

I'll continue to mull this over. Perhaps some updates to the docs here and there could make a difference. And perhaps this is out of scope for igraph – in that for this package it is more important to have access to the most flexible way to use the underlying library than to have a parsimonious structure to the way one goes about it.

torfason avatar Mar 28 '22 15:03 torfason

> graph_from_isomorphism_class() seems much more like a make_*() constructor

Personally, I agree, and would even consider renaming this to a make_ function. It is much more like make_from_prufer() (new in 1.3).

Some other dubious ones are graph_from_lcf() and graph_from_atlas(). All of these share the property that there is no way to back-convert most graphs (only a small subset of graphs can be back-converted, even theoretically).
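
By contrast, the representation-based graph_from_ functions do have a natural way back. A minimal round-trip sketch, assuming igraph is installed (the small matrices are made up for illustration):

```r
library(igraph, warn.conflicts = FALSE)

# Edge list -> graph -> edge list.
el <- matrix(c(1, 2,
               2, 3,
               3, 1), ncol = 2, byrow = TRUE)
g_el <- graph_from_edgelist(el, directed = TRUE)
el_back <- as_edgelist(g_el)

# Adjacency matrix -> graph -> adjacency matrix.
m <- matrix(c(0, 1, 1,
              1, 0, 0,
              1, 0, 0), nrow = 3, byrow = TRUE)
g_m <- graph_from_adjacency_matrix(m, mode = "undirected")
m_back <- as_adjacency_matrix(g_m, sparse = FALSE)

# Check that both representations survive the round trip.
all(el_back == el)
all(m_back == m)
```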

szhorvat avatar Mar 28 '22 16:03 szhorvat

There is also a make_clusters() function which does not even return a graph. Confusing!
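
A quick check of what make_clusters() returns, assuming igraph is installed. The membership vector is an arbitrary example.

```r
library(igraph, warn.conflicts = FALSE)

g <- make_ring(6)

# Build a community structure by hand from a membership vector.
cl <- make_clusters(g, membership = c(1, 1, 1, 2, 2, 2))

class(cl)        # class of the result
is_igraph(cl)    # whether the result is a graph
membership(cl)   # the membership vector stored in the result
```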

szhorvat avatar Apr 04 '22 11:04 szhorvat

@iosonofabio Since you guys are already working on a vignette for R-igraph, this issue might be of potential interest to you as an example of something that could be clarified in a vignette.

ntamas avatar Nov 10 '22 19:11 ntamas

@torfason you might be interested in #638 (and the PRs that will follow), whose aim is to add grouping to the pkgdown reference index (for a website that is not deployed yet). The PR also adds a few family tags to the R source, which influence the local "See also" sections too.

cc @krlmlr

maelle avatar Jan 23 '23 09:01 maelle

Currently deployed reference page: https://r.igraph.org/reference/

maelle avatar May 12 '23 09:05 maelle

@ntamas Should this be closed after opening sub-issues?

  • Clarify naming scheme somewhere, probably the contributing guide?

maelle avatar May 15 '23 08:05 maelle

Yes, sounds like a good idea.

ntamas avatar May 15 '23 22:05 ntamas

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

github-actions[bot] avatar May 16 '24 00:05 github-actions[bot]