CRAN Task View: NetworkAnalysis

Open FATelarico opened this issue 1 year ago • 33 comments

Scope

The proposed CRAN Task View contains a list of packages that can be used for dealing with networks (also known as relational data and graphs).

Packages

Core packages include:

intergraph igraph statnet sna network

The other packages:

graph BoolNet egor ionet networkDynamic tidygraph centiserve birankr goldfish amen ergm ergm.count ergm.ego ergm.multi ergm.rank ergmgp ergmito biergm dnr bootnet localboot dyads fastnet multinets nda baycn BayesianNetwork implements bgms bnma econetwork AnimalHabitatNetwork aniSNA assocInd ATNr BIEN bipartite cassandRa bibliometrix bibliometrixData biblionetwork Diderot c3net Ac3net ahnr BASiNET bionetdata Cascade evolqg NetworkToolbox qgraph HospitalNetwork geonetwork chessboard epanet2toolkit intensitynet epinet hybridModels netdiffuseR FinNet ITNr modnets multinet visNetwork networkD3 bipartiteD3 diagram ndtv neatmaps ggnetwork ggraph ggsom graphlayouts cencrne linkcomm concoR blockmodeling BlockmodelingGUI kmBlock dBlockmodeling signnet blockmodels sbm dynsbm MLVSBM StochBlock GREMLINS

Overlap

The only possible overlap is with thematic task views (e.g., epanet2toolkit is also in the Hydrology task view), but this is inherent to a method-oriented CTV as opposed to a substantive one.

In general, there does not appear to be substantial overlap with existing CRAN task views.

Maintainers

  • Main maintainer: Fabio Ashtar Telarico (@FATelarico)
  • Co-maintainers: Carl Nordlund, Saint-Clair Chabert-Liddell

FATelarico avatar Apr 14 '24 17:04 FATelarico

Thanks for the proposal Fabio @FATelarico & Co! I appreciate that you have compiled this list of packages along with the corresponding description. This is a useful overview of a collection of packages on block modeling. However, I think that this is too specialized and not substantial enough for a standalone task view.

I'm also cc'ing Bettina @bettinagruen as the principal maintainer of the "Cluster" task view to see what she thinks and whether this could become a section in the "Cluster" task view.

However, overall my feeling is that this would better fit a "Network Analysis" task view which we currently do not have. Also I'm cc'ing Søren @hojsgaard in case he has any thoughts or recommendations.

zeileis avatar Apr 14 '24 21:04 zeileis

I agree that this would be best suited for a Network Analysis task view.

The Cluster task view contains already a number of sections. One could thus rather easily add a section on block modeling. However, the description of the packages would need to be more detailed covering each package separately to be in line with how other packages are described in the task view. Presumably none of the packages would then also qualify as core for this more general task view.

bettinagruen avatar Apr 15 '24 06:04 bettinagruen

Following your suggestions, the proposed task view was updated to address the entire field of network analysis.

FATelarico avatar Apr 15 '24 15:04 FATelarico

Thank you @FATelarico ! I checked your new proposal and to me, the core packages are well identified and relevant. However,

  • The two sections that describe the different functions of some of the packages are not (because task views are mainly made to describe the packages and not their features / functions in such a precise way: the user guide is made for this).
  • Overall, the current proposal is organized by purpose, with much redundancy among packages in the different topics (because you describe the functions more than the packages): it would also be good to limit this redundancy.
  • Sometimes you forgot to use the macro r pkg( to cite a package and sometimes the references do not seem to be cited in the Reference section.
  • I think that the package blockmodels could be added to the Block model section.
  • Similarly to Bayesian network inference, other packages perform network inference with other types of models, like huge or glasso (among many) for Gaussian Graphical Models and GENIE3 (Bioconductor) for inference with RF.
  • Some regression methods also use networks as input and could be added to the TV as well (e.g., genlasso).

tuxette avatar Apr 15 '24 17:04 tuxette

Fabio, thanks for all the work and the quick revision! This is a useful start for a network analysis task view. In addition to Nathalie's comments a few additional thoughts:

  • The co-maintainers are still the ones from your blockmodeling proposal but it would be good to bring in a couple of persons with more expertise in statnet/sna/ergm.
  • The inclusion/exclusion criteria should be worked out better.
  • The overlap with "Graphical Models" (maintained by Søren @hojsgaard) needs to be addressed in the inclusion/exclusion criteria, especially with regard to graph/network infrastructure (basic computations, manipulations, visualizations) and with regard to Bayesian networks.

zeileis avatar Apr 15 '24 20:04 zeileis

Thanks to the editors for their comments.

@tuxette

  • I am aware that usually functions are not described in task views and those sections have now been removed. I had originally thought they could be useful because many people who begin doing network analysis are often perplexed about whether to use igraph or statnet/sna and only realise they picked up the wrong package for their needs after they are already shoulders deep in it.
  • All mentioned packages should be tagged with the correct macro now (the problems were mostly in the last section Clustering-Others);
  • After double-checking, blockmodels seems to be already included;
  • As the inclusion/exclusion criteria were refined (see below), this set of packages became less relevant as they would better fit in the GraphicalModels CTV;
  • Similarly, packages offering methods to run regression over graphs representing variables may rather belong to the GraphicalModels CTV than here.

@zeileis

  • Emails were sent out to other possible co-maintainers from the ergm development team
  • The inclusion/exclusion criterion was refined
  • The distinction between network analysis and graphical modeling was introduced; references to that CTV are provided in the section on Network Modeling as well as in the introduction.

FATelarico avatar Apr 16 '24 12:04 FATelarico

Thanks! I liked the new version. However, I am not convinced by your section on "Bio-Chemical Networks": In your answer above, you explain that network inference is out of the scope of your TV (and I could agree with that) but you are citing three packages for network inference that are far from being the most known and used (citing c3net and not WGCNA, among others, seems to be a highly biased choice). Also, I don't get the relation between the TV topic and evolqg.

A few additional minor remarks:

  • In Section "Bio-Chemical Networks", you have an empty bullet point line.
  • The titles of your sections are not all capitalized similarly.
  • In Section "Bio-Chemical Networks", the reference "Simoes and Emmert-Streib" is not formatted properly.
  • You have a typo in the title with the word "Psychology" in it.
  • In Section "Social and Economic networks", sna is not cited with the proper macro.
  • I think that "Extension for ggplot2" should be "Extensions for ggplot2".
  • The package https://cran.r-project.org/web/packages/greed/index.html could also be worth citing.

tuxette avatar Apr 18 '24 06:04 tuxette

I agree that there is good progress here. However, I feel that the inclusion/exclusion criteria are not as clear yet as they should be. Getting contributions/feedback from someone with more ergm expertise would probably be good. And the separation with graphical models has also some room for improvement.

Hence, I'm pinging Søren @hojsgaard again: Could you please have a look at the proposal?

And I suggest that we wait until you have feedback/suggestions from two more potential co-maintainers who can increase the diversity among the maintainer team.

zeileis avatar Apr 18 '24 06:04 zeileis

Pavel Krivitsky from Statnet here. Thank you, @FATelarico, for inviting me. I'll read through the discussion and the draft in detail later, but I want to flag a few items as a matter of first impression.

  1. statnet is a metapackage: all it does is pull in the most popular packages from the project. The actual functionality is in its reverse-dependencies. It's been a while since I've looked at what's in igraph, but loosely, igraph ≈ network + sna.
  2. There are a number of dynamic network packages that aren't listed (tergm, relevent, btergm, tsna, just off the top of my head). We may want a dynamic network section.
  3. There is the EpiModel suite of packages that builds on Statnet's for epidemic modelling.
  4. It may make sense to split packages based on the kinds of questions they answer. E.g., clustering tells you which nodes belong to each group, whereas ERGMs tell you about the "big picture" social forces.

krivit avatar Apr 23 '24 04:04 krivit

Pavel @krivit, thank you for your input, this is very much appreciated! I think it would be great if you could join the team of co-maintainers of the task view, in order to bring a new perspective to the team based on your expertise.

For a short introduction to the idea of CRAN task views and the corresponding file format, see Documentation.md. If you are interested in more details and some background information, see doi:10.48550/arXiv.2305.17573.

Some quick feedback regarding the points you raised:

  1. For the task view it would be good to list both statnet and the constituting packages and briefly explain what they do. Similarly, both igraph and network + sna should be listed and explained. The main purpose of task views is to provide an overview - and not to endorse/recommend the best packages for a given task.
  2. Sounds like a good idea to me.
  3. EpiModel is listed in the Epidemiology task view. So for the topic of "disease networks" etc. I would simply link to that task view.
  4. Sounds like a good idea to me.

Thanks & best wishes!

zeileis avatar Apr 24 '24 00:04 zeileis

Pavel @krivit, thank you for your input, this is very much appreciated! I think it would be great if you could join the team of co-maintainers of the task view, in order to bring a new perspective to the team based on your expertise.

@FATelarico , if I want to make edits, should I use PRs or push to the repository directly?

For a short introduction to the idea of CRAN task views and the corresponding file format, see Documentation.md. If you are interested in more details and some background information, see doi:10.48550/arXiv.2305.17573.

Thanks!

Some quick feedback regarding the points you raised:

1. For the task view it would be good to list both `statnet` and the constituting packages and briefly explain what they do. Similarly, both `igraph` and `network` + `sna` should be listed and explained. The main purpose of task views is to provide an overview - and not to endorse/recommend the best packages for a given task.

This is more about functionality rather than endorsement. The short of it is that network contains tools for managing the data structure, and sna contains EDA tools for networks, which can use both network objects and edgelists, as well as some inferential tools (e.g., QAP and MRQAP). igraph, from what I understand, contains both the data structure management tools and the EDA tools.

3. `EpiModel` is listed in the [Epidemiology](https://CRAN.R-project.org/view=Epidemiology) task view. So for the topic of "disease networks" etc. I would simply link to that task view.

I don't think there is any harm in doing both.

4. Sounds like a good idea to me.

A good phrasing might be challenging to come up with, but I suppose we can play around with it and see what happens.

krivit avatar Apr 24 '24 01:04 krivit

Re: EpiModel. We try to avoid overlap, if possible, in order to keep the task views more focused and more manageable (both for readers and for maintainers).

In this case, my feeling is that the scope of the package belongs rather clearly to "Epidemiology" and thus I would avoid the duplication. Feel free to iterate, if I'm missing something here (e.g., if EpiModel contains algorithms that will often be used in other network analyses, beyond infectious disease modeling).

If you feel that the "Epidemiology" task view should have a dedicated section on disease networks, I would encourage you to raise this with the Epidemiology task view maintainers.

zeileis avatar Apr 24 '24 08:04 zeileis

@FATelarico , if I want to make edits, should I use PRs or push to the repository directly?

@krivit pushing to the main branch is okay, I have a local copy indexed by version being downloaded after every commit.

This is more about functionality rather than endorsement. The short of it is that network contains tools for managing the data structure, and sna contains EDA tools for networks, which can use both network objects and edgelists, as well as some inferential tools (e.g., QAP and MRQAP). igraph, from what I understand, contains both the data structure management tools and the EDA tools.

I agree. As mentioned in the current draft, igraph is more of a one-stop shop for data-managing tasks, basic modeling, and clustering. It provides more or less the same data-centered features as network plus some of sna's inferential tools. But many people do not actually need most of what sna has to offer and igraph has so many specialised add-ons/reverse-dependencies that many people prefer/have to use that. Any suggestion on how to elucidate this point further in the text will be welcome!
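To make the division of labour concrete, here is a minimal sketch that computes the same descriptive statistic along both routes. The toy adjacency matrix and the object names (`adj`, `net`, `g`, `deg_sna`, `deg_igraph`) are made up for illustration and are not taken from the draft task view; namespace prefixes are used instead of `library()` calls because sna and igraph both export a function called `degree`.

```r
# Toy data (assumption for illustration): an undirected 4-cycle 1-2-3-4-1
adj <- matrix(0, 4, 4)
adj[cbind(c(1, 2, 3, 4), c(2, 3, 4, 1))] <- 1
adj <- adj + t(adj)  # symmetrise so the matrix describes an undirected graph

# statnet route: 'network' manages the data structure,
# 'sna' supplies the exploratory statistics
net <- network::network(adj, directed = FALSE)
deg_sna <- sna::degree(net, gmode = "graph")

# igraph route: a single package covers both the data structure
# and the exploratory statistics
g <- igraph::graph_from_adjacency_matrix(adj, mode = "undirected")
deg_igraph <- igraph::degree(g)
```

Objects can also be moved between the two ecosystems with the intergraph package, which is listed among the core packages of the proposal.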

FATelarico avatar Apr 26 '24 17:04 FATelarico

A new draft is online, I apologise for the delay.


@tuxette

citing c3net and not WGCNA, among others, seems to be a highly biased choice

The choice was quite arbitrary because none of us is directly involved in this field, but several colleagues highlighted these packages as the 'most relevant'. Incidentally, evolqg should not have been included, as pointed out. After some reading in specialised journals, I edited the list of packages in this section. Namely, besides removing a few packages, BioNAR and WGCNA were added.

A few additional minor remarks:

  • [x] In Section "Bio-Chemical Networks", you have an empty bullet point line.
  • [x] The titles of your sections are not all capitalized similarly.
  • [x] In Section "Bio-Chemical Networks", the reference "Simoes and Emmert-Streib" is not formatted properly.
  • [x] You have a typo in the title with the word "Psychology" in it.
  • [x] In Section "Social and Economic networks", sna is not cited with the proper macro.
  • [x] I think that "Extension for ggplot2" should be "Extensions for ggplot2".
  • [x] The package https://cran.r-project.org/web/packages/greed/index.html could also be worth citing.

@krivit

There are a number of dynamic network packages that aren't listed (tergm, relevent, btergm, tsna, just off the top of my head). We may want a dynamic network section.

I started by adding them either under ergm or in the most relevant sections. Feel free to move these and other dynamic-network packages to a separate section if you think there is enough material for it.

There is the EpiModel suite of packages that builds on Statnet's for epidemic modelling.

Also taking into account @zeileis's arguments, I added only a brief mention of EpiModel (because it is officially part of statnet) and linked the relevant CTV.

It may make sense to split packages based on the kinds of questions they answer. E.g., clustering tells you which nodes belong to each group, whereas ERGMs tell you about the "big picture" social forces.

The fact that ERGM is about modeling, simulation, and everything in between makes it difficult to slap a label on it or even put it on par with other approaches. But if you feel there is a satisfactory way to do so, the result would be incredibly useful for new users!


Thanks everyone for the feedback and active involvement!

FATelarico avatar Apr 26 '24 18:04 FATelarico

Dear all,

I am not entirely sure I understand the proposal.

Regarding the GraphicalModels task view my approach has been very pragmatic: Package authors contact me to have their package on the task view and I usually add it. If a few packages appear in more than one task view, then I do not see that as a problem.

Another topic: Perhaps it could be an idea to agree on how packages are described on the task views? I generally copy the package description unless it is too lengthy. Maybe there are other practices?

Best Søren

hojsgaard avatar Apr 29 '24 04:04 hojsgaard

Søren, thanks for your feedback. Regarding your comments:

  • Connection between NetworkAnalysis and GraphicalModels: Both the GraphicalModels task view, maintained by you, and the newly proposed NetworkAnalysis task views describe models that can be represented by graphs/networks. Hence, the question is whether both task views have a clear profile, have enough value added, and can be cross-referenced where appropriate. Do you think this is the case here? Do you have any recommendations for how to deal with it?
  • Overlap in general: With increasing number of task views and increasing number of packages per task view, it becomes more important that task views have a sharp profile so that it is clear what should go in and what should stay out. While it is not necessary or desirable to avoid overlap completely, we should still try to not have too much overlap. First, less overlap means less duplication of efforts for the maintainers. Second, less overlap but with cross-references between task views means that users will ideally be pointed to one place with useful documentation for them.
  • Package descriptions: I agree that the package title/description is a useful starting point. However, you can probably often improve the description within the task view if you embed it into the context of the appropriate section. In any case, there probably is no "one size fits all" approach here which is why we put this at the maintainers' discretion.

zeileis avatar Apr 29 '24 22:04 zeileis

Apologies for the silence; down with COVID at the moment.

@krivit pushing to the main branch is okay, I have a local copy indexed by version being downloaded after every commit.

@FATelarico, I don't think I have push access. I just tried it on a test branch.

Connection between NetworkAnalysis and GraphicalModels: Both the GraphicalModels task view, maintained by you, and the newly proposed NetworkAnalysis task views describe models that can be represented by graphs/networks. Hence, the question is whether both task views have a clear profile, have enough value added, and can be cross-referenced where appropriate. Do you think this is the case here? Do you have any recommendations for how to deal with it?

In my experience, the line between graphical models and network analysis is that in graphical models (and neural network models, for that matter), the graph is a prespecified component of the model specification that does not depend on the data; whereas in network analysis the graph is the object being observed and summarised or modelled.

krivit avatar Apr 30 '24 03:04 krivit

In response to @krivit:

If you have a database / dataset and do a model search for a graphical model (as e.g. the gRim package can do) then I do believe the graph is not specified beforehand? So this is perhaps not the best way ahead for discriminating between graphical models and network analysis.

A more general comment: In graphical models (at least traditionally), the focus is on some kind of (conditional) independence restriction, which is a probabilistic statement. A missing edge represents a conditional independence restriction. That is the classical connection between a graph and a probabilistic model. In larger models with many variables, the graphs become less interesting as visual objects. It is hard to make sense of a graph with 1000 variables :)
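The classical connection between a missing edge and conditional independence can be written compactly. As a sketch in standard notation (the symbols $G = (V, E)$, $X_i$, and $\Sigma$ are introduced here for illustration and do not come from the thread), the pairwise Markov property for an undirected graph, together with its well-known Gaussian special case, reads:

```latex
(i, j) \notin E \;\Longrightarrow\; X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i, j\}},
\qquad
\text{and, for } X \sim N(\mu, \Sigma): \quad
X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i, j\}} \iff \left(\Sigma^{-1}\right)_{ij} = 0 .
```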

So in the distinction between graphical models and network analysis, one view is that it comes down to what is being analyzed? What is the key component in network analysis? Is that conditional independence? Is it another well defined mathematical / statistical concept? Perhaps, I can say it more directly: I am uncertain what network analysis really is...

In response to @zeileis:

You are right that small overlaps between task views are desirable but also that overlaps are unavoidable. Would it be feasible to have a package "belonging" primarily to one specific task view and then one can refer to that from any other task view?

In addition to standardizing the description of packages, one thing that perhaps could be nice is to be able to automatically generate an "update history" for each package, just to give people an idea about how actively a package is maintained.

hojsgaard avatar Apr 30 '24 06:04 hojsgaard

@hojsgaard

If you have a database / dataset and do a model search for a graphical model (as e.g. the gRim package can do) then I do believe the graph is not specified beforehand? So this is perhaps not the best way ahead for discriminating between graphical models and network analysis.

I am aware of this type of problem, but I didn't want to get too far into the weeds; the main distinction is that the graph is not the object of observation or analysis. I would classify this problem as a model selection problem for graphical models, rather than a network analysis problem.

However, if one then, as you say, tries to understand the properties of this graph, say by visualising it or by detecting groups of variables with similar structural roles in the graph, then it becomes a network analysis problem. The tools one would use would often be agnostic to whether the graph represents friendships between people or conditional dependence between variables.

I know there are also other intermediate cases. For example, Frank and Strauss (1986) "Markov Graphs" specified a probability model for network structure by constructing a conditional dependence graph (i.e., a graphical model) for edge variables and then using the Hammersley-Clifford Theorem to derive the form for the probability of a given graph under the model. This approach and its extensions have been used to infer social forces affecting the structure of the network ever since.
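For reference, the exponential-family form that this kind of construction leads to is usually written as below. This is a sketch in common ERGM notation; the symbols $\theta$, $g$, $\kappa$, and $\mathcal{Y}$ are assumptions for illustration rather than notation used in the thread.

```latex
\Pr_\theta(Y = y) = \frac{\exp\{\theta^\top g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^\top g(y')\} .
```

Under the homogeneous Markov dependence assumption of Frank and Strauss, the statistics $g(y)$ for an undirected network reduce to counts of edges, $k$-stars, and triangles.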

krivit avatar Apr 30 '24 09:04 krivit

@zeileis: Connection between NetworkAnalysis and GraphicalModels: Both the GraphicalModels task view, maintained by you, and the newly proposed NetworkAnalysis task views describe models that can be represented by graphs/networks. Hence, the question is whether both task views have a clear profile, have enough value added, and can be cross-referenced where appropriate. Do you think this is the case here? Do you have any recommendations for how to deal with it?

I think the issue in understanding the connection (and difference) between NetworkAnalysis and GraphicalModels is that the former offers tools that are not limited to 'describe statistical models as graphs/networks'. Rather, as @krivit pointed out (rightly, in my humble opinion), network analysis allows one to deal with networks representing a/some connection/s between a/some defined set/s of entities. If the entities happen to be variables and the connection between them is conditional dependence (with independence being implied by lack of ties), then you get a graphical model. Obviously, Markovian graphs lie in somewhat of a gray area, but since they are covered in the GraphicalModels CTV, we are not dealing with them.


@krivit : Apologies for the silence; down with COVID at the moment.

Wish you a speedy recovery!

@krivit : I don't think I have push access

It should be fixed now. Let me know

FATelarico avatar May 01 '24 09:05 FATelarico

Thanks for the clarifications @FATelarico @krivit @hojsgaard, I think this is very useful and something to build upon.

I suggest that you review the description of the scope of the NetworkAnalysis view to make it sharper with respect to this distinction. Also add a cross-reference to the GraphicalModels task view.

When the NetworkAnalysis task view is published, the scope description of the GraphicalModels task view should be adapted correspondingly. Similarly, its first section ("Representation, manipulation and display of graphs") should be streamlined (with yet another cross-reference) once NetworkAnalysis is available.

zeileis avatar May 01 '24 22:05 zeileis

Thank you for the interesting discussion. Like @zeileis, I think that, for readers, it is important that the distinction is clearly made at the beginning of both task views with cross-references: that will help them identify which TV they have to read to answer their specific question.

Also, I share the view that GraphicalModels includes packages dealing with graphs as a way to represent some kind of conditional dependency structure between variables (nodes). For me, Markovian graphs and Bayesian networks belong more in that task view than in NetworkAnalysis, for instance, but I agree that the distinction is not easy to make clearly (at least, that is where I would search for information on this topic). However, I am under the impression that coordination between the two TVs is necessary (maybe if you could find someone to be a maintainer of both TVs, that would help).

Finally, just a minor comment: WGCNA deals with gene networks (co-expression networks actually), which is not really "biochemistry" (pure biology instead, even though, I agree that, in the end, everything is mostly chemistry, that is not how most people would think of it).

tuxette avatar May 04 '24 08:05 tuxette

@zeileis @tuxette
I suggest that you review the description of the scope of the NetworkAnalysis view to make it sharper with respect to this distinction. Also add a cross-reference to the GraphicalModels task view. I think that, for readers, it is important that the distinction is clearly made at the beginning of both task views with cross references: that will help them identify which TV they have to read to answer their specific question.

Thank you for your continued feedback. With the last two edits I improved on the following aspects:

  • https://github.com/FATelarico/ctv-network/commit/8355054cdfc044c0f42cbb3160391d699cb511ab → Corrected the section title for biochemistry.

  • https://github.com/FATelarico/ctv-network/commit/f6311c7ef576b20b02f53c0e9f1782a583f9b7a7 → Edited the inclusion/exclusion criteria to be more explicit regarding the substantive difference between these two areas, with a cross-reference.

Regarding coordination, I agree that it may be useful. Perhaps @hojsgaard could join our team if he feels okay with it.

FATelarico avatar May 08 '24 14:05 FATelarico

@FATelarico : Thanks! you might also want to correct "Notably, the underlying data mining approach has been used beyond biochemistry."? (I was referring to this sentence actually.)

Regarding the rest, I think that it heads in the right direction. I also think that we should wait until the team of maintainers is completely set.

tuxette avatar May 13 '24 13:05 tuxette

I've drafted a dynamic network modelling subsection, though it does lead to more questions about how to organise things. What do you think?

krivit avatar May 14 '24 10:05 krivit

@krivit : sorry for my very late answer! The current proposal sounds good to me. Do you have any suggestions for additional maintainers who could help cover the diversity of the topic more broadly?

tuxette avatar Aug 16 '24 10:08 tuxette

As a minor note, in your repository, the task view file should be named `NetworkAnalysis.md` instead of `ctv-NetworkAnalysis.md`.

tuxette avatar Aug 16 '24 10:08 tuxette

Wow, time flies... Since nobody objected, I went ahead and merged it into main.

krivit avatar Aug 21 '24 10:08 krivit

@krivit : sorry for my very late answer! The current proposal sounds good to me. Do you have any suggestions for additional maintainers who could help cover the diversity of the topic more broadly?

Perhaps someone from Stocnet (https://github.com/stocnet/)?

As a minor note, in your repository, the task view file should be named `NetworkAnalysis.md` instead of `ctv-NetworkAnalysis.md`.

Done.

krivit avatar Aug 21 '24 10:08 krivit

Thanks @krivit ! For me it's all good, including your proposition for another maintainer. We need to wait for the other editors to react as well, and then we can proceed to publication.

tuxette avatar Aug 23 '24 06:08 tuxette