fix: Support character labels in `assortativity_nominal()`
Fix for assortativity_nominal with character types
- [x] Understand the issue: `assortativity_nominal()` fails with character labels (like 'A', 'B') because `as.numeric()` produces NAs
- [x] Examine the current implementation in R/aaa-auto.R
- [x] Create test cases for character labels (letters and string numbers)
- [x] Implement fix: Use conditional `as.integer(as.factor(types))` only for character vectors
- [x] Run tests to validate the fix
- [x] Format code with air
- [x] Run full test suite (6959 tests pass)
- [x] Update documentation
- [x] Address review feedback: Only convert character vectors for clarity
- [x] Run security checks
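The root cause named in the checklist can be reproduced in base R without igraph. The snippet below is a minimal sketch; the variable names are illustrative and do not come from the package source.

```r
# Letter labels cannot be parsed as numbers, so coercion yields NA.
labels <- c("A", "B", "A")
coerced <- suppressWarnings(as.numeric(labels))  # warns: NAs introduced by coercion
print(coerced)                                   # NA NA NA

# String numbers parse cleanly, which is why they did not trigger the error.
digits <- as.numeric(c("1", "2", "1"))
print(digits)                                    # 1 2 1

# Going through a factor maps each distinct label to an integer code.
codes <- as.integer(as.factor(labels))
print(codes)                                     # 1 2 1
```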
Summary
Fixed a bug where `assortativity_nominal()` failed with character labels. The fix conditionally applies `as.integer(as.factor(types))` to convert character vectors to consecutive integers starting from 1, which is what the C implementation expects.
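The conversion described in this summary can be sketched as a standalone function. `normalize_types()` is a hypothetical name used only for illustration; it is not the actual code in the package, and it assumes the conversion is the only change needed before the values are passed on.

```r
# Hypothetical helper (an assumption, not the igraph source): convert only
# character vectors and leave every other input type untouched.
normalize_types <- function(types) {
  if (is.character(types)) {
    # Factor levels are sorted, so distinct labels become 1, 2, 3, ...
    types <- as.integer(as.factor(types))
  }
  types
}

print(normalize_types(c("B", "A", "B")))  # 2 1 2
print(normalize_types(c("1", "2", "1")))  # 1 2 1
print(normalize_types(c(1, 2, 1)))        # 1 2 1 (numeric input passes through)
```

Restricting the conversion to character input keeps numeric vectors exactly as the caller supplied them, which matches the review request to convert only character vectors.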
Changes:
- Modified `assortativity_nominal()` in `R/assortativity.R` to conditionally convert only character types using `as.integer(as.factor())`
- Updated documentation to clarify character vector handling
- Added comprehensive test cases
Benefits:
- Fixes character type bug (the reported issue)
- Avoids unnecessary conversion overhead for numeric inputs
- Maintains backward compatibility
- All tests pass
Original prompt
This section details the original issue you should resolve
<issue_title>bug in assortativity_nominal</issue_title>
<issue_description>Describe the bug
When trying to compute assortativity for categorical labels attached to nodes, an error is returned when the labels are letters but not when the labels are numbers. The error message says the bug is unexpected and asks for a report with a reproducible example.
To reproduce
The script below reproduces the bug
library(igraph)
set.seed(2)
g <- sample_gnm(10, 20)
V(g)$random1 <- sample(c(1, 2), 10, replace = T)
V(g)$random2 <- sample(c('1', '2'), 10, replace = T)
V(g)$random3 <- sample(c('A', 'B'), 10, replace = T)
# compute the assortativity of this node attribute
assortativity_nominal(g, types = V(g)$random1) # this is OK
assortativity_nominal(g, types = V(g)$random2) # this is OK
assortativity_nominal(g, types = V(g)$random3) # this leads to the output below
Error in assortativity_nominal(g, types = V(g)$random3) :
  At core/core/vector.pmt:126 : Assertion failed: size >= 0. This is an unexpected igraph error; please report this as a bug, along with the steps to reproduce it.
Please restart your R session to avoid crashes or other surprising behavior.
In addition: Warning message:
In assortativity_nominal(g, types = V(g)$random3) : NAs introduced by coercion
Version information
Which version of igraph are you using and where did you obtain it?
igraph_1.6.0 from CRAN
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
</issue_description>
Comments on the Issue (you are @copilot in this section)
Please always test with the latest version before reporting issues. I cannot reproduce it with 2.0.2. Can you try this version?</comment_new>
<comment_new>
I agree that the situation is not ideal. I must note that I am not an R user or R programmer, so I can't judge very well what is reasonable in R. Neither do I make the decision about whether we will do anything about this. But here's a suggestion for an improvement.
Let me know what you think @krlmlr
First, notice that the error message is not very good. It talks about negative indices, as in C we index the categories from 0. In R we index from 1. There's thus the usual problem about how to phrase the error to fit both. https://github.com/igraph/igraph/issues/2119
The types argument here represents categorical data. It would indeed be very nice if other representations than indices could be supported, for example string names. Categorical data appears in many places in igraph as an input argument, such as:
- Community / cluster / component membership as passed to functions that take groups, such as: modularity calculation; vertex contraction
- Vertex and edge colours in isomorphism functions
- Categorical assortativity
... and possibly others I'm not thinking of now.
Some high-level languages support categorical data directly. Isn't this what factor is for in R? Mathematica does not have a data type for this, but I do have functions to convert other representations to category indices, and I allow categories to be specified in flexible ways.
Categories also have different representations, each being most useful in specific contexts: we can assign a category name to each object/vertex: vertex 1 is "blue", vertex 2 is "red", vertex 3 is "blue"; or we can list the category members: "blue" contains {1, 3}, "red" contains {2}.
Should we then have a special Stimulus type specifically for categorical data? This would make it easy to auto-generate code that can handle various kinds of category representations that are convenient in the host language, and convert each to simple 0-based membership vectors that can be sent to C. The raw C errors we see here would never appear: error checking would be done by the function that converts the category representations. Users could work much more conveniently with such data.
Opinions, @krlmlr and @ntamas ?
Potentially related:
- https://github.com/igraph/igraph/issues/2465
- igraph/rigraph#193
- http://szhorvat.net/mathematica/IGDocumentation/#igpartitionstomembership-and-igmembershiptopartitions</comment_new>
<comment_new>
@ntamas
Should we then have a special Stimulus type specifically for categorical data?
Yes, IMO it would be a good idea. Currently we have VERTEX_COLORS and EDGE_COLORS (probably only in the develop branch?). I think it's an ill-suited name but semantically it means the same thing, doesn't it?</comment_new>
<comment_new>
types <- as.integer(as.factor(types)) in `assortativity_nom...
- Fixes igraph/rigraph#1283
@copilot: Only convert character vectors to integers, for clarity.
Done in 175cfd1. Now only character vectors are converted using `as.integer(as.factor())`; numeric inputs pass through unchanged.