Use `as.numeric(seq_len(n))` in favor of `seq(1, n)`
What happens, and what did you expect instead?
https://github.com/igraph/rigraph/pull/1330#discussion_r1665753951
We also need the by argument to ensure that the output is a numeric.
Perhaps add a function seq_numeric() that has a special case for zero-length output?
To reproduce
seq(by = 1) seems to be the fastest, look at the units. Maybe it's worth arguing with R core to support seq(start, start - by, by) without error and to return an empty vector in this case?
x <- seq(1, 1, 1)
class(x)
#> [1] "numeric"
x
#> [1] 1
seq(1, 0, 1)
#> Error in seq.default(1, 0, 1): wrong sign in 'by' argument
bench::mark(
sum(seq(1, 100000, 1)),
sum(as.numeric(rlang::seq2(1, 100000))),
sum(as.numeric(seq_len(100000)))
)
#> # A tibble: 3 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch> <bch:> <dbl> <bch:byt> <dbl>
#> 1 sum(seq(1, 1e+05, 1)) 628µs 791µs 1233. 1.91MB 46.8
#> 2 sum(as.numeric(rlang::seq2(1, 1e+05… 861ns 984ns 898993. 0B 0
#> 3 sum(as.numeric(seq_len(1e+05))) 164ns 246ns 3139792. 0B 314.
Created on 2024-07-05 with reprex v2.1.0
System information
No response
I wasn't able to get to this today :cry:
seq(1, vcount(graph)) does not work for empty vectors.
Alternatives are: seq_along, seq_len.
library(bench)
n <- 1
graph <- make_ring(n)
bench:: mark(
"seq" = seq(1, vcount(graph)),
"along" = seq_along(V(graph)),
"len" = seq_len(vcount(graph))
)
The above fails if n = 0.
vcount() is faster then V(g).
Yeah, my seq_numeric() would special-case for the n == 0 edge case.
What is the point of a special function seq_numeric()?
Isn't it simpler to replace seq(1, x) with seq_len(x)?
I also wonder: how bad is it if we omit the test index_is_natural_sequence()? How likely is it that the test does indeed pass?
seq(1, x) is much faster, as the benchmark shows.
Because the natural sequence is the default argument for many input functions, I suspect that this case is pretty frequent and worth optimizing for, but I haven't done any measurements.
Your benchmark seems to show that seq_len (246 ns) is much faster than seq (791 µs), exactly the opposite of what you say above @krlmlr. What am I missing?
🤦 🤦
You're right, of course. But the test isn't representative either:
bench::mark(
sum(seq(1, 10000000, 1)),
sum(as.numeric(rlang::seq2(1, 10000000))),
sum(as.numeric(seq_len(10000000)))
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:t> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sum(seq(1, 1e+07, 1)) 77.6ms 93.34ms 10.6 191MB 16.0
#> 2 sum(as.numeric(rlang::seq2(1, 1… 902ns 1.02µs 800320. 0B 0
#> 3 sum(as.numeric(seq_len(1e+07))) 164ns 246.04ns 2861548. 0B 0
Created on 2024-07-08 with reprex v2.1.0
The second and third example probably uses Gauss's formula. Let's try var() :
bench::mark(
var(seq(1, 100000, 1)),
var(as.numeric(rlang::seq2(1, 100000))),
var(as.numeric(seq_len(100000)))
)
#> # A tibble: 3 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch> <bch:> <dbl> <bch:byt> <dbl>
#> 1 var(seq(1, 1e+05, 1)) 818µs 960µs 883. 1.92MB 32.0
#> 2 var(as.numeric(rlang::seq2(1, 1e+05… 434µs 496µs 2002. 781.3KB 26.9
#> 3 var(as.numeric(seq_len(1e+05))) 432µs 495µs 2016. 781.3KB 29.4
Created on 2024-07-08 with reprex v2.1.0
This is more realistic but also indicates that as.numeric(seq_len()) is faster. I stand corrected.
A thought: avoid seq() in any case.
I tried make_ring(1E5). Optimization saves ca. 50%, 3.72ms v.s. 6.42ms.
What do we want to replace with seq_len()?
After reviewing, I think as.numeric(seq_len(n)) is best.
I must say I'm confused as to what needs to be changed from https://github.com/igraph/rigraph/pull/1330/files
Especially as index_is_natural_sequence is defined differently in its two occurrences.
I had forgotten about https://github.com/igraph/rigraph/pull/1433