rigraph icon indicating copy to clipboard operation
rigraph copied to clipboard

Use `as.numeric(seq_len(n))` in favor of `seq(1, n)`

Open krlmlr opened this issue 1 year ago • 10 comments

What happens, and what did you expect instead?

https://github.com/igraph/rigraph/pull/1330#discussion_r1665753951

We also need the by argument to ensure that the output is a numeric.

Perhaps add a function seq_numeric() that has a special case for zero-length output?

To reproduce

seq(by = 1) seems to be the fastest, look at the units. Maybe it's worth arguing with R core to support seq(start, start - by, by) without error and to return an empty vector in this case?

x <- seq(1, 1, 1)
class(x)
#> [1] "numeric"
x
#> [1] 1
seq(1, 0, 1)
#> Error in seq.default(1, 0, 1): wrong sign in 'by' argument

bench::mark(
  sum(seq(1, 100000, 1)),
  sum(as.numeric(rlang::seq2(1, 100000))),
  sum(as.numeric(seq_len(100000)))
)
#> # A tibble: 3 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 sum(seq(1, 1e+05, 1))                628µs  791µs     1233.    1.91MB     46.8
#> 2 sum(as.numeric(rlang::seq2(1, 1e+05… 861ns  984ns   898993.        0B      0  
#> 3 sum(as.numeric(seq_len(1e+05)))      164ns  246ns  3139792.        0B    314.

Created on 2024-07-05 with reprex v2.1.0

System information

No response

krlmlr avatar Jul 05 '24 05:07 krlmlr

I wasn't able to get to this today :cry:

maelle avatar Jul 05 '24 14:07 maelle

seq(1, vcount(graph)) does not work for empty vectors. Alternatives are: seq_along, seq_len.

library(bench)
n <- 1
graph <- make_ring(n)
bench:: mark(
  "seq"   = seq(1, vcount(graph)),
  "along" = seq_along(V(graph)),
  "len"   = seq_len(vcount(graph))
)

The above fails if n = 0. vcount() is faster then V(g).

clpippel avatar Jul 08 '24 12:07 clpippel

Yeah, my seq_numeric() would special-case for the n == 0 edge case.

krlmlr avatar Jul 08 '24 13:07 krlmlr

What is the point of a special function seq_numeric()? Isn't it simpler to replace seq(1, x) with seq_len(x)?

clpippel avatar Jul 08 '24 13:07 clpippel

I also wonder: how bad is it if we omit the test index_is_natural_sequence()? How likely is it that the test does indeed pass?

clpippel avatar Jul 08 '24 13:07 clpippel

seq(1, x) is much faster, as the benchmark shows.

Because the natural sequence is the default argument for many input functions, I suspect that this case is pretty frequent and worth optimizing for, but I haven't done any measurements.

krlmlr avatar Jul 08 '24 14:07 krlmlr

Your benchmark seems to show that seq_len (246 ns) is much faster than seq (791 µs), exactly the opposite of what you say above @krlmlr. What am I missing?

szhorvat avatar Jul 08 '24 14:07 szhorvat

🤦 🤦

You're right, of course. But the test isn't representative either:

bench::mark(
  sum(seq(1, 10000000, 1)),
  sum(as.numeric(rlang::seq2(1, 10000000))),
  sum(as.numeric(seq_len(10000000)))
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression                           min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                       <bch:t> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 sum(seq(1, 1e+07, 1))             77.6ms  93.34ms      10.6     191MB     16.0
#> 2 sum(as.numeric(rlang::seq2(1, 1…   902ns   1.02µs  800320.         0B      0  
#> 3 sum(as.numeric(seq_len(1e+07)))    164ns 246.04ns 2861548.         0B      0

Created on 2024-07-08 with reprex v2.1.0

The second and third example probably uses Gauss's formula. Let's try var() :

bench::mark(
  var(seq(1, 100000, 1)),
  var(as.numeric(rlang::seq2(1, 100000))),
  var(as.numeric(seq_len(100000)))
)
#> # A tibble: 3 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 var(seq(1, 1e+05, 1))                818µs  960µs      883.    1.92MB     32.0
#> 2 var(as.numeric(rlang::seq2(1, 1e+05… 434µs  496µs     2002.   781.3KB     26.9
#> 3 var(as.numeric(seq_len(1e+05)))      432µs  495µs     2016.   781.3KB     29.4

Created on 2024-07-08 with reprex v2.1.0

This is more realistic but also indicates that as.numeric(seq_len()) is faster. I stand corrected.

krlmlr avatar Jul 08 '24 14:07 krlmlr

A thought: avoid seq() in any case.

clpippel avatar Jul 08 '24 16:07 clpippel

I tried make_ring(1E5). Optimization saves ca. 50%, 3.72ms v.s. 6.42ms.

clpippel avatar Jul 08 '24 17:07 clpippel

What do we want to replace with seq_len()?

maelle avatar Mar 25 '25 13:03 maelle

After reviewing, I think as.numeric(seq_len(n)) is best.

krlmlr avatar Apr 03 '25 09:04 krlmlr

I must say I'm confused as to what needs to be changed from https://github.com/igraph/rigraph/pull/1330/files

Especially as index_is_natural_sequence is defined differently in its two occurrences.

maelle avatar Apr 09 '25 14:04 maelle

I had forgotten about https://github.com/igraph/rigraph/pull/1433

maelle avatar Apr 09 '25 15:04 maelle