posterior Complex numbers

Summary

This is a draft PR to support complex numbers in draws and rvar data types. It closes #319. The main changes, plus some points of conversation:

Adjust two checks in as_draws.R to allow complex values (this is the main thing that buys us support in draws formats)
Provide an implementation of sd(), variance(), and var() for complex values. This fixes printing issues, especially with rvars. The implementation of variance() follows the definition of the variance of a complex random variable as the sum of the variances of its real and imaginary components; and its sd is taken as the square root of this.
Ensure other rvar functions work with complex numbers, or (in cases of functions that should not work with complex numbers), raise an error. Functions that do not work with complex numbers include:
- [rvar_]min/max/range - since complex numbers are not orderable, these are not defined
- [rvar_]median - base R will compute a median of a complex number by taking the medians of each component; however, I'm not sure sure if that's correct (or if something like a geometric median would make more sense). One might argue that if someone just wanted the point estimates of the medians for each component, as in summarise_draws(), the base R approach could make sense. Conservatively I've let this throw an error, but would be curious thoughts.
- Base R quantile() on complex numbers seems (like median) to just treat each component separately, which I think is wrong, but again might be okay for us (in rvar_quantile and [rvar_]quantile2) if people just want the separate point estimates for the two components. I'm not sure.
- [rvar_]density, [rvar_]cdf - if implemented, these should (I think) be joint densities/cdfs for the two components, i.e. using 2d ecdfs / 2d density estimators. Absent that implementation, for now, these throw an error.
I'm not sure what to do about convergence metrics. At the moment, most fail because of a check in is_constant() that requires min / max, which are not defined on complex numbers. However, some succeed but I think give incorrect answers: e.g. rhat() appears to work, but I think returns incorrect values, as it relies on rank() in z_scale(), but base R's rank() ranks complex numbers in lexicographic order, when in reality it should probably return an error. I'm not sure if there's a good definition of any of these convergence metrics for complex numbers (maybe something like max rhat of both components?) or if these should all just return errors.

The main motivating example for this, inspired by @WardBrian's comment, does work now:

> x = cbind(real = rvar_rng(rnorm, 5), imag = rvar_rng(rnorm, 5, 1))
> x[,"real"] + x[,"imag"] * 1i
rvar<4000>[5,1] mean ± sd:
     real              
[1,]  0.00+0.98i ± 1.4 
[2,] -0.01+0.98i ± 1.4 
[3,]  0.01+0.97i ± 1.4 
[4,]  0.00+1.01i ± 1.4 
[5,]  0.01+0.99i ± 1.4

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

Nov 20 '23 04:11 mjskay

Codecov Report

Attention: 12 lines in your changes are missing coverage. Please review.

Comparison is base (9059111) 95.52% compared to head (8a371ac) 95.48%. Report is 20 commits behind head on master.

:exclamation: Current head 8a371ac differs from pull request most recent head 31f82f8. Consider uploading reports for the commit 31f82f8 to get more accurate results

Files	Patch %	Lines
R/pareto_smooth.R	82.60%	8 Missing :warning:
R/weight_draws.R	81.81%	4 Missing :warning:

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #321      +/-   ##
==========================================
- Coverage   95.52%   95.48%   -0.05%     
==========================================
  Files          47       47              
  Lines        3691     3789      +98     
==========================================
+ Hits         3526     3618      +92     
- Misses        165      171       +6

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Nov 20 '23 04:11 codecov-commenter

This is how benchmark results would change (along with a 95% confidence interval in relative change) if c97b1dead857aa1daa161fe7e6600840a0cbb0e6 is merged into master:

:ballot_box_with_check:as_draws_array: 101ms -> 101ms [-0.68%, +1.65%]
:rocket:as_draws_df: 33ms -> 31.6ms [-6.25%, -2.21%]
:rocket:as_draws_list: 182ms -> 180ms [-2.26%, -0.14%]
:ballot_box_with_check:as_draws_matrix: 29.3ms -> 29.1ms [-3.99%, +2.73%]
:ballot_box_with_check:as_draws_rvars: 153ms -> 154ms [-0.26%, +1.94%]
:ballot_box_with_check:summarise_draws_100_variables: 708ms -> 710ms [-0.42%, +1.11%]
:ballot_box_with_check:summarise_draws_10_variables: 78.4ms -> 78.4ms [-0.39%, +0.38%] Further explanation regarding interpretation and methodology can be found in the documentation.

Nov 20 '23 04:11 github-actions[bot]

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f70d6b9c29ee491afd317cd2de1e8fbc145088d1 is merged into master:

:rocket:as_draws_array: 103ms -> 101ms [-2.07%, -0.74%]
:ballot_box_with_check:as_draws_df: 31.9ms -> 32.2ms [-0.34%, +2.33%]
:ballot_box_with_check:as_draws_list: 184ms -> 185ms [-1.92%, +2.06%]
:exclamation::snail:as_draws_matrix: 29.1ms -> 32.1ms [+4.52%, +16%]
:ballot_box_with_check:as_draws_rvars: 156ms -> 158ms [-1.16%, +3.85%]
:ballot_box_with_check:summarise_draws_100_variables: 720ms -> 719ms [-1.75%, +1.52%]
:ballot_box_with_check:summarise_draws_10_variables: 79ms -> 79.4ms [-0.08%, +1.25%] Further explanation regarding interpretation and methodology can be found in the documentation.

Nov 20 '23 05:11 github-actions[bot]

This is how benchmark results would change (along with a 95% confidence interval in relative change) if a76290c83e950cc4a26fdec6628fb70c043e50ea is merged into master:

:ballot_box_with_check:as_draws_array: 100ms -> 105ms [-5.1%, +14.36%]
:exclamation::snail:as_draws_df: 32.8ms -> 34ms [+2.3%, +5.21%]
:ballot_box_with_check:as_draws_list: 185ms -> 184ms [-1.28%, +0.49%]
:ballot_box_with_check:as_draws_matrix: 29.3ms -> 29.2ms [-2.04%, +1.58%]
:ballot_box_with_check:as_draws_rvars: 153ms -> 155ms [-0.15%, +2.04%]
:exclamation::snail:summarise_draws_100_variables: 713ms -> 719ms [+0.08%, +1.64%]
:ballot_box_with_check:summarise_draws_10_variables: 78.7ms -> 79.1ms [-0.24%, +1.43%] Further explanation regarding interpretation and methodology can be found in the documentation.

Nov 20 '23 05:11 github-actions[bot]

Does this handle the automatic re-shaping if I pass something that has colnames like x.1.real,...? Note that it is slightly subtle to do this correctly, since you end up with essentially strided draws in most ways of arranging these

Nov 20 '23 15:11 WardBrian

If I understand correctly, you're asking if it matters where in the order of dimensions the dimension indexing the real and imaginary parts is? I don't think it should make a difference; array slicing should handle that.

e.g. here's an rvar where the second dimension indexes the real/imaginary parts:

x = rvar(array(
  1:96, dim = c(4,4,2,3), 
  dimnames = list(NULL, NULL, c("real", "imag"), NULL)
))
x
#> rvar<4>[4,2,3] mean ± sd:
#> , , 1
#> 
#>      real        imag       
#> [1,]  2.5 ± 1.3  18.5 ± 1.3 
#> [2,]  6.5 ± 1.3  22.5 ± 1.3 
#> [3,] 10.5 ± 1.3  26.5 ± 1.3 
#> [4,] 14.5 ± 1.3  30.5 ± 1.3 
#> 
#> , , 2
#> 
#>      real        imag       
#> [1,] 34.5 ± 1.3  50.5 ± 1.3 
#> [2,] 38.5 ± 1.3  54.5 ± 1.3 
#> [3,] 42.5 ± 1.3  58.5 ± 1.3 
#> [4,] 46.5 ± 1.3  62.5 ± 1.3 
#> 
#> , , 3
#> 
#>      real        imag       
#> [1,] 66.5 ± 1.3  82.5 ± 1.3 
#> [2,] 70.5 ± 1.3  86.5 ± 1.3 
#> [3,] 74.5 ± 1.3  90.5 ± 1.3 
#> [4,] 78.5 ± 1.3  94.5 ± 1.3

Showing it as a draws_df shows the corresponding variable names in most formats:

as_draws_df(x)
#> # A draws_df: 4 iterations, 1 chains, and 24 variables
#>   x[1,real,1] x[2,real,1] x[3,real,1] x[4,real,1] x[1,imag,1] x[2,imag,1]
#> 1           1           5           9          13          17          21
#> 2           2           6          10          14          18          22
#> 3           3           7          11          15          19          23
#> 4           4           8          12          16          20          24
#>   x[3,imag,1] x[4,imag,1]
#> 1          25          29
#> 2          26          30
#> 3          27          31
#> 4          28          32
#> # ... with 16 more variables
#> # ... hidden reserved variables {'.chain', '.iteration', '.draw'}

We can still just slice the two components and sum them. We could also do x[,"real",] + x[,"imag",] * 1i (note the extra commas), but it's not necessary as rvar fills in missing dimensions on the end during a slice:

x[,"real"] + x[,"imag"] * 1i
#> rvar<4>[4,1,3] mean ± sd:
#> , , 1
#> 
#>      real         
#> [1,]  2+18i ± 1.8 
#> [2,]  6+22i ± 1.8 
#> [3,] 10+26i ± 1.8 
#> [4,] 14+30i ± 1.8 
#> 
#> , , 2
#> 
#>      real         
#> [1,] 34+50i ± 1.8 
#> [2,] 38+54i ± 1.8 
#> [3,] 42+58i ± 1.8 
#> [4,] 46+62i ± 1.8 
#> 
#> , , 3
#> 
#>      real         
#> [1,] 66+82i ± 1.8 
#> [2,] 70+86i ± 1.8 
#> [3,] 74+90i ± 1.8 
#> [4,] 78+94i ± 1.8

If you don't want the (now-useless) "real" dim, you can drop it:

drop(x[,"real"] + x[,"imag"] * 1i)
#> rvar<4>[4,3] mean ± sd:
#>      [,1]          [,2]          [,3]         
#> [1,]  2+18i ± 1.8  34+50i ± 1.8  66+82i ± 1.8 
#> [2,]  6+22i ± 1.8  38+54i ± 1.8  70+86i ± 1.8 
#> [3,] 10+26i ± 1.8  42+58i ± 1.8  74+90i ± 1.8 
#> [4,] 14+30i ± 1.8  46+62i ± 1.8  78+94i ± 1.8

Which also looks right if we again convert to a format like draws_df:

as_draws_df(drop(x[,"real"] + x[,"imag"] * 1i))
#> # A draws_df: 4 iterations, 1 chains, and 12 variables
#>   x[1,1] x[2,1] x[3,1] x[4,1] x[1,2] x[2,2] x[3,2] x[4,2]
#> 1  1+17i  5+21i  9+25i 13+29i 33+49i 37+53i 41+57i 45+61i
#> 2  2+18i  6+22i 10+26i 14+30i 34+50i 38+54i 42+58i 46+62i
#> 3  3+19i  7+23i 11+27i 15+31i 35+51i 39+55i 43+59i 47+63i
#> 4  4+20i  8+24i 12+28i 16+32i 36+52i 40+56i 44+60i 48+64i
#> # ... with 4 more variables
#> # ... hidden reserved variables {'.chain', '.iteration', '.draw'}

Nov 21 '23 02:11 mjskay

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 8a371acabb8d9e8023655c14ced677bd48af429c is merged into master:

:ballot_box_with_check:as_draws_array: 102ms -> 101ms [-2.4%, +0.52%]
:ballot_box_with_check:as_draws_df: 34ms -> 31.9ms [-19.2%, +6.81%]
:ballot_box_with_check:as_draws_list: 187ms -> 186ms [-2.78%, +2.06%]
:ballot_box_with_check:as_draws_matrix: 29.7ms -> 29.3ms [-2.98%, +0.79%]
:ballot_box_with_check:as_draws_rvars: 158ms -> 158ms [-2.47%, +2.2%]
:ballot_box_with_check:summarise_draws_100_variables: 711ms -> 712ms [-0.64%, +0.86%]
:ballot_box_with_check:summarise_draws_10_variables: 78.4ms -> 78.5ms [-0.22%, +0.49%] Further explanation regarding interpretation and methodology can be found in the documentation.

Nov 24 '23 23:11 github-actions[bot]

posterior posterior copied to clipboard

Complex numbers

Summary

Copyright and Licensing

Codecov Report

posterior
posterior copied to clipboard