
`log_inv_logit_diff` is incorrect for infinite inputs

Open vandenman opened this issue 1 year ago • 3 comments

Description

The composed function log_inv_logit_diff(x, y) does not return the same value as log(inv_logit(x) - inv_logit(y)) when x is infinite.

If this is expected behavior, feel free to close and ignore this issue!

Example

Below is some R code to reproduce the difference:

# master at July 21st, 2022
# direct url https://github.com/rok-cesnovar/misc/blob/master/expose-cmdstanr-functions/expose_cmdstanr_functions.R
source("https://raw.githubusercontent.com/rok-cesnovar/misc/f92f70feb59f1fb2f2978e00f22df945402300dd/expose-cmdstanr-functions/expose_cmdstanr_functions.R")

# some values with infinities
x <- c(Inf, 0.727602, -0.382391)
y <- c(-2.34361, -0.429584, -Inf)

model_code <- '
functions{
  void test_log_inv_logit_diff(vector x, vector y) {

      vector [rows(x)] il_x = inv_logit(x);
      vector [rows(x)] il_y = inv_logit(y);

      print("x");
      print(x);
      print("y");
      print(y);
      print("il_x");
      print(il_x);
      print("il_y");
      print(il_y);
      print("il_x - il_y");
      print(il_x - il_y);
      print("log(il_x - il_y)");
      print(log(il_x - il_y));
      print("log_inv_logit_diff(x, y)");
      print(log_inv_logit_diff(x, y));
      print("log_inv_logit_diff(y, x)");
      print(log_inv_logit_diff(y, x));

  }
}'

model_path <- cmdstanr::write_stan_file(model_code)
udfs <- expose_cmdstanr_functions(model_path)

# R
log(plogis(x) - plogis(y))
#> [1] -0.09164942 -1.27277586 -0.90251025
# Stan
udfs$test_log_inv_logit_diff(x, y)
#> x
#> [inf,0.727602,-0.382391]
#> y
#> [-2.34361,-0.429584,-inf]
#> il_x
#> [1,0.674279,0.40555]
#> il_y
#> [0.087575,0.394226,0]
#> il_x - il_y
#> [0.912425,0.280053,0.40555]
#> log(il_x - il_y)
#> [-0.0916494,-1.27278,-0.90251]
#> log_inv_logit_diff(x, y)
#> [-nan,-1.27278,-0.90251] # <- the first entry here is NaN instead of -0.0916494
#> log_inv_logit_diff(y, x)
#> [nan,nan,nan]

Created on 2022-07-21 by the reprex package (v2.0.1)

Expected Output

There should be no difference between log_inv_logit_diff and doing it manually (except perhaps better performance and numerical accuracy for log_inv_logit_diff).

Current Version:

v4.4.0 (<- not sure what to fill in here)

cmdstan v2.30.0 cmdstanr v0.5.3 https://github.com/stan-dev/cmdstanr/tree/22b391e68c9577bafcc0ae0721d8dc32a14e341b

vandenman, Jul 21 '22 08:07

Ah, thanks for catching this. The NaN appears because an infinite x argument to log_inv_logit_diff causes an evaluation of Inf - Inf, which resolves to NaN rather than 0.
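To see where the Inf - Inf comes from, here is the identity worked out, assuming log_inv_logit_diff is evaluated through the standard rearrangement below (an assumption; the comment above only says that an Inf - Inf evaluation occurs). Since inv_logit(x) - inv_logit(y) = (e^x - e^y) / ((1 + e^x)(1 + e^y)), taking logs gives:

$$
\log\bigl(\operatorname{logit}^{-1}(x) - \operatorname{logit}^{-1}(y)\bigr)
= \bigl(x - \log(1 + e^{x})\bigr) + \log\bigl(1 - e^{y - x}\bigr) - \log(1 + e^{y})
$$

At x = Inf the first bracket is computed as Inf - Inf = NaN in floating point, even though x - log(1 + e^x) = -log(1 + e^{-x}) tends to 0. The middle term tends to log(1) = 0, so the correct value is -log(1 + e^y) = log(1 - inv_logit(y)), which is -0.0916494 for y = -2.34361.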

I'll add a check for this and make sure the correct value is returned.
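A minimal R sketch of the kind of check described, assuming the function is built on the identity x - log1p_exp(x) + log1m_exp(y - x) - log1p_exp(y). The helpers log1p_exp and log1m_exp mirror Stan functions of the same name but are reimplemented here; log_inv_logit_diff_naive and log_inv_logit_diff_guarded are hypothetical names, and this is not the code merged in #2798.

```r
# Stable log(1 + exp(a)); reimplementation mirroring Stan's log1p_exp.
log1p_exp <- function(a) {
  if (a > 0) a + log1p(exp(-a)) else log1p(exp(a))
}

# Stable log(1 - exp(a)) for a <= 0; reimplementation mirroring Stan's log1m_exp.
log1m_exp <- function(a) {
  if (a > -log(2)) log(-expm1(a)) else log1p(-exp(a))
}

# Hypothetical unguarded version of the identity:
# x - log1p_exp(x) evaluates Inf - Inf = NaN when x = Inf.
log_inv_logit_diff_naive <- function(x, y) {
  x - log1p_exp(x) + log1m_exp(y - x) - log1p_exp(y)
}

# Hypothetical guarded version: when x = Inf, inv_logit(x) = 1,
# so the result is log(1 - inv_logit(y)) = -log1p_exp(y).
log_inv_logit_diff_guarded <- function(x, y) {
  if (identical(x, Inf)) return(-log1p_exp(y))
  log_inv_logit_diff_naive(x, y)
}

# Same inputs as the reprex above.
x <- c(Inf, 0.727602, -0.382391)
y <- c(-2.34361, -0.429584, -Inf)
mapply(log_inv_logit_diff_naive, x, y)    # first entry is NaN
mapply(log_inv_logit_diff_guarded, x, y)  # matches log(plogis(x) - plogis(y))
```

The y = -Inf case needs no special handling in this sketch: log1p_exp(-Inf) and log1m_exp(-Inf) both evaluate to 0 without producing Inf - Inf.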

andrjohns, Jul 21 '22 10:07

Note that the second case:

#> log_inv_logit_diff(y, x)
#> [nan,nan,nan]

is correct: the difference of the inverse logits is negative, so its log is undefined:

> x <- c(Inf, 0.727602, -0.382391)
> y <- c(-2.34361, -0.429584, -Inf)
> plogis(y) - plogis(x)
[1] -0.9124250 -0.2800532 -0.4055503

andrjohns, Jul 21 '22 11:07

Sorry about that, the second case is indeed correct. I added it because I was initially unsure whether log_inv_logit_diff computed inv_logit(x) - inv_logit(y) or inv_logit(y) - inv_logit(x).

Thanks for fixing this so quickly!

vandenman, Jul 21 '22 11:07

Closed with merge of #2798

spinkney, Mar 07 '23 21:03