math add multiplier keyword to unit vector

In https://discourse.mc-stan.org/t/a-better-unit-vector/26989/30 Seth Axen lays out the reasoning for adding an additional parameter which repels values away from 0.

The current implementation is equivalent to

data {
  int<lower=0> N;
}
parameters {
  vector[N] u_raw;
}
transformed parameters {
  real r = dot_self(u_raw);
  vector[N] u = u_raw / sqrt(r);
}
model {
  target += -0.5 * r;
}

The proposal is to parameterize the unit vector as

data {
  int<lower=0> N;
}
parameters {
  vector[N] u_raw;
}
transformed parameters {
  real r = dot_self(u_raw);
  vector[N] u = u_raw / sqrt(r);
}
model {
 target += 0.5 * (a * log(r) - r);
}

where a >= 0 and is given as a multiplier keyword. As Seth notes in the post, it corresponds to the chi distribution with a + 1 degrees of freedom. The user would see:

unit_vector<multiplier=a>[N] u;

Setting a = 0 is equivalent to what is currently in Stan and will still be the default when the multiplier keyword is not used.

How to select `a`?

From the same thread the issue manifests with smaller dimension sizes of the unit vector. Heuristically, I'm finding that setting a = N^(6/N) where N is the dimension of the unit-vector works well.

         N a
 [1,]    1 1.000000
 [2,]    2 8.000000
 [3,]    3 9.000000
 [4,]    4 8.000000
 [5,]    5 6.898648
 [6,]    6 6.000000
 [7,]    7 5.301146
 [8,]    8 4.756828
 [9,]    9 4.326749
[10,]   10 3.981072

Here's a test model for 2d unit vectors where divergences occur in the current parameterization. When the data size, M, is small there are fewer divergences then when it's larger (M > 50).

data {
  int<lower=0> M;
  vector<lower=-pi(), upper=pi()>[M] y;
  real a;
}
parameters {
  vector[2] u_raw;
  real<lower=0> kappa;
}
transformed parameters {
  real r = dot_self(u_raw);
  vector[2] u = u_raw / sqrt(r);
  real mu = atan2(u[2], u[1]);
}
model {
  target += 0.5 * (a * log(r) - r);
  kappa ~ exponential(1);
  y ~ von_mises(mu, kappa);
}

The data for this can be created as

M <- 100
y <- runif(M, min = -pi/4, max = pi/4)
true_mean <- 0

mod_unit <- cmdstan_model("unit.stan")

mod_unit_out <- mod_unit$sample(
  data = list(M = M,
              y = y,
              a = 2^(6/2),
  seed = 2309423,
  parallel_chains = 4
)

Sep 20 '24 09:09 spinkney

If I'm reading this correctly, I think using the keyword multiplier for this would be a mistake. I think that would be interpreted by most users as some sort of scaling on vector itself, such that e.g unit_vector<multiplier=4> would yield a vector that has norm 4

Sep 20 '24 19:09 WardBrian

From the comment

This repels y from y = 0, and the degree of repulsion can be tuned by the user by increasing a.

unit_vector<repulsion=...>?

From the code, a is added to the jacobian. target += -r^2/2 + a * log(r);

Since it's not applied to the parameters idt, bias, scale, or multiplier fit right with the rest of the constrain keywords.

Sep 20 '24 20:09 SteveBronder

I'm ok with it just being something simple like $k$ as in unit_vector<k=...> we note that this is related to the chisquare distribution degrees of freedom. The log-pdf of the chisquare distribution is

$$ \left(\frac{k}{2} - 1 \right) \log(x) - \frac{x}{2} + C $$

What I have above is 0.5 * (a * log(r) - r). Clearly, $r == x$ thus

$$ \begin{aligned} \frac{a}{2} &= \frac{k}{2} - 1 \ k &= a + 2. \end{aligned} $$

When a = 0 this corresponds to 2 degrees of freedom, ie the sum of the square of two standard normal variates. When k is allowed to be non-integer values then this is a gamma RV with

$$ f(x) = \Gamma \left( \frac{k}{2}, \frac{1}{2} \right) $$

The mean of this is $\mu = k / 4$ and the variance is $\sigma^2 = k / 8$. As $k$ increases both the mean and the variance increase. But this doesn't effect the estimate of the unit_vector (unless sampling just fails because we're at the singularity).

Sep 21 '24 18:09 spinkney

add multiplier keyword to unit vector

How to select a?

How to select `a`?