add multiplier keyword to unit vector
In https://discourse.mc-stan.org/t/a-better-unit-vector/26989/30 Seth Axen lays out the reasoning for adding an additional parameter which repels values away from 0.
The current implementation is equivalent to
data {
int<lower=0> N;
}
parameters {
vector[N] u_raw;
}
transformed parameters {
real r = dot_self(u_raw);
vector[N] u = u_raw / sqrt(r);
}
model {
target += -0.5 * r;
}
The proposal is to parameterize the unit vector as
data {
int<lower=0> N;
}
parameters {
vector[N] u_raw;
}
transformed parameters {
real r = dot_self(u_raw);
vector[N] u = u_raw / sqrt(r);
}
model {
target += 0.5 * (a * log(r) - r);
}
where a >= 0 and is given as a multiplier keyword. As Seth notes in the post, it corresponds to the chi distribution with a + 1 degrees of freedom. The user would see:
unit_vector<multiplier=a>[N] u;
Setting a = 0 is equivalent to what is currently in Stan and will still be the default when the multiplier keyword is not used.
How to select a?
From the same thread the issue manifests with smaller dimension sizes of the unit vector. Heuristically, I'm finding that setting a = N^(6/N) where N is the dimension of the unit-vector works well.
N a
[1,] 1 1.000000
[2,] 2 8.000000
[3,] 3 9.000000
[4,] 4 8.000000
[5,] 5 6.898648
[6,] 6 6.000000
[7,] 7 5.301146
[8,] 8 4.756828
[9,] 9 4.326749
[10,] 10 3.981072
Here's a test model for 2d unit vectors where divergences occur in the current parameterization. When the data size, M, is small there are fewer divergences then when it's larger (M > 50).
data {
int<lower=0> M;
vector<lower=-pi(), upper=pi()>[M] y;
real a;
}
parameters {
vector[2] u_raw;
real<lower=0> kappa;
}
transformed parameters {
real r = dot_self(u_raw);
vector[2] u = u_raw / sqrt(r);
real mu = atan2(u[2], u[1]);
}
model {
target += 0.5 * (a * log(r) - r);
kappa ~ exponential(1);
y ~ von_mises(mu, kappa);
}
The data for this can be created as
M <- 100
y <- runif(M, min = -pi/4, max = pi/4)
true_mean <- 0
mod_unit <- cmdstan_model("unit.stan")
mod_unit_out <- mod_unit$sample(
data = list(M = M,
y = y,
a = 2^(6/2),
seed = 2309423,
parallel_chains = 4
)
If I'm reading this correctly, I think using the keyword multiplier for this would be a mistake. I think that would be interpreted by most users as some sort of scaling on vector itself, such that e.g unit_vector<multiplier=4> would yield a vector that has norm 4
From the comment
This repels
yfromy = 0, and the degree of repulsion can be tuned by the user by increasinga.
unit_vector<repulsion=...>?
From the code, a is added to the jacobian.
target += -r^2/2 + a * log(r);
Since it's not applied to the parameters idt, bias, scale, or multiplier fit right with the rest of the constrain keywords.
I'm ok with it just being something simple like $k$ as in unit_vector<k=...> we note that this is related to the chisquare distribution degrees of freedom. The log-pdf of the chisquare distribution is
$$ \left(\frac{k}{2} - 1 \right) \log(x) - \frac{x}{2} + C $$
What I have above is 0.5 * (a * log(r) - r). Clearly, $r == x$ thus
$$ \begin{aligned} \frac{a}{2} &= \frac{k}{2} - 1 \ k &= a + 2. \end{aligned} $$
When a = 0 this corresponds to 2 degrees of freedom, ie the sum of the square of two standard normal variates. When k is allowed to be non-integer values then this is a gamma RV with
$$ f(x) = \Gamma \left( \frac{k}{2}, \frac{1}{2} \right) $$
The mean of this is $\mu = k / 4$ and the variance is $\sigma^2 = k / 8$. As $k$ increases both the mean and the variance increase. But this doesn't effect the estimate of the unit_vector (unless sampling just fails because we're at the singularity).