VonMises failed to find valid initial parameters in 1000 tries.
Minimal working example
using Turing

@model function bmodel(αs)
    μ ~ Uniform(-π, π)
    κ ~ InverseGamma(2, 3)
    for i in eachindex(αs)
        αs[i] ~ VonMises(μ, κ) # doesn't work
        # αs[i] ~ Normal(μ, κ) # works
    end
end

x = rand(VonMises(0.3, 3), 100)
model = bmodel(x)
chain = sample(model, NUTS(), 1000)
Description
When trying to fit the model above to the data using a Normal distribution, I get no issues, but if I switch that distribution to VonMises, it errors.
Julia version info
versioninfo()
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 12 × AMD Ryzen 5 PRO 8540U w/ Radeon 740M Graphics
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Manifest
]st --manifest
Status `/tmp/jl_iET1Nu/Manifest.toml`
[47edcb42] ADTypes v1.14.0
[621f4979] AbstractFFTs v1.5.0
[80f14c24] AbstractMCMC v5.6.2
[7a57a42e] AbstractPPL v0.11.0
[1520ce14] AbstractTrees v0.4.5
[7d9f7c33] Accessors v0.1.42
[79e6a3ab] Adapt v4.3.0
⌅ [0bf59076] AdvancedHMC v0.7.1
[5b7e9947] AdvancedMH v0.8.7
[576499cb] AdvancedPS v0.6.2
⌅ [b5ca4192] AdvancedVI v0.2.12
[66dad0bd] AliasTables v1.1.3
[dce04be8] ArgCheck v2.5.0
[4fba245c] ArrayInterface v7.19.0
[a9b6321e] Atomix v1.1.1
[13072b0f] AxisAlgorithms v1.1.0
[39de3d68] AxisArrays v0.4.7
[198e06fe] BangBang v0.4.4
[9718e550] Baselet v0.1.1
[76274a88] Bijectors v0.15.7
[082447d4] ChainRules v1.72.4
[d360d2e6] ChainRulesCore v1.25.1
[0ca39b1e] Chairmarks v1.3.1
[9e997f8a] ChangesOfVariables v0.1.10
[861a8166] Combinatorics v1.0.3
[38540f10] CommonSolve v0.2.4
[bbf7d656] CommonSubexpressions v0.3.1
[34da2185] Compat v4.16.0
[a33af91c] CompositionsBase v0.1.2
[88cd18e8] ConsoleProgressMonitor v0.1.2
[187b0558] ConstructionBase v1.5.8
[a8cc5b0e] Crayons v4.1.1
[9a962f9c] DataAPI v1.16.0
[864edb3b] DataStructures v0.18.22
[e2d170a0] DataValueInterfaces v1.0.0
[244e2a9f] DefineSingletons v0.1.2
[8bb1440f] DelimitedFiles v1.9.1
[b429d917] DensityInterface v0.4.0
[163ba53b] DiffResults v1.1.0
[b552c78f] DiffRules v1.15.1
⌅ [a0c0ee7d] DifferentiationInterface v0.6.54
[31c24e10] Distributions v0.25.120
[ced4e74d] DistributionsAD v0.6.58
[ffbed154] DocStringExtensions v0.9.4
[366bfd00] DynamicPPL v0.36.10
[cad2338a] EllipticalSliceSampling v2.0.0
[4e289a0a] EnumX v1.0.5
[e2ba6199] ExprTools v0.1.10
[55351af7] ExproniconLite v0.10.14
[7a1cc6ca] FFTW v1.9.0
[9aa1b823] FastClosures v0.3.2
[1a297f60] FillArrays v1.13.0
[6a86dc24] FiniteDiff v2.27.0
⌅ [f6369f11] ForwardDiff v0.10.38
[069b7b12] FunctionWrappers v1.1.3
[77dc65aa] FunctionWrappersWrappers v0.1.3
[d9f16b24] Functors v0.5.2
[46192b85] GPUArraysCore v0.2.0
[076d061b] HashArrayMappedTries v0.2.0
[34004b35] HypergeometricFunctions v0.3.28
[22cec73e] InitialValues v0.3.1
⌅ [a98d9a8b] Interpolations v0.15.1
[8197267c] IntervalSets v0.7.11
[3587e190] InverseFunctions v0.1.17
[41ab1584] InvertedIndices v1.3.1
[92d709cd] IrrationalConstants v0.2.4
[c8e1da08] IterTools v1.10.0
[82899510] IteratorInterfaceExtensions v1.0.0
[692b3bcd] JLLWrappers v1.7.0
[682c06a0] JSON v0.21.4
[ae98c720] Jieko v0.2.1
[63c18a36] KernelAbstractions v0.9.34
[5ab0869b] KernelDensity v0.6.9
[5be7bae1] LBFGSB v0.4.1
[8ac3fa9e] LRUCache v1.6.2
[b964fa9f] LaTeXStrings v1.4.0
[1d6d02ad] LeftChildRightSiblingTrees v0.2.0
⌅ [6f1fad26] Libtask v0.8.8
[d3d80556] LineSearches v7.3.0
[6fdf6af0] LogDensityProblems v2.1.2
[996a588d] LogDensityProblemsAD v1.13.0
[2ab3a3ac] LogExpFunctions v0.3.29
[e6f89c97] LoggingExtras v1.1.0
⌃ [c7f686f2] MCMCChains v6.0.7
[be115224] MCMCDiagnosticTools v0.3.14
[e80e1ace] MLJModelInterface v1.11.1
[1914dd2f] MacroTools v0.5.16
[dbb5928d] MappedArrays v0.4.2
[128add7d] MicroCollections v0.2.0
[e1d29d7a] Missings v1.2.0
[2e0e35c7] Moshi v0.3.5
[d41bc354] NLSolversBase v7.9.1
[872c559c] NNlib v0.9.30
[77ba4419] NaNMath v1.1.3
[86f7a689] NamedArrays v0.10.4
[c020b1a1] NaturalSort v1.0.0
[6fe1bfb0] OffsetArrays v1.17.0
[429524aa] Optim v1.12.0
[3bd65402] Optimisers v0.4.6
[7f7a1694] Optimization v4.3.0
⌃ [bca83a33] OptimizationBase v2.7.0
[36348300] OptimizationOptimJL v0.4.3
[bac558e1] OrderedCollections v1.8.1
[90014a1f] PDMats v0.11.35
[d96e819e] Parameters v0.12.3
[69de0a69] Parsers v2.8.3
[85a6dd25] PositiveFactorizations v0.2.4
⌅ [aea7be01] PrecompileTools v1.2.1
[21216c6a] Preferences v1.4.3
[08abe8d2] PrettyTables v2.4.0
[33c8b6b6] ProgressLogging v0.1.4
[92933f4c] ProgressMeter v1.10.4
[43287f4e] PtrArrays v1.3.0
[1fd47b50] QuadGK v2.11.2
[74087812] Random123 v1.7.1
[e6cf234a] RandomNumbers v1.6.0
[b3c3ace0] RangeArrays v0.3.2
[c84ed2f1] Ratios v0.4.5
[c1ae055f] RealDot v0.1.0
[3cdcf5f2] RecipesBase v1.3.4
[731186ca] RecursiveArrayTools v3.33.0
[189a3867] Reexport v1.2.2
[ae029012] Requires v1.3.1
[79098fc4] Rmath v0.8.0
[f2b01f46] Roots v2.2.7
[7e49a35a] RuntimeGeneratedFunctions v0.5.15
[26aad666] SSMProblems v0.5.0
[0bca4576] SciMLBase v2.96.0
⌃ [c0aeaf25] SciMLOperators v0.4.0
[53ae85a6] SciMLStructures v1.7.0
[30f210dd] ScientificTypesBase v3.0.0
[7e506255] ScopedValues v1.3.0
[efcf1570] Setfield v1.1.2
[a2af1166] SortingAlgorithms v1.2.1
[9f842d2f] SparseConnectivityTracer v0.6.18
[dc90abb0] SparseInverseSubset v0.1.2
[0a514795] SparseMatrixColorings v0.4.20
[276daf66] SpecialFunctions v2.5.1
[171d559e] SplittablesBase v0.1.15
[90137ffa] StaticArrays v1.9.13
[1e83bf80] StaticArraysCore v1.4.3
[64bff920] StatisticalTraits v3.4.0
[10745b16] Statistics v1.11.1
[82ae8749] StatsAPI v1.7.1
[2913bbd2] StatsBase v0.34.5
[4c63d2b9] StatsFuns v1.5.0
[892a3eda] StringManipulation v0.4.1
[09ab397b] StructArrays v0.7.1
[2efcf032] SymbolicIndexingInterface v0.3.40
[3783bdb8] TableTraits v1.0.1
[bd369af6] Tables v1.12.1
[5d786b92] TerminalLoggers v0.1.7
[9f7883ad] Tracker v0.2.38
[28d57a85] Transducers v0.4.84
[fce5fe82] Turing v0.38.4
[3a884ed6] UnPack v1.0.2
[013be700] UnsafeAtomics v0.3.0
[efce3f68] WoodburyMatrices v1.0.0
[700de1a5] ZygoteRules v0.2.7
[f5851436] FFTW_jll v3.3.11+0
[1d5cc7b8] IntelOpenMP_jll v2025.0.4+0
[81d17ec3] L_BFGS_B_jll v3.0.1+0
[856f044c] MKL_jll v2025.0.1+1
[efe28fd5] OpenSpecFun_jll v0.5.6+0
[f50d1b31] Rmath_jll v0.5.1+0
[1317d2d5] oneTBB_jll v2022.0.0+0
[0dad84c5] ArgTools v1.1.2
[56f22d72] Artifacts v1.11.0
[2a0f44e3] Base64 v1.11.0
[ade2ca70] Dates v1.11.0
[8ba89e20] Distributed v1.11.0
[f43a241f] Downloads v1.6.0
[7b1f6079] FileWatching v1.11.0
[9fa8497b] Future v1.11.0
[b77e0a4c] InteractiveUtils v1.11.0
[4af54fe1] LazyArtifacts v1.11.0
[b27032c2] LibCURL v0.6.4
[76f85450] LibGit2 v1.11.0
[8f399da3] Libdl v1.11.0
[37e2e46d] LinearAlgebra v1.11.0
[56ddb016] Logging v1.11.0
[d6f4376e] Markdown v1.11.0
[a63ad114] Mmap v1.11.0
[ca575930] NetworkOptions v1.2.0
[44cfe95a] Pkg v1.11.0
[de0858da] Printf v1.11.0
[3fa0cd96] REPL v1.11.0
[9a3f8284] Random v1.11.0
[ea8e919c] SHA v0.7.0
[9e88b42a] Serialization v1.11.0
[1a1011a3] SharedArrays v1.11.0
[6462fe0b] Sockets v1.11.0
[2f01184e] SparseArrays v1.11.0
[f489334b] StyledStrings v1.11.0
[4607b0f0] SuiteSparse
[fa267f1f] TOML v1.0.3
[a4e569a6] Tar v1.10.0
[8dfed614] Test v1.11.0
[cf7118a7] UUIDs v1.11.0
[4ec0a83e] Unicode v1.11.0
[e66e0078] CompilerSupportLibraries_jll v1.1.1+0
[deac9b47] LibCURL_jll v8.6.0+0
[e37daf67] LibGit2_jll v1.7.2+0
[29816b5a] LibSSH2_jll v1.11.0+1
[c8ffd9c3] MbedTLS_jll v2.28.6+0
[14a3606d] MozillaCACerts_jll v2023.12.12
[4536629a] OpenBLAS_jll v0.3.27+1
[05823500] OpenLibm_jll v0.8.5+0
[bea87d4a] SuiteSparse_jll v7.7.0+0
[83775a58] Zlib_jll v1.2.13+1
[8e850b90] libblastrampoline_jll v5.11.0+0
[8e850ede] nghttp2_jll v1.59.0+0
[3f19e933] p7zip_jll v17.4.0+2
I'll add that I've simplified the MWE even further by replacing the multiple observations with a single observation:
using Turing

@model function bmodel(α)
    μ ~ Uniform(-π, π)
    κ ~ InverseGamma(2, 3)
    α ~ VonMises(μ, κ) # doesn't work
    # α ~ Normal(μ, κ) # works
end

x = rand(VonMises(0.3, 3))
model = bmodel(x)
chain = sample(model, NUTS(), 1000)
It still fails with the same error:
┌ Warning: failed to find valid initial parameters in 10 tries; consider providing explicit initial parameters using the `initial_params` keyword
└ @ Turing.Inference ~/.julia/packages/Turing/GS2NO/src/mcmc/hmc.jl:160
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:00
ERROR: failed to find valid initial parameters in 1000 tries. This may indicate an error with the model or AD backend; please open an issue at https://github.com/TuringLang/Turing.jl/issues
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] find_initial_params(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{…}, varinfo::DynamicPPL.VarInfo{…}, hamiltonian::AdvancedHMC.Hamiltonian{…}; max_attempts::Int64)
@ Turing.Inference ~/.julia/packages/Turing/GS2NO/src/mcmc/hmc.jl:170
[3] find_initial_params(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{…}, varinfo::DynamicPPL.VarInfo{…}, hamiltonian::AdvancedHMC.Hamiltonian{…})
@ Turing.Inference ~/.julia/packages/Turing/GS2NO/src/mcmc/hmc.jl:145
[4] initialstep(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{…}, spl::DynamicPPL.Sampler{…}, vi_original::DynamicPPL.VarInfo{…}; initial_params::Nothing, nadapts::Int64, kwargs::@Kwargs{})
@ Turing.Inference ~/.julia/packages/Turing/GS2NO/src/mcmc/hmc.jl:210
[5] step(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{…}, spl::DynamicPPL.Sampler{…}; initial_params::Nothing, kwargs::@Kwargs{…})
@ DynamicPPL ~/.julia/packages/DynamicPPL/I9lST/src/sampler.jl:125
[6] step
@ ~/.julia/packages/DynamicPPL/I9lST/src/sampler.jl:108 [inlined]
[7] macro expansion
@ ~/.julia/packages/AbstractMCMC/kwj9g/src/sample.jl:161 [inlined]
[8] macro expansion
@ ~/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:328 [inlined]
[9] (::AbstractMCMC.var"#25#26"{Bool, String, Nothing, Int64, Int64, Int64, Nothing, @Kwargs{…}, Random.TaskLocalRNG, DynamicPPL.Model{…}, DynamicPPL.Sampler{…}, Int64, Int64, Int64})()
@ AbstractMCMC ~/.julia/packages/AbstractMCMC/kwj9g/src/logging.jl:12
[10] with_logstate(f::AbstractMCMC.var"#25#26"{…}, logstate::Base.CoreLogging.LogState)
@ Base.CoreLogging ./logging/logging.jl:522
[11] with_logger(f::Function, logger::LoggingExtras.TeeLogger{Tuple{LoggingExtras.EarlyFilteredLogger{…}, LoggingExtras.EarlyFilteredLogger{…}}})
@ Base.CoreLogging ./logging/logging.jl:632
[12] with_progresslogger(f::Function, _module::Module, logger::Base.CoreLogging.ConsoleLogger)
@ AbstractMCMC ~/.julia/packages/AbstractMCMC/kwj9g/src/logging.jl:36
[13] macro expansion
@ ~/.julia/packages/AbstractMCMC/kwj9g/src/logging.jl:11 [inlined]
[14] mcmcsample(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{…}, sampler::DynamicPPL.Sampler{…}, N::Int64; progress::Bool, progressname::String, callback::Nothing, num_warmup::Int64, discard_initial::Int64, thinning::Int64, chain_type::Type, initial_state::Nothing, kwargs::@Kwargs{…})
@ AbstractMCMC ~/.julia/packages/AbstractMCMC/kwj9g/src/sample.jl:144
[15] sample(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{…}, sampler::DynamicPPL.Sampler{…}, N::Int64; chain_type::Type, resume_from::Nothing, initial_state::Nothing, progress::Bool, nadapts::Int64, discard_adapt::Bool, discard_initial::Int64, kwargs::@Kwargs{})
@ Turing.Inference ~/.julia/packages/Turing/GS2NO/src/mcmc/hmc.jl:117
[16] sample
@ ~/.julia/packages/Turing/GS2NO/src/mcmc/hmc.jl:86 [inlined]
[17] #sample#100
@ ~/.julia/packages/Turing/GS2NO/src/mcmc/abstractmcmc.jl:29 [inlined]
[18] sample
@ ~/.julia/packages/Turing/GS2NO/src/mcmc/abstractmcmc.jl:20 [inlined]
[19] #sample#99
@ ~/.julia/packages/Turing/GS2NO/src/mcmc/abstractmcmc.jl:17 [inlined]
[20] sample(model::DynamicPPL.Model{typeof(bmodel), (:α,), (), (), Tuple{Float64}, Tuple{}, DynamicPPL.DefaultContext}, alg::NUTS{AutoForwardDiff{nothing, Nothing}, AdvancedHMC.DiagEuclideanMetric}, N::Int64)
@ Turing.Inference ~/.julia/packages/Turing/GS2NO/src/mcmc/abstractmcmc.jl:14
[21] top-level scope
@ REPL[11]:1
Some type information was truncated. Use `show(err)` to see complete types.
This is not a Turing bug; it's more likely a misspecified model, or simply a posterior that is difficult for HMC.
This may indicate an error with the model or AD backend; please open an issue at https://github.com/TuringLang/Turing.jl/issues
@penelopeysm, let's rephrase this message and replace the word "error" with something similar to my above comment. Also, let's redirect users to Julia discourse instead of opening issues here.
let's rephrase this message and replace the word "error"
This is not a Turing bug
The message doesn't claim that there's an error with Turing; it just says it's an error, which could well be a user error.
Would you like to write something that is closer to what you intend?
Also, let's redirect users to Julia discourse instead of opening issues here.
I personally like having people open issues. I don't think that GitHub issues should only be used for bugs with the libraries, I think it's perfectly fine for users to ask for help here.
@yakir12 On the topic of the model: A couple of days ago I actually wrote up a page to help debug this exact error :)
https://turinglang.org/docs/usage/troubleshooting/#initial-parameters
If you try out the first step:
using Turing
using DynamicPPL.TestUtils.AD: run_ad
@model function bmodel(α)
μ ~ Uniform(-π, π)
κ ~ InverseGamma(2, 3)
α ~ VonMises(μ, κ)
end
x = rand(VonMises(0.3, 3))
model = bmodel(x)
adtype = AutoForwardDiff()
result = run_ad(model, adtype; test=false, benchmark=false)
result.grad_actual
You'll find that ForwardDiff is returning NaNs:
2-element Vector{Float64}:
NaN
NaN
I also tried ReverseDiff (returns one NaN instead of two), Mooncake (errors), and Enzyme (errors), so the root issue seems to be that Julia's AD packages don't handle VonMises well.
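For reference, here is a sketch (not from the thread) of how one might repeat the same run_ad check with the other backends mentioned, continuing from the snippet above. It assumes ReverseDiff, Mooncake, and Enzyme are installed; errors are caught so the loop finishes:
using ADTypes, ReverseDiff, Mooncake, Enzyme

for adtype in (AutoReverseDiff(), AutoMooncake(; config=nothing), AutoEnzyme())
    try
        r = run_ad(model, adtype; test=false, benchmark=false)
        println(adtype, " => ", r.grad_actual)
    catch err
        println(adtype, " => errored: ", sprint(showerror, err))
    end
end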
It's likely not a user error either. Some models are simply difficult for MCMC, so HMC can fail. A better message could look like:
"This may indicate a misspecification of the model, or the posterior is difficult for the specified HMC sampler. "
I personally like having people open issues. I don't think that GitHub issues should only be used for bugs with the libraries, I think it's perfectly fine for users to ask for help here.
These issues are more likely modelling-oriented than Turing-specific. Another reason for using Julia Discourse is that more people there can help, and the discussions are more discoverable than they are here.
I see that, but this error happens even before HMC starts sampling; it's simply trying to find an initial starting point with a valid logp and valid gradients.
This code path has been reported 5 times. In 3 of those cases it was an AD problem and would have been trivially caught by run_ad:
https://github.com/TuringLang/Turing.jl/issues/2584 https://github.com/TuringLang/Turing.jl/issues/2526 https://github.com/TuringLang/Turing.jl/issues/2389 -- Not the same error message, but it used to hang, and I fixed it by making it error instead.
Once, it was probably a modelling problem, but it ultimately boiled down to AD and would also have been caught by run_ad:
https://github.com/TuringLang/Turing.jl/issues/2559
Once, the model itself was the problem (although arguably our initialisation also isn't flexible enough):
https://github.com/TuringLang/Turing.jl/issues/2476
@penelopeysm the above errors are indeed about HMC initialisation, but they are for different models.
Briefly, the VonMises distribution has support that depends on its parameters (μ and κ). This makes Turing construct a TruncatedBijector, which changes as μ and κ change. This canonical example breaks NUTS because its warmup procedure assumes constant support for random variables.
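To make this concrete, a small sketch (not from the thread, relying on the support Distributions.jl defines for VonMises): the endpoints of the support move with μ, so the transform to unconstrained space differs between parameter values.
using Distributions

d1 = VonMises(0.0, 3.0)
d2 = VonMises(1.0, 3.0)
# The support of VonMises(μ, κ) is (μ - π, μ + π), so it shifts with μ.
println((minimum(d1), maximum(d1)))  # (-π, π)
println((minimum(d2), maximum(d2)))  # (1 - π, 1 + π)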
I posted a comment above (maybe it got lost!) that this issue is also about AD: https://github.com/TuringLang/Turing.jl/issues/2584#issuecomment-2943478873
I did see your comment. It is a numerical edge case for AD, which is likely still caused by model misspecification. So, I suggested rephrasing the warning message above.
More generally, even with great effort, it is impossible to make AD or bijectors consistently numerically stable, due to the constraints of finite-precision arithmetic.
@yakir12, for your example, you could try slice sampling instead of HMC, which works quite well for low-dimensional posterior distributions (up to a few tens of parameters):
using Turing, SliceSampling

@model function bmodel(αs)
    μ ~ Uniform(-π, π)
    κ ~ InverseGamma(2, 3)
    for i in eachindex(αs)
        αs[i] ~ VonMises(μ, κ) # doesn't work
        # αs[i] ~ Normal(μ, κ) # works
    end
end

x = rand(VonMises(0.3, 3), 100)
model = bmodel(x)
chain = sample(model, externalsampler(RandPermGibbs(SliceSteppingOut(2.))), 1000)
Random walk MH might also work well, but it is slightly less user-friendly than slice sampling because you need to tune the proposal to get good convergence.
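As a rough sketch (not from the thread): by default Turing's MH() proposes from the prior, and the MH constructor also accepts a proposal covariance matrix for a mean-zero Gaussian random walk. The scales in the commented line are only guesses and would need tuning.
using Turing

chain_mh = sample(model, MH(), 10_000)
# Random-walk variant: pass a proposal covariance for (μ, κ); tune these scales.
# chain_rw = sample(model, MH([0.05 0.0; 0.0 0.5]), 10_000)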
@yebai I am sorry but I am struggling to understand -- if the problem is a model misspecification, surely the solution is to change the model?
As I see it, it's an AD issue, and using another sampler like slice sampling or MH works precisely because it sidesteps the use of AD entirely. So, to me, it seems that the error message is quite accurate, and the only thing I would change about the error message is to add a link to the docs.
I see the point about Discourse being more widely seen, though I must mention that GitHub is a much better place for us developers. For example, the fact that people open issues here is precisely what helps us see the typical underlying causes of this error. If an issue was better solved on Discourse, I wouldn't have any qualms with closing an issue and asking people to post on Discourse. I would rather have it this way, instead of people posting bugs on Discourse and me having to look on there to find out.
(If we add the link to the docs, I'd be ok with removing the suggestion to open an issue.)
First and foremost, thank you for the rapid and thoughtful responses; it's obvious that everyone here really cares about this package.
Secondly, I tried your suggestion @yebai and it still errored, but with the following:
julia> chain = sample(model, externalsampler(RandPermGibbs(SliceSteppingOut(2.))), 1000)
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:03
ERROR: Exceeded maximum number of proposal 10000, which indicates an acceptance rate less than 0.01%. A quick fix is to increase `max_prop`, but an acceptance rate that is too low often indicates that there is a problem. Here are some possible causes:
- The model might be broken or degenerate (most likely cause).
- The tunable parameters of the sampler are suboptimal.
- The initialization is pathologic. (try supplying a (different) `initial_params`)
- There might be a bug in the sampler. (if this is suspected, file an issue to `SliceSampling`)
Hi @yakir12, I suspect your model creates a fairly peaky distribution, which is difficult for both HMC and slice sampling. You can have a look at the SliceSampling package. The following setup works well in my experiments:
julia> using Turing, SliceSampling, StableRNGs
julia> rng = StableRNG(1)
StableRNGs.LehmerRNG(state=0x00000000000000000000000000000003)
julia> @model function bmodel(αs)
μ ~ Uniform(-π, π)
κ ~ InverseGamma(2, 3)
for i in eachindex(αs)
αs[i] ~ VonMises(μ, κ) # doesn't work
# αs[i] ~ Normal(μ, κ) # works
end
end
bmodel (generic function with 2 methods)
julia> x = rand(rng, VonMises(0.3, 3), 100);
julia> model = bmodel(x);
julia> chain = sample(rng, model, externalsampler(RandPermGibbs(SliceSteppingOut(2.))), 1000)
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:00
Chains MCMC chain (1000×3×1 Array{Float64, 3}):
Iterations = 1:1:1000
Number of chains = 1
Samples per chain = 1000
Wall duration = 0.67 seconds
Compute duration = 0.67 seconds
parameters = μ, κ
internals = lp
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
μ 0.2946 0.0822 0.0026 886.1154 563.6224 0.9996 1330.5037
κ 3.1686 9.0720 0.2840 856.6130 789.4449 0.9994 1286.2056
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
μ 0.1647 0.2507 0.2915 0.3375 0.4244
κ 2.2440 2.6427 2.8721 3.1141 3.6225
Although it might be possible to find a more specialised algorithm that works with such posteriors, this is often an indicator that your model is mis-specified or that the posterior is nearly a delta distribution due to abundant data.
I'll have a look whenever I have the time as well 👍
@yebai: The following "works" (but actually, the likelihood term is not computed correctly).
The original failure seems to be due to Distributions.jl (correctly, I assume) calling besselix, which apparently messes up AD.
using Turing

"HOTFIX - DOES NOT COMPUTE THE NORMALIZATION CONSTANT - WILL NOT WORK CORRECTLY"
function Turing.VonMises(μ::T, κ::T; check_args::Bool=true) where {T <: Real}
    return Turing.VonMises{T}(μ, κ, 1.)
end

@model function bmodel(α)
    μ ~ Uniform(-π, π)
    κ ~ InverseGamma(2, 3)
    α ~ VonMises(μ, κ) # works - BUT THE LIKELIHOOD TERM IS WRONG
    # α ~ Normal(μ, κ) # works
end

x = rand(VonMises(0.3, 3))
model = bmodel(x)
chain = sample(model, NUTS(), 1000)
@yakir12: the original posterior (with x = rand(VonMises(0.3, 3), 100) and the VonMises likelihood) samples without issues via Stan. Do note, though, that in general these kinds of models can be problematic to fit, due to the inherent difficulty of mapping something that wraps around ([-pi, +pi] % 2pi) to the real line. Potentially, you'd get a bimodal posterior if the posterior mass is concentrated at +/- pi.
Failure to sample from this posterior (via Turing.jl) seems to be due to some besselix+AD issue (as @penelopeysm actually also showed here). I'm guessing an issue should be filed somewhere else about this as well. I do wonder where; who'd know, @AoifeHughes or @penelopeysm? Could someone do it? :)
The original failure seems to be due to Distributions.jl (correctly I must assume) calling besselix, which I must assume messes up AD.
SpecialFunctions has ChainRulesCore rules for besselix, but ForwardDiff does not pick them up automatically. ForwardDiffChainRules.jl can help load these ChainRulesCore rules.
It is likely easier to write a Mooncake rule for Distributions.logpdf(::VonMises, ...) than to load the ChainRulesCore rule for besselix, which doesn't return a gradient w.r.t. its first argument.
So, to summarise where we've gotten to via the above examples and help from @mhauru: the Von Mises issue seems to be that automatic differentiation of the Von Mises logpdf with respect to its parameters (μ, κ) produces NaN gradients, causing HMC/NUTS sampling to fail.
Please someone correct me if any of this is wrong, and I've misunderstood.
Minimal Working Examples
1. Von Mises Model (Fails)
using Turing

@model function bmodel_single(α)
    μ ~ Uniform(-π, π)
    κ ~ InverseGamma(2, 3)
    α ~ VonMises(μ, κ) # This doesn't work
end

x_single = rand(VonMises(0.3, 3))
model_single = bmodel_single(x_single)
chain = sample(model_single, NUTS(), 1000)
Output:
┌ Warning: failed to find valid initial parameters in 10 tries; consider providing explicit initial parameters using the `initial_params` keyword
└ @ Turing.Inference
Error: ErrorException("failed to find valid initial parameters in 1000 tries. See https://turinglang.org/docs/uri/initial-parameters for common causes and solutions. If the issue persists, please open an issue at https://github.com/TuringLang/Turing.jl/issues")
2. Normal Distribution Model (Works)
@model function bmodel_normal(αs)
    μ ~ Uniform(-π, π)
    κ ~ InverseGamma(2, 3)
    for i in eachindex(αs)
        αs[i] ~ Normal(μ, κ) # This works
    end
end

x = rand(VonMises(0.3, 3), 100)
model_normal = bmodel_normal(x)
chain_normal = sample(model_normal, NUTS(), 1000)
Output:
┌ Info: Found initial step size
│ ϵ = 0.2
└ @ Turing.Inference
Sampling 100%|██████████████████████████████████████████| Time: 0:00:02
Chains MCMC chain (1000×16×1 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 1
Samples per chain = 1000
parameters = μ, κ
Autodiff Diagnostic
using Turing
using DynamicPPL.TestUtils.AD: run_ad

@model function bmodel_debug(α)
    μ ~ Uniform(-π, π)
    κ ~ InverseGamma(2, 3)
    α ~ VonMises(μ, κ)
end

x_debug = rand(VonMises(0.3, 3))
model_debug = bmodel_debug(x_debug)
result = run_ad(model_debug, AutoForwardDiff(); test=false, benchmark=false)
Output:
┌ Info: Running AD on bmodel_debug with AutoForwardDiff()
└ @ DynamicPPL.TestUtils.AD
params : [0.4319840091648042, -1.0811701470988007]
actual : (-7.543558639200898, [NaN, NaN])
Gradient result: [NaN, NaN]
Analysis
The issue is not with:
- Model specification
- Bijectors
- Parameter initialization logic
- General ForwardDiff compatibility
The issue is with:
- Von Mises logpdf parameter differentiation: when ForwardDiff attempts to compute derivatives of logpdf(VonMises(μ, κ), x) with respect to μ and κ, it returns NaN
- This causes NaN gradients during HMC/NUTS sampling
- Results in "failed to find valid initial parameters" because all gradient-based proposals fail
- This suggests missing or incorrect derivative rules for Von Mises in Distributions.jl
Still to do (for me)
- Check the Von Mises logpdf implementation in Distributions.jl for parameter differentiation issues
...
Looking at https://github.com/JuliaStats/Distributions.jl, it looks like the vonmises.jl struct pre-computes I0κx = besselix(0, κ) at construction time and stores it as a field. When logpdf uses -log(d.I0κx), autodiff sees this as -log(constant) with respect to κ, losing the dependency chain?
Ok so I've dug into this further with help from @mhauru's suggestion, and found the issue! It's the precomputed besselix value in the Von Mises struct that breaks autodiff.
Looking at the Von Mises implementation in Distributions.jl, the struct stores a precomputed value:
struct VonMises{T<:Real} <: ContinuousUnivariateDistribution
    μ::T     # mean
    κ::T     # concentration
    I0κx::T  # I0(κ) * exp(-κ), precomputed <-- THIS IS THE PROBLEM
end
When you create a Von Mises distribution, it precomputes besselix(0, κ) and stores it. But when autodiff tries to differentiate the logpdf with respect to κ, it treats this stored value as a constant, breaking the derivative chain.
Here's a minimal test showing the issue:
using Distributions, DifferentiationInterface, ForwardDiff, SpecialFunctions

# Test the current Distributions.jl implementation
result = DifferentiationInterface.value_and_derivative(
    μ -> Distributions.logpdf(VonMises(μ, 1.0), 0.0),
    AutoForwardDiff(),
    0.2
)
println("Current implementation: ", result)
Output:
Current implementation: (-1.0937248470752825, NaN)
The derivative is NaN, but if we compute besselix on the fly:
# Fixed version: compute besselix directly, don't precompute
function fixed_logpdf(μ::Real, κ::Real, x::Real)
    return κ * (cos(x - μ) - 1) - log(besselix(0, κ)) - log(2π)
end

fixed_result = DifferentiationInterface.value_and_derivative(
    μ -> fixed_logpdf(μ, 1.0, 0.0),
    AutoForwardDiff(),
    0.2
)
println("Fixed version: ", fixed_result)
Output:
Fixed version: (-1.0937248470752823, -0.19866933079506122)
The fixed version gives us a proper derivative instead of NaN!
So it looks like one possible solution is to modify Von Mises in Distributions.jl to compute besselix(0, κ) directly in the logpdf function instead of storing it as a precomputed field. This trades a tiny bit of performance for autodiff compatibility, which is essential for MCMC sampling. (I don't know enough about bijectors or the other thoughts from above to say whether they'd work?)
Should I open a PR to Distributions.jl with this change or something similar to it?
But when autodiff tries to differentiate the logpdf with respect to κ, it treats this stored value as a constant, breaking the derivative chain.
AFAICT that's not the problem, and most likely precomputing the value is not a general problem either. Based on the example, it seems that the problem is rather that derivatives appear in the besselix terms when using the constructor: most likely all parameters are promoted to a common type, so instead of operating with a κ::Float64 you're operating with a κ::ForwardDiff.Dual (with zero partial derivatives) when using the constructor. In your custom logpdf implementation, by contrast, ForwardDiff operates with a κ::Float64, so no problems with partials show up.
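A quick, hypothetical snippet (not from the thread; the Dual is constructed by hand purely for illustration) to see this promotion directly:
using Distributions, ForwardDiff

# μ carries the derivative information; κ is passed as a plain Float64.
d = VonMises(ForwardDiff.Dual(0.2, 1.0), 1.0)
# The constructor promotes both parameters to a common type, so κ (and hence the
# precomputed besselix term) also become Duals inside the distribution object.
println(typeof(d.κ))  # a ForwardDiff.Dual, not Float64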
Did you try enabling NaN-safe mode in ForwardDiff?
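(For readers following along: NaN-safe mode is controlled through a Preferences.jl setting, roughly as below per the ForwardDiff documentation; Julia needs to be restarted afterwards for the preference to take effect.)
using ForwardDiff, Preferences

# Writes the "nansafe_mode" preference for ForwardDiff to LocalPreferences.toml;
# restart Julia so the preference is picked up at precompilation time.
set_preferences!(ForwardDiff, "nansafe_mode" => true)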
yup, NaN-safe mode works:
julia> using ForwardDiff, Distributions
Precompiling ForwardDiff...
1 dependency successfully precompiled in 2 seconds. 19 already precompiled.
julia> f(x) = logpdf(VonMises(0.0, x[1]), 0.4)
f (generic function with 1 method)
julia> ForwardDiff.gradient(f, [1.0]) # with nansafe
1-element Vector{Float64}:
0.47467102810635065
julia> ForwardDiff.gradient(f, [1.0]) # without nansafe
1-element Vector{Float64}:
NaN
Sampling with NaN-safe mode enabled does run, but it eventually gives a different numerical error, although I'm not inclined to spend time tracking it down:
julia> @model function bmodel(α)
μ ~ Uniform(-π, π)
κ ~ InverseGamma(2, 3)
α ~ VonMises(μ, κ) # doesn't work
# α ~ Normal(μ, κ) # works
end
bmodel (generic function with 2 methods)
julia> x = rand(VonMises(0.3, 3))
0.5225302010283244
julia> model = bmodel(x)
DynamicPPL.Model{typeof(bmodel), (:α,), (), (), Tuple{Float64}, Tuple{}, DynamicPPL.DefaultContext}(bmodel, (α = 0.5225302010283244,), NamedTuple(), DynamicPPL.DefaultContext())
julia> chain = sample(model, NUTS(), 1000)
┌ Info: Found initial step size
└ ϵ = 0.8
Sampling 100%|███████████████████████████████████████████████████████████████████| Time: 0:00:04
ERROR: AmosException with id 4: input argument magnitude too large, complete loss of accuracy by argument reduction.
Stacktrace:
[1] _besseli(nu::Float64, z::ComplexF64, kode::Int32)
@ SpecialFunctions ~/.julia/packages/SpecialFunctions/mf0qH/src/bessel.jl:260
[2] besselix
@ ~/.julia/packages/SpecialFunctions/mf0qH/src/bessel.jl:400 [inlined]
[3] besselix
@ ~/.julia/packages/SpecialFunctions/mf0qH/src/bessel.jl:504 [inlined]
[4] besselix
@ ~/.julia/packages/ForwardDiff/2nvFM/src/dual.jl:285 [inlined]
[...]
The numerical approximations used by the besselix implementation (or rather, the C libraries Julia calls) break down if the arguments become too large, and then an error is thrown. I assume you could avoid it by, e.g., truncating κ.
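For completeness, a minimal sketch (not from the thread) of what that truncation could look like in the model; the upper bound of 50 is an arbitrary choice, just to keep besselix's argument in a range the underlying Bessel routines handle:
using Turing

@model function bmodel_capped(α)
    μ ~ Uniform(-π, π)
    # Cap κ so besselix(0, κ) stays within a range the Amos routines
    # can evaluate without losing accuracy.
    κ ~ truncated(InverseGamma(2, 3); upper=50.0)
    α ~ VonMises(μ, κ)
end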