BlackBoxOptim.jl
BlackBoxOptim.jl copied to clipboard
Int overflow in ϵ_index
So I thought I've been hitting the issue in #78 for a while, but the solutions there weren't really working for me. I clamped all of my outputs and fitness function to [0, 1e8] and was still having problems.
This floor call is actually the problem.
@inline function ϵ_index(u::F, ϵ::F, ::Type{Val{true}}) where F
if isnan(u)
return (typemax(Int), zero(F))
else
u_div_ϵ = clamp(u/ϵ, convert(F, typemin(Int)), convert(F, typemax(Int)))
ix = floor(Int, u_div_ϵ+10eps(F))
return (ix, max(zero(F), u_div_ϵ-ix))
end
end
When clamping u_div_ϵ
, there's the possibility you'll get u_div_ϵ = convert(F, typemax(Int))
. On the next line, adding the eps()
certainly doesn't help, but even so: floor(Int, convert(F,typemax(Int)))
is going to fail since the float is larger than the safe float conversion value.
julia> maxintfloat(Float64)
9.007199254740992e15
julia> convert(Float64,typemax(Int))
9.223372036854776e18
My fix for this is too ugly, hence this issue rather than a PR. I'm just not versed enough in these conversion problems.
ix = convert(Int, min(floor(Int128, u_div_ϵ+10eps(F)), typemax(Int)))
In other words: allow the float to spill over into typemax(Int64)+1
, then artificially bring it back into anInt64
range. Seems like a bit of a hack to me, and not really generalisable for 32 bit machines.
This issue happens using floor in the minimizing scheme, but I see the maximizing scheme uses ceil in a similar implementation. Do you get this issue using ceil as well?
Yes, that would be the case. So it looks like both of the inline ϵ_index functions are in need of attention in that regard.
Ok, thanks for noting this and also proposing a PR. I agree this is a "fix" but problem is that it doesn't really solve the fundamental issue that all very large Float values will be "squashed" into one and the same epsilon box. For very large fitness spaces it might make more sense to use some kind of varying (rather than fixed) box width. Hmm, not sure what is the best way forward here since we might need to "warn" users if this kind of "squashing" of the fitness space is happening (often). Ideas?
It's not nice, but the discussion in #78 suggested using a larger Int
to start with. Perhaps rather than default this change: have a keyword to set the box size (type), remove any ϵ_index clamping with a hard fail when it overflows. This would allow users with large fitness requirements to increase their range (I assume sacrificing either memory or performance or both), but keeps the default case unchanged.
Clamping happens at present, so with or without this edge case protection some form of warning / error / documentation should be considered. I'd have to look into the algorithm more to make any decent suggestion on that front.
Yes, the problem is there already but before deciding on an intermediate fix to some aspect of it I want to explore if there is a more general solution here. I think you are right that the default fitness scheme used should fail when it overflows. Then we could add another fitness scheme (which is basically the current one) which does the clamping. Then a (knowledgeable) user could choose to either increase the epsilon (to get larger boxes) or use the current, clamping fitness scheme.
Longer-term we could consider adding a varying width fitness scheme. However, I haven't seen any research on this and the effects seem hard to predict if the optimum really is up on the large side of the Float range.
An alternative design to all of this might be to try to avoid the conversion from Floats to Ints at all. However, this would affect other code and probably we'd pay a performance prize when maintaining the Pareto front / boxes (i.e. basically having to use hashes or tree traversal (based on the Float box values) rather than direct indexing using Ints).
Hmm, tricky. @alyst if you have additional ideas, please share.
Ok, I merged the PR with a simple fix for this now, however I did add a warning and corresponding tests for it. Please check and test if this is ok. I will also need to investigate so this doesn't have any larger effects on performance, though I doubt it. See: https://github.com/robertfeldt/BlackBoxOptim.jl/commit/3b3bfb9e1055c98cbf1dc343e805acda59a77806
And this commit adds a test case showing how a higher epsilon value avoids the warning:
https://github.com/robertfeldt/BlackBoxOptim.jl/commit/013c1e0cf8bffed965df0df9f393bf9e2e4b7dd3