MLJBase.jl
Julia crashes for multithreaded Stack for some non-Julia models
Context: #767 adds support for an option acceleration=CPUThreads() in composite model types defined by "exporting" learning networks, and implements this option for Stack. I have been carrying out MLJ ecosystem integration tests of the new Stack with a large number of models as base models in the stack. If the base model is one from the non-Julia packages ScikitLearn.jl, XGBoost.jl, or LIBSVM.jl, and I include CPUThreads() in the testing, then Julia crashes. I have not been able to reliably reproduce the crashes with a "minimal example", but the following seems to do the job on my machine:
using Pkg
Pkg.activate(temp=true)
Pkg.add(
    url="https://github.com/JuliaAI/MLJBase.jl",
    rev="stack_cache_and_acceleration",
)
Pkg.add(
    url="https://github.com/JuliaAI/MLJTestIntegration.jl",
    rev="multi-threading",
)
Pkg.add("NearestNeighborModels")
Pkg.add("MLJLIBSVMInterface")
Pkg.add("XGBoost")
Pkg.instantiate()
julia> Pkg.status()
Status `/private/var/folders/4n/gvbmlhdc8xj973001s6vdyw00000gq/T/jl_wRKoZO/Project.toml`
[a7f614a8] MLJBase v0.20.2 `https://github.com/JuliaAI/MLJBase.jl#stack_cache_and_acceleration`
[61c7150f] MLJLIBSVMInterface v0.2.0
[697918b4] MLJTestIntegration v0.1.0 `https://github.com/JuliaAI/MLJTestIntegration.jl#multi-threading`
[636a865e] NearestNeighborModels v0.2.0
[009559a3] XGBoost v1.5.2
using MLJBase
using NearestNeighborModels
using MLJLIBSVMInterface
using MLJTestIntegration
using XGBoost
model = EpsilonSVR()
models = (knn1=KNNRegressor(K=4),
          knn2=KNNRegressor(K=6),
          model=model)
metalearner = KNNRegressor()
measure = LPLoss(2)
# mini Boston:
y, X = unpack(MLJBase.load_boston(), ==(:MedV), col->col in [:LStat, :Rm])
data = (X, y)
mystack = Stack(
    ; metalearner,
    resampling=CV(; nfolds=3),
    acceleration=CPUThreads(),
    models...)
julia> MLJTestIntegration.test_single_target_regressors(
           [(name="EpsilonSVR", package_name="LIBSVM"),],
           level=4,
           verbosity=2
       )
┌ Info:
└ Testing EpsilonSVR from LIBSVM
[ Info: [:model_type] Loading model type ✓
[ Info: [:model_instance] Instantiating default model ✓
[ Info: [:fitted_machine] Fitting machine ✓
[ Info: [:operations] Calling `predict`, `transform` and/or `inverse_transform` ✓
[ Info: [evaluation] Evaluating model performance using with 1 resources. ✓
Internal repeatability tests, 50 of 50 trials complete ✓ Repeatable.
[ Info: Testing with 5 threads.
[ Info: [:accelerated_evaluation] Evaluating model performance using with 2 resources. ✓
[ Info: [:tuned_pipe_evaluation] Evaluating perfomance in a tuned pipeline ✓
[ Info: [:ensemble_prediction] Ensembling ✓
[ Info: [stack_evaluation] Evaluating a stack containing model with 1 resources. ✓
signal (11): Segmentation fault: 11
in expression starting at /Users/anthony/sandbox/crash.jl:43
signal (11): Segmentation fault: 11
in expression starting at /Users/anthony/sandbox/crash.jl:43
signal (11): Segmentation fault: 11
in expression starting at /Users/anthony/sandbox/crash.jl:43
signal (11): Segmentation fault: 11
in expression starting at /Users/anthony/sandbox/crash.jl:43
unknown function (ip: 0x10b82aca3)
Allocations: 279946573 (Pool: 279865905; Big: 80668); GC: 248
signal (11): Segmentation fault: 11
in expression starting at /Users/anthony/sandbox/crash.jl:43
unknown function (ip: 0x10b80f59c)
Allocations: 279946573 (Pool: 279865905; Big: 80668); GC: 248
...
Interestingly, if I remove MLJXGBoostInterface from the env, and drop the using XGBoost line, then there are no issues and the tests pass.
I do not seem to have problems with any pure Julia models.
In attempts to isolate, I have encountered various errors, such as:
OMP: Error #13: Assertion failure at kmp_csupport.cpp(540).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
signal (6): Abort trap: 6
in expression starting at REPL[2]:1
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 105303846 (Pool: 105260636; Big: 43210); GC: 106
julia(70986,0x70000783d000) malloc: *** error for object 0x7ff0725333e0: pointer being freed was not allocated
julia(70986,0x70000783d000) malloc: *** set a breakpoint in malloc_error_break to debug
signal (6): Abort trap: 6
in expression starting at /Users/anthony/sandbox/crash.jl:46
signal (11): Segmentation fault: 11
in expression starting at /Users/anthony/sandbox/crash.jl:46
Allocations: 279191441 (Pool: 279111122; Big: 80319); GC: 222
julia(90542,0x7000079c6000) malloc: Incorrect checksum for freed object 0x7f8da2b121a8: probably modified after being freed.
Corrupt value: 0x7f8da2b1b4c0
julia(90542,0x7000079c6000) malloc: *** set a breakpoint in malloc_error_break to debug
signal (6): Abort trap: 6
in expression starting at /Users/anthony/MLJ/MLJTestIntegration/examples/bigtest/notebook.jl:35
signal (4): Illegal instruction: 4
in expression starting at /Users/anthony/MLJ/MLJTestIntegration/examples/bigtest/notebook.jl:35
I am running with 5 threads.
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin21.4.0)
CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
JULIA_LTS_PATH = /Applications/Julia-1.6.app/Contents/Resources/julia/bin/julia
JULIA_PATH = /Applications/Julia-1.6.app/Contents/Resources/julia/bin/julia
JULIA_EGLOT_PATH = /Applications/Julia-1.6.app/Contents/Resources/julia/bin/julia
JULIA_NUM_THREADS = 5
JULIA_NIGHTLY_PATH = /Applications/Julia-1.7.app/Contents/Resources/julia/bin/julia
Interesting. I get these problems (intermittently) as well on an M1 Mac with non-Julia models (XGBoost, LightGBM, etc.), but in my case they occur when I do cross-validation (calling evaluate) with multi-threading enabled. It is similarly hard for me to generate a minimal example, but I get the same exceptions and segfaults that you do.
Same thing here. A simple loop with only an SVM in the Stack produces the error on my side, if that helps:
metalearner = EpsilonSVR()
models = (model=EpsilonSVR(),)
mystack = Stack(
    ; metalearner,
    resampling=CV(; nfolds=3),
    cache=false,
    acceleration=CPUThreads(),
    models...)
for i in 1:3
    fitresult, _, _ = fit(mystack, 0, X, y)
end
I noticed LIBSVM also has internal multithreading. Could that be related?
It appears LIBSVM isn't thread-safe: https://github.com/JuliaML/LIBSVM.jl/issues/60
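If the underlying library really is not thread-safe, one possible workaround is to serialize every call into it behind a single lock, while the rest of the threaded work stays parallel. The following is only a sketch of that idea, not something MLJBase or LIBSVM.jl is known to do: unsafe_train is a hypothetical stand-in for a call into a non-thread-safe native library, and SHARED_STATE, NATIVE_LOCK, safe_train and train_all are illustrative names. Only ReentrantLock, lock and Threads.@threads are real (they come from Base).

```julia
# Hypothetical stand-in for a call into a non-thread-safe native library:
# it reads and writes shared global state with no synchronization of its own.
const SHARED_STATE = Ref(0)

function unsafe_train(x)
    old = SHARED_STATE[]
    SHARED_STATE[] = old + 1   # unsynchronized read-modify-write
    return x^2
end

# A single global lock guarding every call into the "library".
const NATIVE_LOCK = ReentrantLock()

# Wrapper that lets only one task at a time enter unsafe_train.
safe_train(x) = lock(() -> unsafe_train(x), NATIVE_LOCK)

function train_all(xs)
    results = Vector{Int}(undef, length(xs))
    Threads.@threads for i in eachindex(xs)
        results[i] = safe_train(xs[i])   # serialized by NATIVE_LOCK
    end
    return results
end

results = train_all(1:100)
```

The obvious cost is that the guarded calls no longer run in parallel, so this only helps when a meaningful share of the work happens outside the lock.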