BSON.jl
Error when saving Flux models/CuArrays from GPU
Flux models or CuArrays saved while they are in GPU memory can only be loaded again in the same Julia session. Once this session is terminated and a new session is started, loading this data will either result in random values or in a CUDA error. This can render trained and saved models useless, since most of the time they will be loaded in a new Julia session... MWE (for CuArrays):
using BSON: @save, @load
using CUDAdrv
using CuArrays
using Flux
data = [1 2 3; 4 5 6]
data = data |> gpu
@show data
@save "data.bson" data
@load "data.bson" data
@show data
In the same session, this gives me the correct output:
data = Float32[1.0 2.0 3.0; 4.0 5.0 6.0]
data = Float32[1.0 2.0 3.0; 4.0 5.0 6.0]
2×3 CuArray{Float32,2}:
1.0 2.0 3.0
4.0 5.0 6.0
Loading the data in a new session
using BSON: @load
using CUDAdrv
using CuArrays
using Flux
@load "data.bson" data
@show data
will result in an error:
ERROR: CUDA error: invalid argument (code #1, ERROR_INVALID_VALUE)
Stacktrace:
[1] macro expansion at /home/user/.julia/packages/CUDAdrv/WVU1H/src/base.jl:147 [inlined]
[2] #copy!#10(::Nothing, ::Bool, ::Function, ::Ptr{Float32}, ::CUDAdrv.Mem.DeviceBuffer, ::Int64) at /home/user/.julia/packages/CUDAdrv/WVU1H/src/memory.jl:344
[3] copy! at /home/user/.julia/packages/CUDAdrv/WVU1H/src/memory.jl:335 [inlined]
[4] copyto!(::Array{Float32,2}, ::Int64, ::CuArray{Float32,2}, ::Int64, ::Int64) at /home/user/.julia/packages/CuArrays/PwSdF/src/array.jl:194
[5] show(::Base.GenericIOBuffer{Array{UInt8,1}}, ::CuArray{Float32,2}) at /home/user/.julia/packages/GPUArrays/fAX0Q/src/abstractarray.jl:101
[6] #sprint#340(::Nothing, ::Int64, ::Function, ::Function, ::CuArray{Float32,2}) at ./strings/io.jl:101
[7] #sprint at ./none:0 [inlined]
[8] #repr#341 at ./strings/io.jl:208 [inlined]
[9] repr(::CuArray{Float32,2}) at ./strings/io.jl:208
[10] top-level scope at show.jl:555
The error occurs during the show command and not during loading!
I experienced the same issue when I tried to save Flux models. Saving and loading worked without errors, but the loaded model did not have the trained weights, only random values.
The Flux documentation only says that GPU support needs to be available when loading models which were in GPU memory when saved.
Right, you can't save CuArrays with BSON.jl. Doing data = data |> Flux.cpu before saving your model should fix this (of course, when you load it again it will just be a regular Array, not a CuArray).
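For reference, a minimal sketch of that round trip (the file name is illustrative, and model stands for whatever trained network you have):

using BSON: @save, @load
using Flux

model_cpu = cpu(model)   # move all parameters off the GPU
@save "model.bson" model_cpu

# later, possibly in a fresh Julia session:
@load "model.bson" model_cpu
model = gpu(model_cpu)   # move back to the GPU if needed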
"Right, you can't save CuArrays with BSON.jl." Okay, but then doing this should at least produce some error. As in many other applications, no output (error/warning message) means everything went as expected! You can get into big trouble if you are not aware of this issue and save your model after a time-consuming learning phase...
"Doing data = data |> Flux.cpu before saving your model should fix this." This is what I am doing now too, but I think the Flux documentation should be clearer about this.
Agreed. If you have time, it would be great if you can submit a Flux PR to make this very clear in the docs.
Agreed, I just had the same problem and it's written nowhere in the Flux documentation.
Sounds like this issue has been resolved?
Out of curiosity, is it possible to overload some method so that when someone tries to save a CuArray to BSON, it copies the data into an Array and saves that? And maybe it's also possible to save a tiny bit of metadata so that when loading a "CuArray" from disk, it creates an Array and copies the data over into a newly created CuArray?
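One way to picture that round trip is with explicit helpers instead of method overloads (all names here are hypothetical, not part of BSON):

using BSON, CuArrays

# Save: copy device memory to the host and record where the data lived.
save_with_meta(path, x::CuArray) = BSON.bson(path, data = collect(x), was_gpu = true)
save_with_meta(path, x::AbstractArray) = BSON.bson(path, data = x, was_gpu = false)

# Load: restore the data to the device it came from.
function load_with_meta(path)
    d = BSON.load(path)
    d[:was_gpu] ? CuArray(d[:data]) : d[:data]
end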
I would say model = model |> gpu is not a solution because it corrupts the correspondence in stateful optimisers. For example, the Adam optimiser uses an IdDict to keep track of the momentum for different params. After |> gpu the object ids change and the optimiser state has to start from scratch. This means in the end we are not able to resume training: BSON saving and loading can only be used for training => saving and then loading => inference. Restarting training with a blank optimiser state would break reproducibility.
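The identity problem is easy to demonstrate in plain Julia, without Flux:

w = rand(Float32, 3)
st = IdDict(w => "optimiser state for w")   # keyed by object identity (===)

w2 = copy(w)     # equal values, new object -- like params after a cpu/gpu round trip
haskey(st, w)    # true
haskey(st, w2)   # false: the stored state no longer matches the reloaded parameters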
"If you have time, it would be great if you can submit a Flux PR to make this very clear in the docs."
@jpsamaroo a fellow student hit this bug last week. I'm thinking we throw an info message and automatically move the data to the CPU for them.
Where should this live? Perhaps at https://github.com/JuliaGPU/CUDA.jl/blob/603edb87891da8fd5b2623f17544aebe9706069a/src/array.jl#L68? Unfortunately there's no interface package defining this type, so I'm thinking of adding a Requires.jl hook here in BSON?
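A rough sketch of what such a hook could look like (hypothetical code, not in BSON; it assumes BSON.lower is the serialisation extension point and that the UUID below is CuArrays.jl's):

using Requires

function __init__()
    @require CuArrays="3a865a2d-5b23-5a0f-bc46-62713ec82fae" begin
        function BSON.lower(x::CuArrays.CuArray)
            @info "CuArray found while saving; copying it to a host Array."
            BSON.lower(collect(x))
        end
    end
end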