
Hoping to offer a version without CUDA

Open NeutronT opened this issue 2 years ago • 12 comments

Motivation and description

Hoping to offer a CPU-only version without CUDA, just like PyTorch and TensorFlow.

Possible Implementation

No response

NeutronT avatar Jan 07 '23 09:01 NeutronT

This will happen with https://github.com/FluxML/Flux.jl/pull/2132.

CarloLucibello avatar Jan 07 '23 10:01 CarloLucibello

Can you elaborate on why you want this? Unlike TF and PyTorch, Flux runs just fine on machines without CUDA-enabled GPUs and even functionality like moving arrays to GPU memory will gracefully fall back to being no-ops. I don't think it's an apples-to-apples comparison with the Python libraries.
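
For illustration, a minimal sketch of that fallback (assuming a machine without a working CUDA setup; exact behaviour may vary by Flux version):

using Flux

x = rand(Float32, 3, 3)
y = gpu(x)            # without a functional GPU this degrades to (roughly) a no-op
typeof(y) <: Array    # true on a CPU-only machine; on a CUDA machine y would be a CuArray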

ToucheSir avatar Jan 07 '23 15:01 ToucheSir

> Can you elaborate on why you want this? Unlike TF and PyTorch, Flux runs just fine on machines without CUDA-enabled GPUs and even functionality like moving arrays to GPU memory will gracefully fall back to being no-ops. I don't think it's an apples-to-apples comparison with the Python libraries.

IMO installing CUDA takes a lot of time and memory. On a machine without CUDA, or in a project where I just want to fit a very small, simple custom function, all that time and memory consumption is for nothing.

ndgnuh avatar Apr 08 '23 09:04 ndgnuh

Do you mean CUDA as in Nvidia's CUDA libraries or CUDA.jl, the Julia package? I make this distinction, because I believe CUDA.jl will only download Nvidia's library when the very first GPU-related function is called. In other words, you shouldn't be installing CUDA unless you have a CUDA capable GPU + you put something on the GPU. On a CPU machine, this should never happen calling just Flux functions.
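
As a sketch of the behaviour described above (assuming a recent CUDA.jl; details may differ between versions):

using CUDA            # loading the package by itself should not need a GPU

CUDA.functional()     # false on a machine without a usable NVIDIA driver
# The toolkit artifacts should only be needed once GPU functionality is actually
# exercised, e.g. something like CUDA.rand(10) on a CUDA-capable machine.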

If it does happen, then maybe it would be easier to patch that one case than releasing a CPU only package.

darsnack avatar Apr 08 '23 13:04 darsnack

But what if I don't want to install CUDA.jl even if I have an Nvidia card on my machine? I just want to use Flux to fit a very simple function. The tradeoff between runtime and install time is not very good, IMO.

On my fresh installation, CUDA artifacts are still downloaded.

[screenshot: installation output showing CUDA artifacts being downloaded]

Also, this was run with CUDA_VISIBLE_DEVICES=""

ndgnuh avatar Apr 10 '23 09:04 ndgnuh

Something may have recently changed on the CUDA.jl end? I see this when just installing it without Flux. You may want to bring it up on their side.

ToucheSir avatar Apr 10 '23 14:04 ToucheSir

> Can you elaborate on why you want this? Unlike TF and PyTorch, Flux runs just fine on machines without CUDA-enabled GPUs and even functionality like moving arrays to GPU memory will gracefully fall back to being no-ops. I don't think it's an apples-to-apples comparison with the Python libraries.

"GPU" no longer just means CUDA. There's support for AMD GPUs, and developing support for Apple MPS GPUs in Metal.jl. I was able to get a simple model to run on the GPU on my M2 MacBook Air using Metal with some very simple changes to the code for Flux (only proof-of-concept at this point, which is why I've not pushed it to my fork of Flux.jl). The hard dependency on CUDA needs to be removed (and it sounds like this is in progress).

Another thing: the dependency on CUDA also makes it impossible to install up-to-date versions of other packages that have a (legacy, IMO) dependence on CUDA, e.g. TensorOperations.jl.

rgobbel avatar Apr 16 '23 16:04 rgobbel

I don't think anyone has argued otherwise. The reason Flux has CUDA.jl listed under [deps] is purely technical: prior to Julia 1.9, it was not possible to declare a package as an optional dependency and include code for it (excluding Requires.jl, which would not have worked here). Kyle's point is that CUDA.jl was changed to mitigate the impact of this dependency by only installing the relevant libraries when explicitly requested. Hence unlike PyTorch or TF, there is just one Flux package/build and things should gracefully degrade when no compatible GPUs are available.

> The hard dependency on CUDA needs to be removed (and it sounds like this is in progress).

Again, this is a technical limitation. You can already see we've added support for AMDGPU.jl in Flux and NNlib as conditionally-loaded package extensions, but those only work on 1.9. Converting the CUDA.jl integration would be a breaking change and cut off support for v1.6-1.8, which Flux promises to support. In the meantime, I imagine Metal.jl support will be added as a package extension too, because users of that library will already be used to relying on newer Julia versions. Again, let me emphasize that the current CUDA.jl dep is not as "hard" as it seems on the surface because of the aforementioned graceful degradation of functionality.
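
To make the extension mechanism concrete, here is how the conditional loading looks from the user side on Julia 1.9 (a sketch; the extension name is assumed for illustration):

using Flux
using AMDGPU          # loading the weak dependency is what activates the extension

# Verify the conditionally-loaded extension was picked up (Julia >= 1.9):
ext = Base.get_extension(Flux, :FluxAMDGPUExt)
ext === nothing && @warn "AMDGPU extension not loaded"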

> Another thing: the dependency on CUDA also makes it impossible to install up-to-date versions of other packages that have a (legacy, IMO) dependence on CUDA, e.g. TensorOperations.jl.

I'm not sure I understand, can you clarify this point? The problem seems to be that NNlibCUDA relies on CUDA.jl v4, and TensorOperations relies on CUDA.jl v3. Based on https://discourse.julialang.org/t/question-about-new-package-extensions/92794/7, I'm not sure that'll be solved even after moving to package extensions. I've asked the GPU folks about how best to address this, but note that the root cause isn't the hard dep.

ToucheSir avatar Apr 16 '23 17:04 ToucheSir

> Something may have recently changed on the CUDA.jl end? I see this just installing it without Flux. You may want to bring it up on their side.

Based on this comment, it seems the move to JLLs and using Pkg artifacts triggers more eager installation.

darsnack avatar Apr 17 '23 14:04 darsnack

@ndgnuh It seems you can prevent downloading artifacts by defining a file called LocalPreferences.toml in the root directory of your project:

[CUDA_Runtime_jll]
version = "local"

You could also make this setting exported for the project or global to your system. This functionality is built on top of Preferences.jl.

darsnack avatar Apr 17 '23 14:04 darsnack

> ... the dependency on CUDA also makes it impossible to install up-to-date versions of other packages that have a (legacy, IMO) dependence on CUDA, e.g. TensorOperations.jl.

> I'm not sure I understand, can you clarify this point? The problem seems to be that NNlibCUDA relies on CUDA.jl v4, and TensorOperations relies on CUDA.jl v3. Based on https://discourse.julialang.org/t/question-about-new-package-extensions/92794/7, I'm not sure that'll be solved even after moving to package extensions. I've asked the GPU folks about how best to address this, but note that the root cause isn't the hard dep.

After fiddling around with TensorOperations a bit, it looks like the problem is related to TensorOperations' dependence on CUTENSOR. I actually managed to refactor TensorOperations to make its CUDA support an extension, but wasn't able to test it on the only NVIDIA hardware I have available because for some reason CUTENSOR wasn't loading (I've been way out of the loop for a long time, so if there's a way for me to test using CI without actually doing a PR, please let me know). All other tests passed, BTW, but I don't really have much time to spend on that package, so for now I'm dropping it.

rgobbel avatar Apr 17 '23 20:04 rgobbel

> @ndgnuh It seems you can prevent downloading artifacts by defining a file called LocalPreferences.toml in the root directory of your project:
>
> [CUDA_Runtime_jll]
> version = "local"
>
> You could also make this setting exported for the project or global to your system. This functionality is built on top of Preferences.jl.

After struggling for a bit, I realized that I still have to install CUDA and download all the CUDA stuff first.

Ref: https://github.com/JuliaPackaging/Preferences.jl/issues/53

~~Since my use case is fairly simple, I just worked around this by rolling my own childish version of gradient descent (GD) with Zygote and without Flux :man_shrugging:. Code for anyone who needs it.~~

Edit: GD is not really efficient for this use case; a least-squares algorithm is much more suitable. The package LsqFit.jl provides this.
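
For reference, roughly the same line fit done with LsqFit.jl instead of the hand-rolled GD below (a sketch; variable names are illustrative):

using LsqFit

model(x, p) = @. p[1] * x + p[2]
xdata = rand(-10f0:0.01f0:10f0, 100)
ydata = model(xdata, [3f0, 1f0])
result = curve_fit(model, xdata, ydata, Float32[0, 0])
result.param          # should come out close to [3.0, 1.0]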

GD code
using Zygote
using Statistics: mean

# Plain gradient descent. Note: `params` is rebound, not mutated, so the
# updated parameters are returned rather than written in place.
function fit(compute_loss::Function, params;
             η = 1.0f-5, num_epochs = 10, clip_grad = 0.0f0)
    for _ in 1:num_epochs
        grad = Zygote.gradient(compute_loss, params)[1]
        if clip_grad > 0
            clamp!(grad, -clip_grad, clip_grad)
        end
        params = @. params - η * grad
    end
    return params
end

# Usage
function line(x, params)
    a, b = params
    @. a * x + b
end

params = zeros(Float32, 2)
x = rand(-10f0:0.01f0:10f0, 100)
y = line(x, [3f0, 1f0])
params = fit(params; num_epochs = 500, η = 1f-2, clip_grad = 10f0) do p
    ŷ = line(x, p)
    mean(@. (ŷ - y)^2)
end

ndgnuh avatar Jun 03 '23 10:06 ndgnuh

This can probably be closed now that CUDA is no longer a hard dependency.

christiangnrd avatar Jul 25 '24 23:07 christiangnrd