LightGBM

Fix DockerFile GPU CUDA GPG Keys

Arka161 opened this pull request • 9 comments

The Docker in the repository no longer works, as NVIDIA updated their GPG keys in Apr 2022.

The Dockerfile has been updated here.

Arka161, Jul 12 '22

Hello @jameslamb,

Thank you for your comment on the PR. I understand that master -> master PRs are discouraged in this repo. Would you like me to close this PR, and then make a new one with a different branch pointing to LightGBM/master?

Also, I would like to highlight that the current Docker image does not work, hence my changes to the GPG key. Let me know if you would like me to change anything :)

Arka161, Jul 14 '22

then make a new one with a different branch

Nope, it's ok! You can keep this one. But once this is merged, I recommend deleting and re-creating your fork (or using git reset + git push --force to rewrite the history of master on your fork).

jameslamb, Jul 14 '22

Thank you for your reply, I understand what you meant! I had no other changes on master in my LightGBM fork, which is why I opened the PR from master. In the future, I will open PRs from a branch created off master/main :)

For testing Docker images: one quick idea is to try them on Google Colab or the free GPU instances on Kaggle. I have tested this image in my own environment (RTX 2060, official NVIDIA drivers, Ubuntu) and it seems to work well, but I cannot think of a way to write automated tests for Docker images.

Arka161, Jul 15 '22

I personally don't like this line I had to add to the Dockerfile (it looks too hardcoded):

RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC

However, without it the build errors out on any system, because NVIDIA has deprecated its old key.

Edit: I have replaced the hardcoded line with the syntax NVIDIA recommends in their blog post, to make it more maintainable. @jameslamb

Arka161, Jul 15 '22

Edit: I have replaced the hardcoded line with the syntax NVIDIA recommends in their blog post

Here it is: https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key/.

StrikerRUS, Jul 24 '22

@jameslamb I have made the changes; sorry for the delay, I was travelling. I will open the other pull request too (I guess it is better to do that after this one is merged).

Arka161, Aug 04 '22

No problem! Thanks very much.

I started testing this on a g4dn instance on AWS last night but didn't quite finish; I'll try again in the next few days.

@Arka161 could you clarify what specifically you mean by "The Docker in the repository no longer works"? For example, does it fail to build? fail at runtime? something else?

When I was testing on an AWS EC2 instance last night, I was able to build an image from docker/gpu/dockerfile.gpu on the latest master, without any modifications.

jameslamb, Aug 04 '22

Hello @jameslamb: I get this kind of error when I build the GPU Dockerfile from the master branch of LightGBM.

Basically, something like:

W: GPG error: ___ trusty InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY ___

I got this on both my server and my personal Ubuntu machine, and I was unable to build the image. It turned out that NVIDIA had changed their CUDA Linux GPG keys, and updating the keys fixed the error.

The lines I added fix the error by updating the GPG keys as NVIDIA suggests in their blog post (using their cuda-keyring package). This way, the Dockerfile stays up to date and we will not face GPG key issues on any machine.

Arka161, Aug 04 '22
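A minimal Dockerfile sketch of the cuda-keyring approach from NVIDIA's blog post linked above. It is not the change from this PR. It assumes an Ubuntu 16.04 x86_64 nvidia/cuda base image and the `cuda-keyring_1.0-1_all.deb` package name from that post; `CUDA_REPO` is a hypothetical build argument you would set to your base image's distro/arch. Removing the base image's old apt source entries is also an assumption on our part, not a step from the post: it is a common workaround so apt does not see the same repository listed twice with conflicting `Signed-By` values. Check the post for the current package version and supported distro/arch values.

```dockerfile
# Sketch only: place after the FROM line of a Dockerfile built on an nvidia/cuda image.
# Assumption: Ubuntu 16.04 on x86_64; CUDA_REPO is a hypothetical build argument name.
ARG CUDA_REPO=ubuntu1604/x86_64

# Assumption: drop apt source entries that still rely on the old, rotated key (7fa2af80)
# so they do not clash with the entry cuda-keyring installs. -f tolerates missing files,
# and "|| true" keeps the step going if the old key is not present.
RUN rm -f /etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/nvidia-ml.list \
    && (apt-key del 7fa2af80 || true)

# ADD can fetch a remote URL, so no download tool needs to be installed first.
ADD https://developer.download.nvidia.com/compute/cuda/repos/${CUDA_REPO}/cuda-keyring_1.0-1_all.deb /tmp/cuda-keyring.deb

# cuda-keyring installs the new signing key plus a matching apt source entry;
# apt-get update then checks that the CUDA repository verifies again.
RUN dpkg -i /tmp/cuda-keyring.deb \
    && rm /tmp/cuda-keyring.deb \
    && apt-get update
```

Compared with pinning a key ID via `apt-key adv --recv-keys`, installing the keyring package leaves the key file and repository entry to NVIDIA's package, which is the maintainability argument made in this thread.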

Ok, thanks for that. I saw that exact same warning (that is what the W: means), but I was still able to build the image on Amazon Linux.

~Are you working in an environment where anything writing to stderr causes a process to exit with an error? Like on Windows using PowerShell?~ Sorry, just saw your comment says "Ubuntu".

Anyway, I'll come back to this some time in the next few days, once I've had a chance to finish testing, with a reproducible example and a more thorough review. Thanks for your patience.

jameslamb, Aug 04 '22

I'm going to close this PR based on https://github.com/microsoft/LightGBM/pull/5369#pullrequestreview-1064356007 and due to lack of response.

@Arka161, if you see an issue with the conclusions in that comment, please leave a comment here and we can re-open this pull request.

Thanks for your interest in LightGBM, come back and contribute any time!

jameslamb, Aug 16 '22

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

github-actions[bot], Aug 19 '23