
How can I tell if ML.NET is using GPU acceleration?

Open CBrauer opened this issue 4 years ago • 6 comments

Hey Guys,

I have a question about GPU support in ML.NET.

My environment is:
• Visual Studio 2022 (Preview)
• .NET 6.0 (Preview)

I have downloaded and installed “ML.NET Model Builder GPU Support” and tested the GPU with Model Builder. Model Builder likes my GPU card as you can see:

GPU Support

It works o.k.

I have some legacy ML.NET code, and I am wondering: if I re-build my old code, will it use the newly installed GPU drivers?

How can I tell if my ML.NET project uses the GPU?

Charles

CBrauer avatar Jul 07 '21 19:07 CBrauer

I started my binary classification model training program. Then I ran Task Manager to see what the GPU usage was:

Gpu Usage

My GPU usage was only 9%. Why?

Charles

CBrauer avatar Jul 08 '21 18:07 CBrauer

@JakeRadMSFT Can you take a look here? I am not super familiar with how the model builder gpu stuff translates over to ML.NET.

michaelgsharp avatar Jul 22 '21 21:07 michaelgsharp

@CBrauer This is probably image classification, right? If I understand correctly, LightGBM does not support GPU; the GPU is only used for getting embeddings from the images. At least, I could not see GPU support in any other tasks in Model Builder, and it would be strange if LightGBM GPU were supported there but not enabled in the other tasks. My guess is it is doing something like this: https://docs.microsoft.com/en-us/dotnet/machine-learning/tutorials/image-classification

So, maybe, the GPU is only used for loading the images, and LightGBM is run on the CPU to classify them, in which case you should also see close to 100% CPU usage. BTW, I think that, at least most of the time, GPU training in TensorFlow shows up in the "3D" part of the GPU consumption in Task Manager.
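
For reference, something like the rough sketch below is what I mean. It is modeled on that tutorial, not on Model Builder's actual generated code, and the model path, TensorFlow node names and column names are just placeholders. The TensorFlow scoring step is the part that could use the GPU (via the SciSharp.TensorFlow.Redist-Windows-GPU package), while the LightGBM trainer runs on the CPU:

using Microsoft.ML;

var mlContext = new MLContext();

// Label -> key, then load, resize and extract pixels from the raw images.
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("LabelKey", "Label")
    .Append(mlContext.Transforms.LoadImages("input", "images", "ImagePath"))
    .Append(mlContext.Transforms.ResizeImages("input", 224, 224))
    .Append(mlContext.Transforms.ExtractPixels("input"))
    // A pretrained TensorFlow model computes the image embeddings - the GPU-capable step.
    .Append(mlContext.Model.LoadTensorFlowModel("model.pb")
        .ScoreTensorFlowModel(new[] { "embedding" }, new[] { "input" }, addBatchDimensionInput: true))
    // LightGBM classifies the embeddings - this part runs on the CPU.
    .Append(mlContext.MulticlassClassification.Trainers.LightGbm(
        labelColumnName: "LabelKey", featureColumnName: "embedding"));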

On the other hand, if LightGBM does have a GPU binary in Model Builder, then I think it would be great to see it in all tasks.

torronen avatar Jan 31 '22 11:01 torronen

I don't think that LightGBM has a GPU binary in Model Builder. In fact, I don't think that LightGBM has pre-compiled binaries for anything with a GPU, though you can compile it yourself.

It would be awesome if they did have some pre-compiled GPU binaries so we could use them. It should be possible for an end user to build it themselves and swap it out, though I haven't tested that myself.

michaelgsharp avatar Feb 01 '22 21:02 michaelgsharp

I tried replacing Model Builder's LightGBM DLL in the extension folder with a binary compiled for Nvidia GPUs. It works, but it uses the CPU. I think it might require the device_type=gpu parameter. Just experimenting.

torronen avatar Feb 09 '22 15:02 torronen

For the Microsoft.ML.LightGBM class library, adding GPU support went fairly easily. However, the benefits might not make it worth doing, so only very brief notes below. It may be better to first test with the LightGBM CLI whether the specific dataset benefits much.

  1. Get the LightGBM source, version 2.3.1, from GitHub. Compile according to the LightGBM documentation. (The only issue was that some old files in the Release and Build folders had to be deleted - check the timestamp of lib_lightgbm.dll to make sure it has been rebuilt.)
  2. Replace the lib_lightgbm.dll binary in your app; maybe rename it in the Microsoft.ML.LightGBM source and rename the file itself, to be sure the correct version is being used.
  3. Check that MaximumBinCountPerFeature / max_bin is low; 15 - 255 works OK.
  4. The LightGBM binary needs the parameters below. I put them in LightGBMTrainerBase.ToDictionary, but that is probably not the ideal place long-term. If the Model Builder team wants to offer experimental GPU support and allow users to swap lib_lightgbm.dll, then maybe these could be environment variables which users could enable (similar to disabling trainers with environment variables); see the sketch right after the parameters.
res["gpu_use_dp"] = false; // optional, for better speed especially for Nvidia Geforce
res["device_type"] = "gpu"; // could be cuda or cpu also
res["gpu_platform_id"] = 0; // if integrated and dedicated GPU, then this probably should be 1
res["gpu_device_id"] = 0; // use first GPU in gpu_platfrom_id (=vendor) specified above.

It seems LightGBM 2.3.1 only calculates histograms on the GPU, so the GPU consumption is very low, only 1-3% in my first test. I was uncertain whether it is even being used, but I think so based on the output. Some parameters could also be wrong.

I have a feeling training might be faster, maybe 30%, and CPU consumption is probably a little lower, but I did not verify this and it is still CPU-bound. Maybe some datasets benefit more. According to LightGBM's GitHub issues, they have improved GPU usage since 2.3.1, so it might be best to wait for an upgrade of the LightGBM library before training on the GPU.

Anyway, I will keep training with the GPU version from now on. I'll update if I have some new insights.

torronen avatar Feb 09 '22 18:02 torronen