[Feature] Add a custom docker image with GPU support
Description of the feature
Hello! Is there any chance that you could add the GPU widget to the Docker image on hub.docker.com under a custom tag? Since you said: "Docker images do not include the necessary tools (mainly, because I don't want to bloat the image for everyone)", you could publish it under a custom tag on hub.docker.com, so anyone who wants the GPU widget in the Docker image just pulls the image with the GPU tag (for example), and those who don't need it keep pulling the latest tag. This way everyone will be happy :)
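For example (the GPU tag name here is purely hypothetical, just to illustrate the idea):
docker pull mauricenino/dashdot:nvidia   # opt-in variant with the GPU tools baked in
docker pull mauricenino/dashdot:latest   # unchanged default image for everyone else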
Additional context
No response
Hey there, thanks for creating the issue. Unfortunately, there are a few other problems I have with the GPU support:
- Different GPUs need different tools and I don't think I can install all in one container
- You would have to do a lot of manual work anyway, since you need to install tools on the host as well, to pass the GPU into a container
- It would make the build (CI) a lot more complicated and probably make the already long times per run even longer
- I created Dashdot mainly for my own purposes and I have no need for the GPU module, so I have no real interest in implementing it - and I suspect that it would be a lot of work
If you want to use the GPU widget, I suggest running from source. If that does not work and you end up getting it running in Docker, it would be really cool if you could report back all the steps you took, so we can put them in the docs :)
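Roughly, running from source looks like this (just a sketch; please check the docs for the exact, up-to-date steps):
git clone https://github.com/MauriceNino/dashdot && cd dashdot
yarn && yarn build:prod
yarn start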
I will keep this open as a feature request. In case anyone wants to spend some time working on it, feel free to PM me on Discord!
I thought it was something ready-made that you already had in your drawer :D Sure, I will try to tinker with it some weekend and see what we find.
I'd also like to use it for an Nvidia GPU. I already tried some stuff, but it always breaks the container when I start it...
@MauriceNino Somehow this is working for me, but I got an empty graph: https://i.ibb.co/zNqXYYG/dash2.png
FROM nvidia/cuda:12.1.0-base-ubuntu22.04
RUN apt update && apt install -y git curl pciutils dmidecode && \
curl -fsSL https://deb.nodesource.com/setup_19.x | bash - && apt-get install -y nodejs && \
npm install --global yarn && \
git clone https://github.com/MauriceNino/dashdot && \
cd dashdot && \
yarn && \
yarn build:prod && \
rm -rf /var/lib/apt/lists/*
WORKDIR dashdot/
CMD ["yarn", "start"]
docker run .. --runtime=nvidia .. dash
systeminformation output:
cat << EOF > main.js
const si = require('systeminformation');
si.graphics()
.then(data => console.log(data))
.catch(error => console.error(error));
EOF
cat << EOF > Dockerfile
FROM nvidia/cuda:12.1.0-base-ubuntu22.04
COPY main.js /app/main.js
WORKDIR /app
RUN apt update && apt install -y curl pciutils dmidecode && \
curl -fsSL https://deb.nodesource.com/setup_19.x | bash - && apt-get install -y nodejs && \
npm install systeminformation
CMD node main.js
EOF
docker build -t graph-print . && docker run --rm -it --runtime=nvidia graph-print
{
controllers: [
{
vendor: '',
model: '',
bus: '',
busAddress: '00:01.0',
vram: 4,
vramDynamic: false,
pciID: ''
},
{
vendor: 'NVIDIA Corporation',
model: 'GP106GL [Quadro P2200] ',
bus: 'Onboard',
busAddress: '01:00.0',
vram: 5120,
vramDynamic: false,
driverVersion: '515.86.01',
subDeviceId: '0x131B10DE',
name: 'Quadro P2200',
pciBus: '00000000:01:00.0',
fanSpeed: 69,
memoryTotal: 5120,
memoryFree: 5059,
temperatureGpu: 37,
powerDraw: 21.96,
powerLimit: 75,
clockCore: 1012,
clockMemory: 5005
}
],
displays: []
}
- The 4 MB device is probably the BMC VGA
- Some ENV variable to hide this kind of device would be useful (BUS_ADDRESS_FILTER?) - see the sketch after this list
- Producing a dashdot Docker image with an nvidia name suffix would be useful too; it's worth a shot. Of course it could slow down the CI, but it could be a parallel step, and no one would be mad if it were "slower" than building the main image. Btw, the base image is ~30 MB, so it's not such a big deal, I would say.
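Something along these lines, purely hypothetical (DASHDOT_BUS_ADDRESS_FILTER is not an existing variable; 00:01.0 is the bus address of the empty 4 MB device from the output above):
# hypothetical variable, reusing the image built from the Dockerfile above
docker run --runtime=nvidia -e DASHDOT_BUS_ADDRESS_FILTER='00:01.0' dash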
Hi @lukasmrtvy, thanks for trying it out and reporting back.
Why are you using --runtime=nvidia? The guides I checked used --gpus all to pass the Nvidia GPU to Docker.
Also, the problem with your setup is that the GPU entry in your SI test would need the properties utilizationGpu and utilizationMemory to report back usage (both are missing for you).
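A quick way to check just those two properties (assuming systeminformation is installed, as in your test image):
# prints the model plus the two utilization props dashdot needs for the graph
node -e "require('systeminformation').graphics().then(d => d.controllers.forEach(c => console.log(c.model, c.utilizationGpu, c.utilizationMemory)))"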
I have tried your test container, by installing the GPU support according to [this guide](https://www.howtogeek.com/devops/how-to-use-an-nvidia-gpu-with-docker-containers/), and got the following output:
[
{
driverVersion: '526.47',
subDeviceId: '0x87C11043',
name: 'NVIDIA GeForce RTX 3070',
pciBus: '00000000:01:00.0',
fanSpeed: 0,
memoryTotal: 8192,
memoryUsed: 1026,
memoryFree: 7005,
utilizationGpu: 7,
utilizationMemory: 13,
temperatureGpu: 48,
temperatureMemory: undefined,
powerDraw: 23.31,
powerLimit: 240,
clockCore: 210,
clockMemory: 405
}
]
As you can see, the needed props exist there. Unfortunately, I can't get it running due to different problems (running Linux in WSL2), so I can't really work on the docker integration, as I have no test bench.
As to your questions:
- A device filter is a good idea; I will see that I implement it in the next few days (if you could open an issue for that, so I don't forget it, that would be great)
- Creating a separate Docker image is a problem, because the CI times out at 1 hr (and honestly, runs that long should be forbidden), and the ARM builds take forever and would then take even longer. If I could run the build in GitHub CI, I would not care so much, but my tests with it showed runs longer than 1 hr, so it can't complete in time. So I am currently running the builds on my private hardware in Drone CI. This seems to be faster (~20 min per run currently), but ARM builds are still notably slower.
Here's what I've tried so far to get this working on Unraid (unsuccessfully).
On Unraid, the general procedure is to install the Nvidia-Driver plugin, which in turn installs the necessary drivers and the Nvidia Container Toolkit.
Once the container toolkit is installed, you can make your Nvidia GPU available to Docker containers by adding --runtime=nvidia to your docker run command (seen here in a previous comment).
Additionally, you need to add the container toolkit environment variables NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES.
Using this setup, most Docker containers can access the nvidia-smi utility out of the box and can see the installed cards.
For example, running nvidia-smi in a completely empty Docker container based on the ubuntu:latest image, as seen below, works just fine.
docker run
....
-e 'NVIDIA_VISIBLE_DEVICES'='GPU-{mygpuid}'
-e 'NVIDIA_DRIVER_CAPABILITIES'='all'
--runtime=nvidia
nvidia-smi
However, when using the same docker run arguments (runtime and variables) with the mauricenino/dashdot image, the nvidia-smi command does not exist within the Docker container. When trying to run that command with your image, I get:
sh: nvidia-smi: not found
I don't really know enough about Docker to know what to troubleshoot next. I have been wondering if there is something fundamentally different about the base image you're using which prevents the nvidia runtime from working?
I was able to get it working by creating my own image using the Dockerfile shared by @lukasmrtvy. I do not have the same issue with missing statistics; it appears to be working as expected.
@HoreaM managed to get it working too with the following config.
docker-compose:
dashdot-gpu:
  image: dashdot-gpu:latest
  restart: unless-stopped
  deploy:
    resources:
      reservations:
        devices:
          - capabilities:
              - gpu
  privileged: true
  ports:
    - 7678:3001
  environment:
    DASHDOT_ENABLE_CPU_TEMPS: 'true'
    DASHDOT_ALWAYS_SHOW_PERCENTAGES: 'true'
    DASHDOT_SPEED_TEST_INTERVAL: '1440'
    DASHDOT_ENABLE_STORAGE_SPLIT_VIEW: 'true'
    DASHDOT_WIDGET_LIST: 'os,storage,network,cpu,ram,gpu'
    DASHDOT_FS_DEVICE_FILTER: 'sdd'
    DASHDOT_OVERRIDE_NETWORK_SPEED_UP: '500000000'
    DASHDOT_OVERRIDE_NETWORK_SPEED_DOWN: '500000000'
  volumes:
    - /:/mnt/host:ro
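For reference, the GPU reservation in the compose file above corresponds roughly to this docker run invocation (a sketch, with the environment trimmed to the GPU-relevant parts):
docker run -d --restart unless-stopped --gpus all --privileged \
  -p 7678:3001 \
  -e DASHDOT_WIDGET_LIST='os,storage,network,cpu,ram,gpu' \
  -v /:/mnt/host:ro \
  dashdot-gpu:latest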
Dockerfile (it's a modified version of the official one, just using cuda/ubuntu as a base instead of alpine):
# BASE #
FROM nvidia/cuda:12.2.0-base-ubuntu20.04 AS base
WORKDIR /app
ARG TARGETPLATFORM
ENV DASHDOT_RUNNING_IN_DOCKER=true
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES="compute,video,utility"
ENV TZ=Europe/Bucharest
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN \
/bin/echo ">> installing dependencies" &&\
apt-get update &&\
apt-get install -y \
wget \
mdadm \
dmidecode \
util-linux \
pciutils \
curl \
lm-sensors \
speedtest-cli &&\
if [ "$TARGETPLATFORM" = "linux/amd64" ] || [ "$(uname -m)" = "x86_64" ]; \
then \
/bin/echo ">> installing dependencies (amd64)" &&\
wget -qO- https://install.speedtest.net/app/cli/ookla-speedtest-1.1.1-linux-x86_64.tgz \
| tar xmoz -C /usr/bin speedtest; \
elif [ "$TARGETPLATFORM" = "linux/arm64" ] || [ "$(uname -m)" = "aarch64" ]; \
then \
/bin/echo ">> installing dependencies (arm64)" &&\
wget -qO- https://install.speedtest.net/app/cli/ookla-speedtest-1.1.1-linux-aarch64.tgz \
| tar xmoz -C /usr/bin speedtest &&\
apk --no-cache add raspberrypi; \
elif [ "$TARGETPLATFORM" = "linux/arm/v7" ]; \
then \
/bin/echo ">> installing dependencies (arm/v7)" &&\
wget -qO- https://install.speedtest.net/app/cli/ookla-speedtest-1.1.1-linux-armhf.tgz \
| tar xmoz -C /usr/bin speedtest &&\
apk --no-cache add raspberrypi; \
else /bin/echo "Unsupported platform"; exit 1; \
fi
RUN curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add -
RUN echo "deb https://dl.yarnpkg.com/debian/ stable main" | tee /etc/apt/sources.list.d/yarn.list
RUN curl -sL https://deb.nodesource.com/setup_19.x | bash -
RUN \
/bin/echo ">>installing yarn" &&\
apt-get update &&\
apt-get install -y \
yarn
# DEV #
FROM base AS dev
EXPOSE 3001
EXPOSE 3000
RUN \
/bin/echo -e ">> installing dependencies (dev)" &&\
apt-get install -y \
git &&\
git config --global --add safe.directory /app
# BUILD #
FROM base as build
ARG BUILDHASH
ARG VERSION
RUN \
/bin/echo -e ">> installing dependencies (build)" &&\
apt-get install -y \
git \
make \
clang \
build-essential &&\
git config --global --add safe.directory /app &&\
/bin/echo -e "{\"version\":\"$VERSION\",\"buildhash\":\"$BUILDHASH\"}" > /app/version.json
RUN \
/bin/echo -e ">> clean-up" &&\
apt-get clean && \
rm -rf \
/tmp/* \
/var/tmp/*
COPY . ./
RUN \
yarn --immutable --immutable-cache &&\
yarn build:prod
# PROD #
FROM base as prod
EXPOSE 3001
COPY --from=build /app/package.json .
COPY --from=build /app/version.json .
COPY --from=build /app/.yarn/releases/ .yarn/releases/
COPY --from=build /app/dist/apps/api dist/apps/api
COPY --from=build /app/dist/apps/cli dist/apps/cli
COPY --from=build /app/dist/apps/view dist/apps/view
CMD ["yarn", "start"]
This requires building your own image right now, but I would like to add it to the main repo down the line. I just didn't get time to implement it yet.
Can you confirm that this one is working for you as well, @caesay?
I dropped your Dockerfile into Portainer on my server to build an image and got the following build failure:
...
...
Step 22/33 : RUN /bin/echo -e ">> clean-up" && apt-get clean && rm -rf /tmp/* /var/tmp/*
---> Running in ae7986553966
>> clean-up
Removing intermediate container ae7986553966
---> c68bc9e2835f
Step 23/33 : COPY . ./
---> ca116c40c3d2
Step 24/33 : RUN yarn --immutable --immutable-cache && yarn build:prod
---> Running in aa462cd0b41c
yarn install v1.22.19
info No lockfile found.
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
[4/4] Building fresh packages...
success Saved lockfile.
Done in 0.03s.
yarn run v1.22.19
error Couldn't find a package.json file in "/app"
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
The command '/bin/sh -c yarn --immutable --immutable-cache && yarn build:prod' returned a non-zero code: 1
Sorry, I'm not really an expert in this, but the only thing that comes to mind is that you probably forgot to do a git clone of this project first. The way I did it: I first did a git clone https://github.com/MauriceNino/dashdot.git and then replaced the Dockerfile in the project with the one above.
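So, something like this (the image tag is just an example):
git clone https://github.com/MauriceNino/dashdot.git
cd dashdot
# replace the Dockerfile in the repo root with the modified one from above
docker build -t dashdot-gpu .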
Hi, any ideas on how to do this for an Intel iGPU with the i915 driver? Usually for Plex it's enough to do:
devices:
  - /dev/dri:/dev/dri
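which, with plain docker run, would be something like:
docker run --device /dev/dri:/dev/dri mauricenino/dashdot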
Subscribing to this... for whenever this becomes mainstream, maybe a Docker image tag like dashdot:nvidia-gpu?
Not sure I wanna build this manually all the time since I use Watchtower, so just crossing my fingers here.
+1
So nothing for Intel GPUs? Or will it work with the nvidia image?
@PilaScat No, nothing for Intel GPUs right now. If you know how to make it work, please create a PR for it. I have no machine with an Intel GPU for testing, so I can't implement it, unfortunately. The same goes for AMD.
I ran some builds with docker locally.
Docker args: --device=/dev/dri:/dev/dri
I also added these packages to the Alpine Dockerfile's RUN apk update step:
linux-firmware-i915 \
mesa-dri-gallium
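i.e. something like this in the Alpine Dockerfile (a sketch showing only the added packages; the official image installs more in that step):
RUN apk update && apk add --no-cache \
    linux-firmware-i915 \
    mesa-dri-gallium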
I don't get memory or load data populated, though.
I did a bit of research, and NVTOP (a non-Alpine, multi-brand GPU monitoring tool) mentions that Intel is working on exposing more hardware information through HWMON, but it has been about two years since their README was updated to say that. NVTOP
Intel patchwork for what it's worth
I ran a bunch of different commands on the system files visible from Alpine. Not much of value that I could see for GPU stats. I tried stressing the iGPU while repeatedly checking those files to see if the current frequency values would change, but they didn't.
Here's the output, for what it's worth:
/ # lspci -v
00:02.0 VGA compatible controller: Intel Corporation AlderLake-S GT1 (rev 0c) (prog-if 00 [VGA controller])
DeviceName: Onboard - Video
Subsystem: Gigabyte Technology Co., Ltd Device d000
Flags: bus master, fast devsel, latency 0, IRQ 149, IOMMU group 0
Memory at 41000000 (64-bit, non-prefetchable) [size=16M]
Memory at 50000000 (64-bit, prefetchable) [size=256M]
I/O ports at 3000 [size=64]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [100] Process Address Space ID (PASID)
Capabilities: [200] Address Translation Service (ATS)
Capabilities: [300] Page Request Interface (PRI)
Capabilities: [320] Single Root I/O Virtualization (SR-IOV)
Kernel driver in use: i915
/ # lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation AlderLake-S GT1 (rev 0c)
/ # cat /sys/class/drm/card0/device/uevent
DRIVER=i915
PCI_CLASS=30000
PCI_ID=8086:4680
PCI_SUBSYS_ID=1458:D000
PCI_SLOT_NAME=0000:00:02.0
MODALIAS=pci:v00008086d00004680sv00001458sd0000D000bc03sc00i00
/ # ls /sys/class/drm/card0/
card0-DP-1 dev error gt_RP1_freq_mhz gt_boost_freq_mhz gt_min_freq_mhz subsystem
card0-HDMI-A-1 device gt gt_RPn_freq_mhz gt_cur_freq_mhz metrics uevent
card0-HDMI-A-2 engine gt_RP0_freq_mhz gt_act_freq_mhz gt_max_freq_mhz power
/ # ls /sys/class/drm/card0/metrics
/ # cat /sys/class/drm/card0/gt_cur_freq_mhz
1400
/ # cat /sys/class/drm/card0/gt_max_freq_mhz
1450
/ # ls /sys/class/drm/
card0 card0-DP-1 card0-HDMI-A-1 card0-HDMI-A-2 renderD128 version
/ # cat /sys/class/drm/card0/gt_boost_freq_mhz
1450
/ # cat /sys/class/drm/card0/gt_min_freq_mhz
300
/ # cat /sys/class/drm/card0/gt_RP0_freq_mhz
1450
/ # cat /sys/class/drm/card0/gt_RP1_freq_mhz
700
/ # cat /sys/class/drm/card0/gt_RP1_freq_mhz
700
/ # ls /sys/class/drm/renderD128/
dev device power subsystem uevent
/ # cat /sys/class/drm/renderD128/uevent
MAJOR=226
MINOR=128
DEVNAME=dri/renderD128
DEVTYPE=drm_minor
/ # cat /sys/class/drm/renderD128/device/uevent
DRIVER=i915
PCI_CLASS=30000
PCI_ID=8086:4680
PCI_SUBSYS_ID=1458:D000
PCI_SLOT_NAME=0000:00:02.0
MODALIAS=pci:v00008086d00004680sv00001458sd0000D000bc03sc00i00