Running on Fedora 38 - Docs Update
For self-hosting, the docs at https://refact.ai/docs/self-hosting/ say to run docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting after ensuring docker has NVIDIA GPU support (see the sanity-check command after the log below). Unfortunately these instructions do not work for me, even though the previous release of refact, before the recent significant changes, ran fine. Here is what I get when following them:
-- 26 -- WARNING:root:output was:
-- 26 -- - no output -
-- 26 -- WARNING:root:nvidia-smi does not work, that's especially bad for initial setup.
-- 26 -- WARNING:root:Traceback (most recent call last):
-- 26 -- File "/usr/local/lib/python3.8/dist-packages/self_hosting_machinery/scripts/enum_gpus.py", line 17, in query_nvidia_smi
-- 26 -- nvidia_smi_output = subprocess.check_output([
-- 26 -- File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
-- 26 -- return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
-- 26 -- File "/usr/lib/python3.8/subprocess.py", line 516, in run
-- 26 -- raise CalledProcessError(retcode, process.args,
-- 26 -- subprocess.CalledProcessError: Command '['nvidia-smi', '--query-gpu=pci.bus_id,name,memory.used,memory.total,temperature.gpu', '--format=csv']' returned non-zero exit status 4.
-- 26 --
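As a sanity check on the "docker with NVIDIA GPU support" prerequisite, a bare CUDA container can be asked to run nvidia-smi (the image tag below is just an example I picked, not something the refact docs specify):

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

If that fails in the same way, the problem is in the docker/nvidia-container-toolkit setup rather than in refact itself.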
On the host, however, I can confirm that importing enum_gpus.py into Python (tested with 3.8, which the Dockerfile uses, and with 3.11) and calling query_nvidia_smi succeeds. Running the nvidia-smi command with the same flags enum_gpus uses also succeeds:
(refact) [mrhillsman@workstation refact]$ python --version
Python 3.8.17
(refact) [mrhillsman@workstation refact]$ python
Python 3.8.17 (default, Jun 8 2023, 00:00:00)
[GCC 13.1.1 20230511 (Red Hat 13.1.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> subprocess.check_output(["nvidia-smi", "--query-gpu=pci.bus_id,name,memory.used,memory.total,temperature.gpu", "--format=csv"])
b'pci.bus_id, name, memory.used [MiB], memory.total [MiB], temperature.gpu\n00000000:01:00.0, NVIDIA GeForce RTX 3080, 11 MiB, 10240 MiB, 29\n'
>>> import self_hosting_machinery.scripts.enum_gpus as gpuenum
>>> gpuenum.query_nvidia_smi()
{'gpus': [{'id': '00000000:01:00.0', 'name': 'NVIDIA GeForce RTX 3080', 'mem_used_mb': 11, 'mem_total_mb': 10240, 'temp_celsius': 29}]}
>>> exit()
(refact) [mrhillsman@workstation refact]$ nvidia-smi --query-gpu=pci.bus_id,name,memory.used,memory.total,temperature.gpu --format=csv
pci.bus_id, name, memory.used [MiB], memory.total [MiB], temperature.gpu
00000000:01:00.0, NVIDIA GeForce RTX 3080, 11 MiB, 10240 MiB, 29
(refact) [mrhillsman@workstation refact]$ nvidia-smi
Sat Jul 22 15:46:59 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3080 Off | 00000000:01:00.0 Off | N/A |
| 0% 29C P8 13W / 370W | 11MiB / 10240MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2727 G /usr/bin/gnome-shell 3MiB |
+---------------------------------------------------------------------------------------+
(refact) [mrhillsman@workstation refact]$ uname -a
Linux workstation 6.3.12-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jul 6 04:05:18 UTC 2023 x86_64 GNU/Linux
(refact) [mrhillsman@workstation refact]$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="38 (Workstation Edition)"
ID=fedora
VERSION_ID=38
VERSION_CODENAME=""
PLATFORM_ID="platform:f38"
PRETTY_NAME="Fedora Linux 38 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:38"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f38/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=38
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=38
SUPPORT_END=2024-05-14
VARIANT="Workstation Edition"
VARIANT_ID=workstation
[mrhillsman@workstation refact-ai]$ sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Memory protection checking: actual (secure)
Max kernel policy version: 33
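Since SELinux is enforcing here, one more check that might be worth documenting (my suggestion, not something from the current docs) is to look for AVC denials right after a failed docker run, to see whether labeling is what blocks nvidia-smi inside the container:

sudo ausearch -m avc -ts recent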
I would have opened a PR for the documentation change, but I do not see a repo for the site documentation. Here is the command I was able to run successfully, which I recommend adding to the documentation in some form, either under Fedora 38 specifically or under RPM-based distributions in general:
podman run -d -it --gpus 0 --security-opt=label=disable -p 8008:8008 -v perm_storage:/perm_storage smallcloud/refact_self_hosting
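To confirm the container actually sees the GPU when started this way, something along these lines can be used (the container ID is a placeholder for whatever podman ps reports):

podman ps
podman exec -it <container-id> nvidia-smi

podman logs <container-id> should also no longer show the nvidia-smi warning from the watchdog.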
Thanks for reporting!
We have a docs repository:
https://github.com/smallcloudai/web_docs_refact_ai
Thanks @olegklimov, I will submit a PR there soon; apologies for the delay. Once I have an open PR/issue there I will reference it here and close this issue.