
PermissionError: [Errno 1] Operation not permitted

Open kcsf opened this issue 1 year ago • 9 comments

(rickslab-gpu-utils-env) cg@gpu-49-59:~$ sudo gpu-ls
Error: Invalid icon path
Detected GPUs: AMD: 1
AMD: amdgpu version: 1:6.0.60002-1718217.22.04
AMD: Wattman features enabled: 0xfffd7fff
Warning: Can not read parameter: loading, disabling for this GPU: 0
Warning: Can not read parameter: mem_loading, disabling for this GPU: 0
Warning: Can not read parameter: sclk_ps, disabling for this GPU: 0
Warning: Can not read parameter: mclk_ps, disabling for this GPU: 0
Warning: Can not read parameter: ppm, disabling for this GPU: 0
Warning: Can not read parameter: power_dpm_force, disabling for this GPU: 0
Warning: Can not read parameter: power_cap_range, disabling for this GPU: 0
Warning: Can not read parameter: power, disabling for this GPU: 0
Warning: Can not read parameter: power_cap, disabling for this GPU: 0
Warning: Can not read parameter: temperatures, disabling for this GPU: 0
Warning: Can not read parameter: voltages, disabling for this GPU: 0
Warning: Can not read parameter: frequencies, disabling for this GPU: 0
Warning: Can not read parameter: fan_speed_range, disabling for this GPU: 0
Warning: Can not read parameter: fan_pwm_range, disabling for this GPU: 0
Warning: Can not read parameter: fan_enable, disabling for this GPU: 0
Warning: Can not read parameter: fan_target, disabling for this GPU: 0
Warning: Can not read parameter: fan_speed, disabling for this GPU: 0
Warning: Can not read parameter: pwm_mode, disabling for this GPU: 0
Warning: Can not read parameter: fan_pwm, disabling for this GPU: 0
1 total GPUs, 1 rw, 0 r-only, 0 w-only

Traceback (most recent call last):
  File "/usr/bin/gpu-ls", line 154, in <module>
    main()
  File "/usr/bin/gpu-ls", line 138, in main
    gpu_list.read_gpu_pstates()
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 2136, in read_gpu_pstates
    gpu.read_gpu_pstates()
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1061, in read_gpu_pstates
    for line in card_file:
PermissionError: [Errno 1] Operation not permitted

kcsf avatar May 24 '24 15:05 kcsf

Which distro are you using? The driver files are normally world readable.
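One detail worth checking: the traceback reports `[Errno 1]`, which is EPERM, whereas a file whose mode bits forbid reading normally produces EACCES (`[Errno 13]`). Python raises `PermissionError` for both. The failure also happens during the read rather than the open, and persists under sudo, which suggests the driver itself is refusing the request. Below is a minimal sketch for telling the two cases apart. The function name `probe_sysfs_files` and the status labels are hypothetical and not part of gpu-utils; the file names passed in are only examples.

```python
import errno
import os
import tempfile


def probe_sysfs_files(card_path, names):
    """Try to read each file under card_path and classify the outcome.

    Returns a dict mapping file name to one of: 'ok', 'missing',
    'mode-denied' (EACCES), 'driver-refused' (EPERM), or 'error:<name>'.
    The labels are illustrative only.
    """
    results = {}
    for name in names:
        path = os.path.join(card_path, name)
        try:
            with open(path, "r") as fh:
                fh.read()
            results[name] = "ok"
        except FileNotFoundError:
            results[name] = "missing"
        except PermissionError as exc:
            # PermissionError covers both EACCES (13) and EPERM (1).
            if exc.errno == errno.EPERM:
                results[name] = "driver-refused"
            else:
                results[name] = "mode-denied"
        except OSError as exc:
            code = errno.errorcode.get(exc.errno, str(exc.errno))
            results[name] = "error:" + code
    return results


if __name__ == "__main__":
    # Demo against a temporary directory so the sketch runs anywhere.
    demo_dir = tempfile.mkdtemp()
    with open(os.path.join(demo_dir, "gpu_busy_percent"), "w") as fh:
        fh.write("5\n")
    print(probe_sysfs_files(demo_dir, ["gpu_busy_percent", "pp_od_clk_voltage"]))
```

On a real system the first argument would be the card path, for example `/sys/class/drm/card0/device`.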

Ricks-Lab avatar May 24 '24 16:05 Ricks-Lab

Also, I do not recommend running with sudo. Only writing to driver files requires root permissions, and gpu-pac is the only utility that writes to these files. By default it creates bash scripts that you can run yourself with sudo. If you use the --execute_pac option, it runs the bash script with sudo, which prompts you for credentials at the command line.

It would also be helpful to run with the --debug option and post the log file contents here. Feel free to delete any details from the log file that you do not want to make public.

Ricks-Lab avatar May 24 '24 16:05 Ricks-Lab

Hi Rick!
Thank you so much for your prompt response. I got busy and neglected to follow up. Now, of course, it's rather urgent that I knock the power usage on these GPUs down from 100 watts to 80.

Here's some info:

cg@gpu-13-23:~$ pip list | grep rickslab-gpu-utils
rickslab-gpu-utils 3.6.0
cg@gpu-13-23:~$ pip3 list | grep rickslab-gpu-utils
rickslab-gpu-utils 3.6.0
cg@gpu-13-23:~$ dpkg -l | grep gpu-utils
ii rickslab-gpu-utils 3.6.0-2 all AMD GPU performance adjustment and monitoring
cg@gpu-13-23:~$ gpu-ls --debug
Error: Invalid icon path
Ubuntu: Validated
Traceback (most recent call last):
  File "/usr/bin/gpu-ls", line 154, in <module>
    main()
  File "/usr/bin/gpu-ls", line 102, in main
    gpu_list.set_gpu_list(clinfo_flag=True)
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1885, in set_gpu_list
    pp_od_file_details = file_ptr.read()
PermissionError: [Errno 1] Operation not permitted
cg@gpu-13-23:~$ gpu-ls
Error: Invalid icon path
Detected GPUs: AMD: 1
AMD: amdgpu version: 1:6.0.60002-1718217.22.04
AMD: Wattman features enabled: 0xfffd7fff
Warning: Can not read parameter: loading, disabling for this GPU: 0
Warning: Can not read parameter: mem_loading, disabling for this GPU: 0
Warning: Can not read parameter: sclk_ps, disabling for this GPU: 0
Warning: Can not read parameter: mclk_ps, disabling for this GPU: 0
Warning: Can not read parameter: ppm, disabling for this GPU: 0
Warning: Can not read parameter: power_dpm_force, disabling for this GPU: 0
Warning: Can not read parameter: power_cap_range, disabling for this GPU: 0
Warning: Can not read parameter: power, disabling for this GPU: 0
Warning: Can not read parameter: power_cap, disabling for this GPU: 0
Warning: Can not read parameter: temperatures, disabling for this GPU: 0
Warning: Can not read parameter: voltages, disabling for this GPU: 0
Warning: Can not read parameter: frequencies, disabling for this GPU: 0
Warning: Can not read parameter: fan_speed_range, disabling for this GPU: 0
Warning: Can not read parameter: fan_pwm_range, disabling for this GPU: 0
Warning: Can not read parameter: fan_enable, disabling for this GPU: 0
Warning: Can not read parameter: fan_target, disabling for this GPU: 0
Warning: Can not read parameter: fan_speed, disabling for this GPU: 0
Warning: Can not read parameter: pwm_mode, disabling for this GPU: 0
Warning: Can not read parameter: fan_pwm, disabling for this GPU: 0
1 total GPUs, 1 rw, 0 r-only, 0 w-only

Traceback (most recent call last):
  File "/usr/bin/gpu-ls", line 154, in <module>
    main()
  File "/usr/bin/gpu-ls", line 138, in main
    gpu_list.read_gpu_pstates()
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 2136, in read_gpu_pstates
    gpu.read_gpu_pstates()
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1061, in read_gpu_pstates
    for line in card_file:
PermissionError: [Errno 1] Operation not permitted
cg@gpu-13-23:~$ sudo gpu-ls --debug
Error: Invalid icon path
Ubuntu: Validated
Traceback (most recent call last):
  File "/usr/bin/gpu-ls", line 154, in <module>
    main()
  File "/usr/bin/gpu-ls", line 102, in main
    gpu_list.set_gpu_list(clinfo_flag=True)
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1885, in set_gpu_list
    pp_od_file_details = file_ptr.read()
PermissionError: [Errno 1] Operation not permitted

kcsf avatar Jun 10 '24 12:06 kcsf

When the --debug option is used, a log file should be produced. Can you paste its contents here?

Also, can you upgrade to the latest version? I recently released 3.9.0 to PyPI.

Ricks-Lab avatar Jun 10 '24 12:06 Ricks-Lab

Ok, I upgraded to 3.9

Now I'm getting this:

`cg@gpu-24-34:~$ gpu-ls --debug
Ubuntu: Validated
HW Exception by GPU node-1 (Agent handle: 0x5e41c0b8f730) reason :GPU Hang
Error: system support issue for 01:00.0: [[Errno 1] Operation not permitted]
Detected GPUs: AMD: 1
AMD: amdgpu version: 1:6.0.60002-1718217.22.04
AMD: Wattman features enabled: 0xfffd7fff
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]

read_time_val: 10-Jun-2024 13:59:15

model_display: True: Cyan Skillfish
loading: True: None
mem_loading: True: None
mem_vram_usage: True: 0.06260871887207031
mem_gtt_usage: True: 0.2832306048274743
power: True: None
power_cap: True: None
energy: True: 0.0
temp_val: True: None
vddgfx_val: True: nan
fan_pwm: True: None
sclk_f_val: True: None
sclk_ps_val: True:
mclk_f_val: True: None
mclk_ps_val: True:
ppm: True:

Total of 1 GPU: 0 are rw, 1 is r-only, and 0 are w-only

Card Number: 0
Vendor: AMD
Readable: True
Writable: False
Compute: False
Device ID: {'device': '0x13fe', 'subsystem_device': '0x0000', 'subsystem_vendor': '0x1022', 'vendor': '0x1002'}
Decoded Device ID: Cyan Skillfish
Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Cyan Skillfish
Display Card Model: Cyan Skillfish
PCIe ID: 01:00.0
Link Speed: 16.0 GT/s
PCIe Link Width: 16
##################################################
Driver: amdgpu
vBIOS Version: 113-AMDRBN-003
Compute Platform: None
GPU Type: Modern
HWmon: /sys/class/drm/card0/device/hwmon/hwmon0
Card Path: /sys/class/drm/card0/device
System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:01:00.0
##################################################
##################################################
Current GTT Memory Usage (%): 0.283
Current GTT Memory Used (GB): 0.011
Total GTT Memory (GB): 3.738
Current VRAM Usage (%): 0.063
Current VRAM Used (GB): 0.005
Total VRAM (GB): 8.000
Critical Temps (C): {}
Vddgfx Offset (mV): 0
Vddgfx Offset Range (mV): [-25, 25]
##################################################
Disabled Parameters: pp_od_clk_voltage, sclk_f_range, mclk_f_range, vddc_range, pp_features, unique_id, loading, mem_loading, sclk_ps, mclk_ps, pstates, ppm, power_dpm_force, power_dpm_state, power_cap_range, power, power_cap, temperatures, voltages, frequencies, fan_speed_range, fan_pwm_range, fan_enable, fan_target, fan_speed, pwm_mode, fan_pwm

`

kcsf avatar Jun 10 '24 14:06 kcsf

gpu-utils_debug-log.txt

Am I able to control the gpu speed and/or power use yet, or is there more troubleshooting to do?
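
For context, amdgpu normally exposes the power cap through the standard hwmon files `power1_cap`, `power1_cap_min`, and `power1_cap_max`, with values in microwatts. The sketch below is read-only and only checks whether a target such as 80 W falls inside the advertised range. The function names are hypothetical and not part of gpu-utils. It assumes the hwmon directory is the one reported in the output above (`/sys/class/drm/card0/device/hwmon/hwmon0`). Because `power_cap` and `power_cap_range` are listed under Disabled Parameters, it would probably report the files as unreadable on this card.

```python
import os
import tempfile

MICROWATTS_PER_WATT = 1_000_000


def read_microwatts(hwmon_dir, name):
    """Return the integer value of an hwmon power file, or None if unreadable."""
    try:
        with open(os.path.join(hwmon_dir, name)) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def plan_power_cap(hwmon_dir, target_watts):
    """Check target_watts against the range the driver advertises.

    Returns (ok, message). Nothing is written; this only reads.
    """
    cap = read_microwatts(hwmon_dir, "power1_cap")
    cap_min = read_microwatts(hwmon_dir, "power1_cap_min")
    cap_max = read_microwatts(hwmon_dir, "power1_cap_max")
    if cap is None or cap_min is None or cap_max is None:
        return False, "power cap files are not readable on this device"
    target = int(target_watts * MICROWATTS_PER_WATT)
    if not cap_min <= target <= cap_max:
        low, high = cap_min / 1e6, cap_max / 1e6
        return False, f"{target_watts} W is outside {low:g}-{high:g} W"
    return True, f"would change cap from {cap / 1e6:g} W to {target_watts} W"


if __name__ == "__main__":
    # Demo with made-up values in a temporary directory.
    demo = tempfile.mkdtemp()
    for fname, value in [("power1_cap", 100_000_000),
                         ("power1_cap_min", 0),
                         ("power1_cap_max", 120_000_000)]:
        with open(os.path.join(demo, fname), "w") as fh:
            fh.write(f"{value}\n")
    print(plan_power_cap(demo, 80))
```

The values in the demo are invented for illustration and do not describe the BC-250.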

kcsf avatar Jun 10 '24 14:06 kcsf

I am running Ubuntu 22.04 on two systems and do not see the issue of driver files being unreadable. It may be a driver or hardware issue, or the feature definitions may be different for newer GPUs. I suggest changing the Wattman feature mask, reported above as `AMD: Wattman features enabled: 0xfffd7fff`, to 0xffffffff.
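
To see which feature bits that change would turn on, a small sketch follows. It assumes the "Wattman features enabled" value corresponds to the amdgpu `ppfeaturemask` module parameter, readable at `/sys/module/amdgpu/parameters/ppfeaturemask` and usually set on the kernel command line as `amdgpu.ppfeaturemask=0xffffffff`. The helper names are hypothetical.

```python
def parse_mask(text):
    """Parse a mask given as hex ('0xfffd7fff') or decimal text."""
    return int(text.strip(), 0)


def cleared_bits(current, wanted=0xFFFFFFFF):
    """Return the bit positions that are set in `wanted` but clear in `current`."""
    diff = wanted & ~current
    return [bit for bit in range(32) if (diff >> bit) & 1]


def read_ppfeaturemask(path="/sys/module/amdgpu/parameters/ppfeaturemask"):
    """Read the running mask; returns None if the file cannot be read."""
    try:
        with open(path) as fh:
            return parse_mask(fh.read())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    reported = parse_mask("0xfffd7fff")  # value from the gpu-ls output
    print("bits that 0xffffffff would additionally enable:", cleared_bits(reported))
    print("running mask:", read_ppfeaturemask())
```

For the reported value, the two masks differ in bits 15 and 17. Whether enabling them helps on this hardware is not something the sketch can tell you.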

Here is what ChatGPT has to say:

The error message you're encountering indicates a hardware exception caused by a GPU hang. This can be due to several factors, including hardware failures, driver issues, or system configuration problems. Here's a step-by-step guide to troubleshoot and address this issue:

Check System Logs:

Look into system logs for more detailed error messages. On Linux, you can use dmesg or check /var/log/syslog or /var/log/messages.

Update GPU Drivers:

Ensure that your GPU drivers are up to date. You can download the latest drivers from the GPU manufacturer's website (NVIDIA, AMD, etc.).

Check Hardware:

Ensure that the GPU is properly seated in its slot and that all power connectors are securely attached. Monitor the GPU temperature to ensure it is not overheating. You can use tools like nvidia-smi for NVIDIA GPUs or radeontop for AMD GPUs.

Test GPU on Another System:

If possible, test the GPU on a different system to rule out hardware failure.

Verify System Configuration:

Ensure that your system's power supply is adequate for the GPU. Check for BIOS/UEFI updates for your motherboard and apply them if necessary. Disable any overclocking settings and see if the problem persists.

Check Permissions:

The error message "Operation not permitted" suggests there might be a permissions issue. Make sure you have the necessary permissions to access the GPU. Running the operation as root or with sudo might help.

Consult Documentation:

Refer to the documentation for your specific GPU and system for any known issues or configuration tips.

Contact Support:

If the problem persists, consider reaching out to the GPU manufacturer's support or your system's support service for further assistance.

By systematically going through these steps, you should be able to identify and resolve the issue causing the GPU hang.

Ricks-Lab avatar Jun 10 '24 23:06 Ricks-Lab

Dang. I've tried most of that. It's a BC-250 (a repurposed PS5 card). There are no BIOS updates for it. The only thing I can think to try is updating the kernel and OS to 24.04, but it took me a long time to find an old kernel that worked in the first place.

Any ideas or suggestions would be much appreciated.

kcsf avatar Jun 12 '24 01:06 kcsf

I really doubt that any of this would be enabled for PS5 hardware.

Ricks-Lab avatar Jun 13 '24 00:06 Ricks-Lab