gpu-manager
Does it support NVIDIA TensorRT
- Inside the container, a TensorRT engine generated on the same device and CUDA version cannot be read. It shows:
[TensorRT] WARNING: using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors
- Inside the container, some ONNX models cannot be built into a TensorRT engine and report a segmentation fault.
Any solution or suggestion?
I had the same problem. It works when the container uses the whole card, but fails once the card is split into fractions.
Which gpu-manager version are you running, and can you share the log?
The log looks normal. The manager version is 1.1.4. @mYmNeo could you share a WeChat ID?
Please follow the FAQ and provide the application container log.
This is the container log. @mYmNeo
You need to follow the FAQ: set the environment variable first, then run your application. The expected log should contain the /tmp/xxx pattern.
I have followed the FAQ.
I can't find any useful logs in this directory. Which log in this directory do you want? @mYmNeo
- Export the LOGGER_LEVEL environment variable
- Run your application
- The vcuda log will be printed on the screen
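The three steps above could be scripted roughly as follows. This is only a sketch: the helper name `run_with_logger_level` is hypothetical, and the level value `"5"` is an assumed placeholder, since the thread does not say which value the FAQ expects.

```python
import os
import subprocess
import sys


def run_with_logger_level(cmd, level="5"):
    """Run cmd with LOGGER_LEVEL set and return (exit code, combined output).

    The default level is a placeholder (assumption); use the value from the FAQ.
    """
    # Copy the current environment and add LOGGER_LEVEL for the child only.
    env = dict(os.environ, LOGGER_LEVEL=str(level))
    proc = subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so all log lines are kept together
        text=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Stand-in command: replace with the real application, e.g. ["python", "infer.py"].
    demo = [sys.executable, "-c", "import os; print(os.environ['LOGGER_LEVEL'])"]
    code, output = run_with_logger_level(demo)
    print("exit code:", code)
    print(output)
```

On POSIX systems a child killed by a signal is reported with a negative exit code, so a segmentation fault would typically show up as -11 here.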
https://gist.github.com/xwttzz/1f4b3794a2fb19f430ebea828030d145 This link shows all logs after vcuda is enabled. @mYmNeo
The error shows your application is missing a library: ImportError: libnvinfer.so.7: cannot open shared object file: No such file or directory. This is not a vcuda library.
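One quick way to check inside the container whether the dynamic loader can find a library such as libnvinfer.so.7 is to try loading it with ctypes. This is a minimal sketch; `can_load` is a hypothetical helper name, and a successful load only shows that the file is on the loader path, not that TensorRT works under vcuda.

```python
import ctypes


def can_load(lib_name):
    """Return True if the dynamic loader can open lib_name, else False."""
    try:
        ctypes.CDLL(lib_name)
    except OSError as exc:
        # Same class of failure as the ImportError in the log: the loader
        # could not find or open the shared object.
        print(f"{lib_name}: {exc}")
        return False
    return True


if __name__ == "__main__":
    print("libnvinfer.so.7 loadable:", can_load("libnvinfer.so.7"))
```

If this prints False, checking LD_LIBRARY_PATH inside the container is a reasonable next step.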
The confusing part is that it works when I set vcore to 100; it only fails when the card is fragmented. Besides, I have confirmed that libnvinfer is installed in the container. @mYmNeo
How about re-pulling the image thomassong/gpu-manager:1.1.5? We have fixed a recursion problem in vcuda.
I updated the latest log at the original address [https://gist.github.com/xwttzz/1f4b3794a2fb19f430ebea828030d145]. The libnvinfer error was from an older log and has nothing to do with this. @mYmNeo
I think you should debug the core dump to find where it crashes.
@xwttzz have you solved the problem?
I have the same problem. Does anyone have a solution?
Hi, have you solved the problem?
The specific error is described at https://github.com/tkestack/vcuda-controller/issues/38#issue-1906341966