gpu-manager

Does it support NVIDIA TensorRT

Open ivoxx opened this issue 3 years ago • 19 comments

Does it support NVIDIA TensorRT?

  1. Inside the container, a TensorRT engine generated on the same device and CUDA version cannot be read. It shows:

[TensorRT] WARNING: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.

  2. Inside the container, some ONNX models cannot be built into a TensorRT engine; the build reports a segmentation fault.

Any solution or suggestion?

ivoxx avatar Aug 05 '21 07:08 ivoxx

I had the same problem. It works normally when the whole card is used, but not once the card is fragmented.

xwttzz avatar Aug 05 '21 07:08 xwttzz

What are the gpu-manager version and logs?

mYmNeo avatar Aug 05 '21 07:08 mYmNeo

image The logs look normal. The manager version is 1.1.4. @mYmNeo Would you mind sharing a WeChat ID?

xwttzz avatar Aug 05 '21 07:08 xwttzz

Please follow the FAQ and provide the application container log.

mYmNeo avatar Aug 06 '21 01:08 mYmNeo

2

1

ivoxx avatar Aug 06 '21 01:08 ivoxx

image this is the container log. @mYmNeo

xwttzz avatar Aug 06 '21 02:08 xwttzz

You need to follow the FAQ: set the environment variable first, then run your application. The expected log should contain a /tmp/xxx pattern.

mYmNeo avatar Aug 09 '21 07:08 mYmNeo

I have followed the FAQ. image I can't find any useful logs in this directory. Which log in this directory do you want? @mYmNeo

xwttzz avatar Aug 10 '21 03:08 xwttzz

  1. Export the LOGGER_LEVEL environment variable.
  2. Run your application.
  3. The vcuda log will be printed on the screen.
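
The steps above can be sketched in Python as a small wrapper that sets LOGGER_LEVEL and captures everything the application prints. This is only an illustration: the helper name `run_with_vcuda_logging`, the default value `"5"`, and the placeholder command are assumptions, not taken from the FAQ, so check the FAQ for the value it actually asks for.

```python
import os
import subprocess
import sys


def run_with_vcuda_logging(command, logger_level="5"):
    """Run `command` with LOGGER_LEVEL exported; return (exit code, combined output)."""
    env = dict(os.environ)
    # Assumed value; use whatever level the FAQ specifies.
    env["LOGGER_LEVEL"] = logger_level
    completed = subprocess.run(
        command,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so log lines stay in order
        text=True,
        check=False,
    )
    return completed.returncode, completed.stdout


if __name__ == "__main__":
    # Hypothetical application command; replace it with your own entry point.
    code, output = run_with_vcuda_logging([sys.executable, "-c", "print('app started')"])
    print(output, end="")
    print("exit code:", code)
```

On POSIX systems a negative exit code means the process was killed by a signal, so a segmentation fault would show up here as -11.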

mYmNeo avatar Aug 10 '21 03:08 mYmNeo

https://gist.github.com/xwttzz/1f4b3794a2fb19f430ebea828030d145 The above link shows all logs after VCUDA is enabled. @mYmNeo

xwttzz avatar Aug 10 '21 06:08 xwttzz

The error shows that your application is missing a library: ImportError: libnvinfer.so.7: cannot open shared object file: No such file or directory. This is not a VCUDA library.
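
One way to check this kind of error from inside the container is to ask the dynamic loader directly whether it can open the library. The sketch below uses Python's standard `ctypes` module; the helper name `can_load_shared_library` is hypothetical, and a successful load only shows that the loader can find the file, not that TensorRT itself works.

```python
import ctypes
import os


def can_load_shared_library(name):
    """Return (True, "") if the dynamic loader can open `name`, else (False, reason)."""
    try:
        ctypes.CDLL(name)
    except OSError as exc:
        # The loader reports why it failed, e.g. "cannot open shared object file".
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    ok, reason = can_load_shared_library("libnvinfer.so.7")
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
    print("libnvinfer.so.7 loadable:", ok)
    if not ok:
        print("loader error:", reason)
```

Running it in the same container and environment as the failing application makes the comparison meaningful, since the loader search path can differ between shells.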

mYmNeo avatar Aug 11 '21 01:08 mYmNeo

The confusing part is that it works when I set vcore to 100; it fails only when the card is fragmented. Besides, I have confirmed that libnvinfer is installed in the container. @mYmNeo

xwttzz avatar Aug 11 '21 05:08 xwttzz

How about re-pulling the image thomassong/gpu-manager:1.1.5? We have fixed a recursion problem in vcuda.

mYmNeo avatar Aug 11 '21 07:08 mYmNeo

I updated the latest log at the original address: https://gist.github.com/xwttzz/1f4b3794a2fb19f430ebea828030d145 The libnvinfer error was from an old log and has nothing to do with this issue. @mYmNeo

xwttzz avatar Aug 11 '21 07:08 xwttzz

I think you should debug the coredump to find where it crashes.
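
If the crashing application is a Python program, a lightweight first step before opening the core file in a debugger is Python's standard `faulthandler` module, which dumps the Python tracebacks when the process receives a fatal signal such as SIGSEGV. This is a minimal sketch, with `enable_crash_traceback` as a hypothetical helper name. It shows only Python-level frames, so the native frames inside TensorRT or vcuda still need the coredump.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=sys.stderr):
    """Dump the Python tracebacks of all threads to `stream` on a fatal signal."""
    # `stream` must stay open and have a real file descriptor while enabled.
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    enable_crash_traceback()
    # Hypothetical placeholder: call the code that builds or loads the engine here.
    print("faulthandler enabled:", faulthandler.is_enabled())
```

The same effect is available without code changes by starting the interpreter with `python -X faulthandler`.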

mYmNeo avatar Aug 12 '21 01:08 mYmNeo

@xwttzz have you solved the problem?

austingg avatar May 05 '22 11:05 austingg

I have the same problem. Does anyone have a solution?

so2bin avatar Sep 15 '22 01:09 so2bin

Hi, have you solved the problem?

alex337 avatar Dec 01 '22 12:12 alex337

https://github.com/tkestack/vcuda-controller/issues/38#issue-1906341966 Specific error details.

vicmeng avatar Oct 09 '23 05:10 vicmeng