Boris Fomitchev
To find which libcudnn your particular Torch installation uses, look at cudnn/ffi.lua; it should have a line like: local libnames = {'libcudnn.so.4', 'libcudnn.4.dylib'}
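If you would rather not eyeball the file, here is a minimal sketch that pulls the names out of that line. It assumes the line has the same shape as the example above; the helper name find_cudnn_libnames is made up for illustration and is not part of cudnn.torch.

```python
import re
from typing import List


def find_cudnn_libnames(lua_source: str) -> List[str]:
    """Return the names listed in a `local libnames = {...}` line.

    Hypothetical helper: assumes the Lua source contains a single-line
    table of quoted strings, as in the example above.
    """
    match = re.search(r"local\s+libnames\s*=\s*\{([^}]*)\}", lua_source)
    if match is None:
        return []
    # Collect every single- or double-quoted string inside the braces.
    return re.findall(r"['\"]([^'\"]+)['\"]", match.group(1))


# Sample text copied from the comment above; in practice, read the
# contents of cudnn/ffi.lua from your own Torch installation instead.
sample = "local libnames = {'libcudnn.so.4', 'libcudnn.4.dylib'}"
print(find_cudnn_libnames(sample))
```

The helper returns an empty list when no such line is found, so a changed file layout shows up as "nothing found" rather than an exception.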
Yes, I can repro it with the latest NeMo, though with a changed export script it fails in a different place, during the actual export() call. That error usually means that some training-only code gets...
This was my quick workaround - to replace instances of tensor_parallel.ColumnParallelLinear with my wrapper class below. Something like that should be implemented inside tensor_parallel.ColumnParallelLinear.forward instead: ``` class ColumnLinear(tensor_parallel.ColumnParallelLinear): # redefine...
TRT will most likely be faster than jitted PyTorch. @ariel415el: did you use FP16?
Still not published? The UI model can't be run from test.py; a different script is required.
Thanks! I actually found that a bit more resolution is needed for scenes like conference cloud (which I used SFD+FER on). But that already sounds good - I will be using...
@kevinch-nv @rajeevsrao: Basically, most of the loops would require a sequence.
Same issue with radius() call. This is for the package built from the trunk, on A6000 box.
Any plans to include bfloat16 support on GPU soon?
> @borisfom The changes seem to breaks three CI tests. Any idea? The failing tests seem to be those I did not touch - like test_packed_sequence. I can fix them...