Boris Fomitchev comments

Results: 35 comments of Boris Fomitchev

accurately finding correct .so version when multiple versions are installed

To find which libcudnn your particular Torch installation uses, look at cudnn/ffi.lua. It should have a line like: local libnames = {'libcudnn.so.4', 'libcudnn.4.dylib'}
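For anyone who wants to check this without opening the file by hand, here is a minimal sketch that pulls the names out of such a line. The helper name `parse_libnames` is hypothetical, and the sketch assumes the line has the exact `local libnames = {...}` shape quoted above; a real ffi.lua may format it differently.

```python
import re


def parse_libnames(lua_source):
    """Return the library names from a `local libnames = {...}` line, if any."""
    # Assumption: the table is written on one line with no nested braces.
    match = re.search(r"local\s+libnames\s*=\s*\{([^}]*)\}", lua_source)
    if match is None:
        return []
    # Names are single- or double-quoted Lua strings inside the braces.
    return re.findall(r"""['"]([^'"]+)['"]""", match.group(1))


# The sample line is the one quoted in the comment above.
sample = "local libnames = {'libcudnn.so.4', 'libcudnn.4.dylib'}"
names = parse_libnames(sample)
print(names)
```

To run it against an actual installation, read the contents of cudnn/ffi.lua into a string and pass that string to the function instead of `sample`.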

scripts/export.py fails with `--device=cpu`

Yes, I can repro it with the latest NeMo, though with the changed export script it fails in a different place, during the actual export() call. That error usually means that some training-only code gets...

Regression: tensor_parallel.ColumnParallelLinear fails on onnx.export

This was my quick workaround: replace instances of tensor_parallel.ColumnParallelLinear with my wrapper class below. Something like that should be implemented inside tensor_parallel.ColumnParallelLinear.forward instead: ``` class ColumnLinear(tensor_parallel.ColumnParallelLinear): # redefine...

Is prediction in C++ possible?

TRT will most likely be faster than jitted PyTorch. @ariel415el: did you use FP16?

Editing interface

Still not published? The UI model can't be run from test.py; a different script is required.

SFD speed ?

Thanks! I actually found out that a bit more resolution is needed for scenes like conference cloud (which I used SFD+FER on). But that already sounds good - I will be using...

Is there any plan to support SequenceConstruct?

@kevinch-nv @rajeevsrao: Basically, most of the loops would require a sequence.

Still there: Incompatibility with bfloat16

Same issue with the radius() call. This is for the package built from trunk, on an A6000 box.

Still there: Incompatibility with bfloat16

Any plans to include bfloat16 support on GPU soon?

Fixing ONNX export for RNN, LSTM runtime unit test

> @borisfom The changes seem to break three CI tests. Any idea?

The failing tests seem to be those I did not touch, like test_packed_sequence. I can fix them...