Louis Sugy
@bixia1 Thanks for the test logs. I added the missing `@test_util.run_v2_only` decorator to the new Python test and removed the unused variable.
@bixia1 Please ignore the first push; the second one contains a fix for the failing tests.
@bixia1 Can you please send me the test logs?
@bixia1 Can you please send me the logs of the internal CI build?
> 2. TF-TRT almost triples the model size

I have investigated why the model size triples and found two points during conversion where duplication happens.

### Constant folding pass

First,...
Quick update on this: adding a `"dependency"` pass before `"constfold"` solves problem 1 and the graph becomes small enough to convert successfully (problem 2 remains).
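A minimal sketch of the reordering idea, assuming the passes are identified by Grappler's string names as in the `optimizers` field of `RewriterConfig`, which is assumed to run the listed passes in order. The helper `with_dependency_before_constfold` is hypothetical and not part of TensorFlow or TF-TRT; it only illustrates moving `"dependency"` ahead of the first `"constfold"`.

```c++
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not a TensorFlow API): returns a copy of a Grappler
// pass list in which "dependency" runs immediately before "constfold".
std::vector<std::string> with_dependency_before_constfold(
    const std::vector<std::string>& optimizers)
{
  // Without a constant folding pass there is nothing to reorder.
  if (std::find(optimizers.begin(), optimizers.end(), "constfold") == optimizers.end())
  {
    return optimizers;
  }

  // Drop any existing "dependency" entry so it appears exactly once.
  std::vector<std::string> passes;
  std::copy_if(optimizers.begin(), optimizers.end(), std::back_inserter(passes),
               [](const std::string& p) { return p != "dependency"; });

  // Insert "dependency" right in front of the first "constfold".
  auto it = std::find(passes.begin(), passes.end(), "constfold");
  passes.insert(it, "dependency");
  return passes;
}
```

For example, `{"pruning", "constfold", "layout"}` becomes `{"pruning", "dependency", "constfold", "layout"}`, while a list without `"constfold"` is returned unchanged.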
cc @gevtushenko @elstehle @alliepiper
There is an odd compiler error in C++17 builds in this code in `block_merge_sort.cuh`, where I replaced the `if` with `CUB_IF_CONSTEXPR`: ```c++ CUB_IF_CONSTEXPR(IS_LAST_TILE) { #pragma unroll for (int item =...
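For context, a host-compilable sketch of the pattern: a macro that expands to `if constexpr` when C++17 is available and to a plain `if` otherwise. `SKETCH_IF_CONSTEXPR` and `count_valid` are hypothetical names; the assumption is that `CUB_IF_CONSTEXPR` behaves similarly, which should be checked against CUB's own headers. `#pragma unroll` is left out because it is a CUDA/nvcc pragma.

```c++
// Hypothetical stand-in for CUB_IF_CONSTEXPR (assumption: `if constexpr`
// under C++17, plain `if` in older dialects).
#if __cplusplus >= 201703L
#define SKETCH_IF_CONSTEXPR if constexpr
#else
#define SKETCH_IF_CONSTEXPR if
#endif

// Counts the in-range items a thread would process for one tile.
template <bool IS_LAST_TILE>
int count_valid(int items_per_thread, int valid_items)
{
  int count = 0;
  SKETCH_IF_CONSTEXPR(IS_LAST_TILE)
  {
    // Partial (last) tile: only indices below valid_items are in range.
    for (int item = 0; item < items_per_thread; ++item)
    {
      if (item < valid_items)
      {
        ++count;
      }
    }
  }
  else
  {
    // Full tile: every item is in range.
    count = items_per_thread;
  }
  return count;
}
```

Under C++17 the discarded branch of `if constexpr` is not instantiated for a given `IS_LAST_TILE`, whereas with a plain `if` both branches are compiled, so both must be valid in every instantiation.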
> about 4% speedup for complex data types

Is that `DeviceMergeSort` or `DeviceSegmentedSort`?

> Is this expected improvement or you had a different workload in mind?

If you have a...
I've made most of the requested changes. What remains to be done is adding the unstable benchmark.