Louis Sugy

22 comments by Louis Sugy

@bixia1 Thanks for the test logs. I added the missing `@test_util.run_v2_only` decorator to the new Python test and removed the unused variable.

@bixia1 Don't mind the first push. The second contains a fix for the failing tests.

@bixia1 Can you please send me the test logs?

@bixia1 Can you please send me the logs of the internal CI build?

> 2. TF-TRT almost triples the model size

I have investigated why the model size triples and found two moments during conversion where duplication happens.

### Constant folding pass

First,...

Quick update on this: adding a `"dependency"` pass before `"constfold"` solves problem 1 and the graph becomes small enough to convert successfully (problem 2 remains).
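To make the ordering concrete, here is a minimal sketch of placing a `"dependency"` pass right before the first `"constfold"` in an ordered list of optimizer names. The helper `insert_pass_before` and the starting list are hypothetical, for illustration only; the actual pass list used by the converter may differ.

```python
from typing import List


def insert_pass_before(passes: List[str], new_pass: str, anchor: str) -> List[str]:
    """Return a copy of `passes` with `new_pass` placed right before the first `anchor`.

    Hypothetical helper: it only manipulates a list of pass names and does not
    call into TensorFlow.
    """
    result = list(passes)
    if anchor not in result:
        # Nothing to anchor on; leave the order untouched.
        return result
    idx = result.index(anchor)
    if idx > 0 and result[idx - 1] == new_pass:
        # The pass already runs directly before the anchor.
        return result
    result.insert(idx, new_pass)
    return result


# Assumed starting order, not taken from the converter source.
passes = ["constfold", "layout", "constfold"]
ordered = insert_pass_before(passes, "dependency", "constfold")
print(ordered)
```

Only the first `"constfold"` is preceded by the new pass here, which matches the change described above; whether later occurrences also need it is not covered by this sketch.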

There is an odd compiler error in C++17 builds in this code in `block_merge_sort.cuh`, where I replaced the `if` with `CUB_IF_CONSTEXPR`:

```c++
CUB_IF_CONSTEXPR(IS_LAST_TILE) {
  #pragma unroll
  for (int item =...
```

> about 4% speedup for complex data types

Is that `DeviceMergeSort` or `DeviceSegmentedSort`?

> Is this expected improvement or you had a different workload in mind?

If you have a...

I've made most of the requested changes. What remains to be done is adding the unstable benchmark.