Xin Yao
@jberchtold-nvidia I now use two configs to control the behavior of PDL:
- `pdl_sync`: Add `cudaGridDependencySynchronize` to the first kernel to make sure the previous unknown kernel has flushed its results...
> @yaox12 @timmoon10 - This PR has conflicts. I don't know if it is because the PR needs to be fixed or because of problems with the CI. Would you please say...
Ready for review. The CI failures are irrelevant.
Closing this PR because:
1. We prefer to use a grouped quantize to further reduce the CPU overhead.
2. There is parallel work on optimizing the quantization kernels, which makes the...
/ok to test ea52007
/ok to test 775f386