Dump v
Here are the costs, in microseconds, of dump_kernel, dump_kernel_v2 and dump_kernel_v2_vectorized when writing to either pinned host memory or device memory, on a table of capacity 2^24 with half of its contents exported. The table values are stored in pure GPU buckets.
capacity: 2^24 (the table is full when running export_batch_if), num exported: 8388771, dim: 64
A100 + AMD
| Output | dump_kernel | dump_kernel_v2 | dump_kernel_v2_vectorized |
|---|---|---|---|
| Pinned host memory | 14887.594 | 2116.001 | 607.138 |
| Device | 24.700 | 6.012 | 3.957 |
H20 + Intel
| Output | dump_kernel | dump_kernel_v2 | dump_kernel_v2_vectorized |
|---|---|---|---|
| Pinned host memory | 624.399 | 44.536 | 44.143 |
| Device | 16.615 | 4.546 | 2.359 |
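To make the relative gains easier to read, here is a minimal sketch that turns the timings in the two tables into speedup ratios over dump_kernel. The names `TIMINGS`, `KERNELS` and `speedups` are hypothetical helpers written for this comment only; they are not part of the PR, and the numbers are copied verbatim from the tables.

```python
# Timings copied from the two tables above (same unit for every entry).
# Order within each tuple follows KERNELS.
KERNELS = ("dump_kernel", "dump_kernel_v2", "dump_kernel_v2_vectorized")

TIMINGS = {
    ("A100 + AMD", "Pinned host memory"): (14887.594, 2116.001, 607.138),
    ("A100 + AMD", "Device"): (24.700, 6.012, 3.957),
    ("H20 + Intel", "Pinned host memory"): (624.399, 44.536, 44.143),
    ("H20 + Intel", "Device"): (16.615, 4.546, 2.359),
}


def speedups(row):
    """Return baseline / timing for each entry, using the first entry as baseline."""
    baseline = row[0]
    return tuple(baseline / t for t in row)


if __name__ == "__main__":
    for (platform, output), row in TIMINGS.items():
        cells = ", ".join(
            f"{kernel}: {ratio:.2f}x" for kernel, ratio in zip(KERNELS, speedups(row))
        )
        print(f"{platform} / {output}: {cells}")
```

Because the ratios are dimensionless, they can be compared across the two platforms and do not depend on the timing unit.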
Hi @jiashuy, this PR is still under development. I'll fix the problems ASAP.
/blossom-ci
The bug in 3f04c9eebdd38e2d364d69e8362d9e92c00bf7ca was found and reviewed by @wodesuck
Hi @Lifann, is this PR ready to be merged?
> Hi @Lifann, is this PR ready to be merged?

Yes. I think it has been well tested and verified in a real-world application.
/blossom-ci