About the number of messages chunked in IBGDA
Why is the theoretical maximum 3?
Assume the message size (at most on the order of KB) is much smaller than the page size (i.e. NVSHMEM_CUMEM_GRANULARITY, normally very large, >100 MB). Then the worst case for looking up the local/remote keys is that the message straddles two pages locally and two pages remotely, with the two page boundaries falling at different offsets, which splits it into 3 chunks in total.
e.g.
| chunk 0 ---- | chunk 1 ------ | chunk 2 --------- |
| local page i | local page i + 1 ----------------- |
| remote page j --------------- | remote page j + 1 |
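To make the counting concrete, here is a minimal C++ sketch of the splitting rule, assuming each chunk must stay inside one local page and one remote page. The function name `count_chunks` and its parameters are hypothetical and for illustration only; this is not the actual IBGDA code, and `page_size` stands in for NVSHMEM_CUMEM_GRANULARITY.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not the actual IBGDA code: counts how many chunks a
// message is cut into when every chunk must stay inside one local page and
// one remote page (so a single local/remote key pair covers it).
int count_chunks(uint64_t local_addr, uint64_t remote_addr,
                 uint64_t bytes, uint64_t page_size) {
    int chunks = 0;
    while (bytes > 0) {
        // Bytes left before each address reaches its next page boundary.
        uint64_t local_room = page_size - local_addr % page_size;
        uint64_t remote_room = page_size - remote_addr % page_size;
        // The chunk ends at whichever limit comes first.
        uint64_t chunk = std::min({bytes, local_room, remote_room});
        local_addr += chunk;
        remote_addr += chunk;
        bytes -= chunk;
        ++chunks;
    }
    return chunks;
}
```

With an illustrative 1 MiB page, a 4096-byte message that starts 1024 bytes before a local page boundary and 2048 bytes before a remote one gives 3 chunks (1024 + 1024 + 2048 bytes). If both boundaries fall at the same offset it gives 2, and if no boundary is crossed it gives 1. The bound of 3 only holds while the message is smaller than a page; the loop itself does not depend on it.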
You can ignore that note, since the while loop can handle more than 3 chunks. We tried to simplify and optimize the code here for the theoretical maximum, but it didn't work.