glake
Does vtensor need a 64 KB/128 KB physical page size policy?
vAttention says that with a 2 MB page size, up to 128 MB of physical memory can be wasted per request in the worst case for Llama-3-8B (TP-1), whereas with a 64 KB page size the worst-case waste drops to only 4 MB. Does vtensor have the same problem? Will vtensor support 64 KB/128 KB page sizes in the future?
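
For context, here is how I understand those numbers; this is only a rough sketch. I am assuming one K buffer and one V buffer per layer (64 buffers for the 32 layers of Llama-3-8B) and that each buffer can leave at most one partially filled page per request. The helper name `worst_case_waste_bytes` is mine and is not part of vAttention or vtensor.

```python
# Sketch of the worst-case arithmetic. Assumptions (not taken from vtensor):
# one K buffer and one V buffer per layer, and at most one partially
# filled physical page per buffer per request.
KB = 1024
MB = 1024 * KB


def worst_case_waste_bytes(num_layers: int, page_size: int) -> int:
    """Upper bound on per-request waste (hypothetical helper)."""
    num_buffers = 2 * num_layers  # K and V for every layer
    return num_buffers * page_size


NUM_LAYERS = 32  # Llama-3-8B

for page_size in (2 * MB, 128 * KB, 64 * KB):
    waste = worst_case_waste_bytes(NUM_LAYERS, page_size)
    print(f"{page_size // KB} KB pages: up to {waste // MB} MB wasted per request")
```

Under these assumptions the bound scales linearly with the page size, so the question is whether vtensor's allocation granularity is also fixed at 2 MB.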