Bend
Request: Multi-GPU parallelism
Does this natively support parallelism across GPUs? Also, a feature request: please support Flash Attention natively.
It doesn't support running across multiple GPUs yet. We would first need a 64-bit implementation of HVM to be able to use all of the available resources, and then a significant change to the CUDA runtime to be able to target multiple GPUs. It's an interesting idea to pursue in the not-so-far future, but not our immediate priority.
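For context only (this is not Bend's or HVM's actual code), here is a minimal sketch of the standard CUDA pattern a runtime has to adopt to spread independent work across several devices: enumerate them with `cudaGetDeviceCount`, switch the active target with `cudaSetDevice`, launch per-device work, and synchronize each device. The kernel and buffer sizes below are arbitrary placeholders.

```cuda
// Illustrative sketch only -- not HVM's runtime. Shows the usual CUDA pattern
// for dispatching independent work to every visible GPU.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fill(float *buf, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = value;
}

int main() {
    int device_count = 0;
    cudaGetDeviceCount(&device_count);
    printf("visible GPUs: %d\n", device_count);

    const int n = 1 << 20;          // elements per device (arbitrary for the sketch)
    float *bufs[16] = {nullptr};    // assume at most 16 devices for simplicity

    // Launch one kernel per device; cudaSetDevice makes subsequent calls target it.
    for (int dev = 0; dev < device_count && dev < 16; ++dev) {
        cudaSetDevice(dev);
        cudaMalloc(&bufs[dev], n * sizeof(float));
        fill<<<(n + 255) / 256, 256>>>(bufs[dev], n, (float)dev);
    }

    // Wait for every device to finish, then release its memory.
    for (int dev = 0; dev < device_count && dev < 16; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
        cudaFree(bufs[dev]);
    }
    return 0;
}
```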
Also, does this support parallelism across the whole system (e.g. across the CPU plus multiple GPUs)?