docs: Clarify multi-gpu usage
Passing --gpus all to docker run is not enough on its own: LoRAX also requires --sharded or --gpus N to be set, but the docs don't make this clear. We should add a section to the docs about GPU selection and multi-GPU usage.
We should also add docs explaining tensor parallelism and when it makes sense to use multi-GPU. Specifically, at least one of the following should hold:
- Model is too big for one GPU
- GPUs are connected via NVLink
Otherwise, the overhead of GPU-to-GPU communication will be the main bottleneck.
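
As a rough illustration of the two criteria above, here is a minimal sketch of how the docs could express the decision. It is not part of LoRAX: the function names, the `headroom` fraction, and the 2-bytes-per-parameter default (fp16/bf16) are assumptions for illustration, and the estimate counts model weights only, ignoring KV cache, activations, and adapters.

```python
def weights_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for model weights alone, in GiB.

    bytes_per_param=2 assumes fp16/bf16 weights (an assumption, adjust
    for quantized or fp32 models).
    """
    return num_params * bytes_per_param / 2**30


def should_use_multi_gpu(
    num_params: float,
    gpu_mem_gib: float,
    has_nvlink: bool,
    bytes_per_param: int = 2,
    headroom: float = 0.8,
) -> bool:
    """Hypothetical heuristic: shard if the model does not fit on one GPU,
    or if the GPUs are connected via NVLink.

    headroom is an assumed fraction of GPU memory available for weights;
    the remainder is left for KV cache and activations.
    """
    too_big = weights_gib(num_params, bytes_per_param) > gpu_mem_gib * headroom
    return too_big or has_nvlink


if __name__ == "__main__":
    # 7B parameters in fp16 on a single 24 GiB GPU without NVLink
    print(should_use_multi_gpu(7e9, gpu_mem_gib=24, has_nvlink=False))
    # 70B parameters in fp16 on 80 GiB GPUs without NVLink
    print(should_use_multi_gpu(70e9, gpu_mem_gib=80, has_nvlink=False))
```

The heuristic is deliberately coarse. Its only purpose is to make the "too big for one GPU, or NVLink available" rule concrete enough to sanity-check a deployment before enabling sharding.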