tensor_parallel
Throw errors when trying to wrap models that are not supposed to be wrapped
Some models already rely on cross-device interactions. I propose we refuse to wrap them and throw an error instead. Possible examples (see the sketch after this list):
- Wrapping a model that is already wrapped
- Wrapping an `accelerate` model (its `_hf_hook` hooks move tensors from device to device during the forward pass, etc.)
- Wrapping a data parallel model
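A minimal sketch of what such a pre-wrap check could look like. Assumptions: the wrapper class is exposed as `tensor_parallel.TensorParallel`, `accelerate` marks the modules it manages with an `_hf_hook` attribute, and the helper name `check_wrappable` is hypothetical, not part of the library's API:

```python
import torch.nn as nn

def check_wrappable(model: nn.Module) -> None:
    """Raise an error if `model` should not be wrapped. Hypothetical helper."""
    from tensor_parallel import TensorParallel  # assumed import path

    # Case 1: model is already wrapped
    if isinstance(model, TensorParallel):
        raise ValueError("Model is already wrapped in TensorParallel")

    for name, module in model.named_modules():
        # Case 2: accelerate attaches an `_hf_hook` attribute to modules it
        # manages; such hooks move tensors between devices during forward
        if hasattr(module, "_hf_hook"):
            raise ValueError(
                f"Module {name!r} has an accelerate hook (_hf_hook); "
                "wrapping would conflict with accelerate's device placement"
            )
        # Case 3: model (or a submodule) is already data parallel
        if isinstance(module, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
            raise ValueError(
                f"Module {name!r} is data parallel; "
                "wrap the underlying module instead"
            )
```

Calling this at the top of the wrapping entry point would turn the silent device conflicts above into immediate, descriptive errors.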