Add `QTensor::quantize_onto` to remove a redundant dtoh copy?
Currently, `QTensor::quantize` does the following:
- Takes a tensor (assumed to be on the GPU for this example)
- Copies the data to the CPU
- Quantizes on the CPU
- Copies the data back from the CPU to the GPU
This round trip costs 2 copies in total, and the device-to-host copy is unnecessary if the tensor data is already on the CPU. Perhaps a `QTensor::quantize_onto` function would be better, as it would:
- Take a CPU tensor
- Quantize on the CPU
- Copy the data to the GPU
This means there is only one copy (host to device). I have implemented this in EricLBuehler/candle#12 and would appreciate any thoughts on whether it would be a good addition here.
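
To make the copy counts concrete, here is a minimal toy model of the two flows. It does not use candle: `Device`, `Tensor`, `QTensor`, `Copies`, and `quantize_cpu` are hypothetical stand-ins, the "transfers" are just counters, and the symmetric 8-bit scheme is a placeholder for the real quantized block formats. The signature of `quantize_onto` is likewise an assumption for illustration and may differ from the one in the linked PR.

```rust
// Toy model only: these types are stand-ins, not candle's API.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Device {
    Cpu,
    Gpu,
}

#[derive(Clone, Debug)]
struct Tensor {
    data: Vec<f32>,
    device: Device,
}

#[derive(Clone, Debug)]
struct QTensor {
    data: Vec<i8>,
    scale: f32,
    device: Device,
}

/// Counts simulated transfers between host (CPU) and device (GPU).
#[derive(Default, Debug)]
struct Copies {
    dtoh: usize, // device-to-host
    htod: usize, // host-to-device
}

impl Copies {
    fn total(&self) -> usize {
        self.dtoh + self.htod
    }
}

/// Symmetric 8-bit quantization on the CPU (placeholder scheme).
fn quantize_cpu(values: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = values.iter().map(|v| (v / scale).round() as i8).collect();
    (q, scale)
}

/// Models the current flow: the result ends up on the source tensor's device,
/// so a GPU source pays one dtoh copy and one htod copy.
fn quantize(src: &Tensor, copies: &mut Copies) -> QTensor {
    if src.device == Device::Gpu {
        copies.dtoh += 1; // bring the data to the CPU
    }
    let (data, scale) = quantize_cpu(&src.data);
    if src.device == Device::Gpu {
        copies.htod += 1; // send the quantized data back
    }
    QTensor { data, scale, device: src.device }
}

/// Models the proposed flow: a CPU source and an explicit target device,
/// so at most one htod copy is needed.
fn quantize_onto(src: &Tensor, target: Device, copies: &mut Copies) -> Result<QTensor, String> {
    if src.device != Device::Cpu {
        return Err("quantize_onto expects a CPU tensor".to_string());
    }
    let (data, scale) = quantize_cpu(&src.data);
    if target == Device::Gpu {
        copies.htod += 1; // the only transfer
    }
    Ok(QTensor { data, scale, device: target })
}
```

In this model, calling `quantize` on a GPU tensor records one `dtoh` and one `htod` copy, while calling `quantize_onto` on a CPU tensor with `Device::Gpu` as the target records a single `htod` copy and produces the same quantized values.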