
Add `QTensor::quantize_onto` to remove a redundant dtoh copy?

Open EricLBuehler opened this issue 1 year ago • 0 comments

Currently, QTensor::quantize:

  • Takes a tensor (assume it is on the GPU for this example)
  • Copies the data to the CPU
  • Quantizes on the CPU
  • Copies the quantized data back from the CPU to the GPU

That is two copies in total, and the first one (GPU to CPU) is unnecessary if the tensor is already on the CPU. Perhaps a QTensor::quantize_onto function would be better, as it would:

  • Take a CPU tensor
  • Quantize on the CPU
  • Copy the data to the GPU

This means there is only one copy. I have implemented this here: EricLBuehler/candle#12. I would appreciate any thoughts on whether this would be a good addition here.
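
To make the copy counts above concrete, here is a minimal toy model in plain Rust. It is only a sketch: `Device`, `Tensor`, `QTensor`, `CopyCounter`, and both functions are hypothetical stand-ins written for illustration, not candle's actual types or signatures, and the simple symmetric 8-bit scheme is a placeholder for real block quantization. The only thing it tries to show is how many host/device transfers each flow would incur.

```rust
// Toy model: every type and function here is a hypothetical stand-in,
// not candle's real API. It only counts simulated host/device transfers.

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Device {
    Cpu,
    Gpu,
}

#[derive(Clone, Debug)]
pub struct Tensor {
    pub data: Vec<f32>,
    pub device: Device,
}

#[derive(Clone, Debug)]
pub struct QTensor {
    pub data: Vec<i8>,
    pub scale: f32,
    pub device: Device,
}

/// Counts simulated transfers between different devices.
#[derive(Default)]
pub struct CopyCounter {
    pub copies: usize,
}

impl CopyCounter {
    /// Records one copy only when the source and target devices differ.
    fn transfer(&mut self, from: Device, to: Device) {
        if from != to {
            self.copies += 1;
        }
    }
}

/// Placeholder CPU quantization: symmetric 8-bit with a single scale.
fn quantize_cpu(data: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = data.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = data.iter().map(|x| (x / scale).round() as i8).collect();
    (q, scale)
}

/// Models the flow described for `quantize`: go to the CPU, quantize,
/// then return the result to the source tensor's device.
pub fn quantize(src: &Tensor, counter: &mut CopyCounter) -> QTensor {
    counter.transfer(src.device, Device::Cpu); // device-to-host, if needed
    let (data, scale) = quantize_cpu(&src.data);
    counter.transfer(Device::Cpu, src.device); // host-to-device, if needed
    QTensor { data, scale, device: src.device }
}

/// Models the proposed flow: quantize a CPU tensor, then copy the
/// result to the requested target device.
pub fn quantize_onto(src: &Tensor, dst: Device, counter: &mut CopyCounter) -> QTensor {
    assert_eq!(src.device, Device::Cpu, "this sketch assumes a CPU source");
    let (data, scale) = quantize_cpu(&src.data);
    counter.transfer(Device::Cpu, dst); // the single host-to-device copy
    QTensor { data, scale, device: dst }
}
```

In this model, calling `quantize` on a tensor whose `device` is `Device::Gpu` records two transfers, while calling `quantize_onto` with a CPU tensor and `Device::Gpu` as the target records one, and both produce the same quantized values.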

EricLBuehler, Jun 29 '24 22:06