candle Add `QTensor::quantize_onto` to remove a redundant dtoh copy?

Open EricLBuehler opened this issue 1 year ago • 0 comments

Currently, `QTensor::quantize` does the following:

1. Takes a tensor (assume it is on the GPU for this example)
2. Copies the data to the CPU
3. Quantizes on the CPU
4. Copies the data back from the CPU to the GPU

In particular, the device-to-host (dtoh) copy is unnecessary (there are 2 copies in total) if the tensor is already on the CPU. Perhaps a `QTensor::quantize_onto` function would be better, as it would:

1. Take a CPU tensor
2. Quantize on the CPU
3. Copy the data to the GPU

This means there is only one copy. I have implemented this here: EricLBuehler/candle#12. I would appreciate any thoughts on whether this would be a good addition.
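To make the copy counts concrete, here is a minimal toy model of the two flows described above. It is only a sketch: `Device`, `Tensor`, `QTensor`, `CopyCount`, and both functions below are hypothetical stand-ins written for this illustration, not candle's actual types or signatures, and the 8-bit quantizer is a simplified placeholder for the real block formats.

```rust
// Toy model only: every type and function here is hypothetical and is NOT candle's API.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Device {
    Cpu,
    Gpu,
}

/// Counts simulated transfers: device-to-host (dtoh) and host-to-device (htod).
#[derive(Debug, Default, PartialEq)]
pub struct CopyCount {
    pub dtoh: usize,
    pub htod: usize,
}

pub struct Tensor {
    pub data: Vec<f32>,
    pub device: Device,
}

pub struct QTensor {
    pub data: Vec<i8>,
    pub scale: f32,
    pub device: Device,
}

/// Simplified symmetric 8-bit quantization (placeholder for real quantized formats).
fn quantize_cpu(data: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = data.iter().fold(0f32, |m, x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = data.iter().map(|x| (x / scale).round() as i8).collect();
    (q, scale)
}

/// Models the current flow: move to the CPU, quantize, move back to the source device.
pub fn quantize(src: &Tensor, copies: &mut CopyCount) -> QTensor {
    if src.device == Device::Gpu {
        copies.dtoh += 1; // GPU -> CPU copy of the unquantized data
    }
    let (data, scale) = quantize_cpu(&src.data);
    if src.device == Device::Gpu {
        copies.htod += 1; // CPU -> GPU copy of the quantized data
    }
    QTensor { data, scale, device: src.device }
}

/// Models the proposed flow: quantize a CPU tensor, then place the result on `target`.
pub fn quantize_onto(src: &Tensor, target: Device, copies: &mut CopyCount) -> QTensor {
    assert_eq!(src.device, Device::Cpu, "this sketch expects a CPU source tensor");
    let (data, scale) = quantize_cpu(&src.data);
    if target == Device::Gpu {
        copies.htod += 1; // the only transfer: CPU -> GPU
    }
    QTensor { data, scale, device: target }
}
```

In this model, calling `quantize` on a GPU tensor records one dtoh and one htod transfer, while `quantize_onto` with a CPU source and a GPU target records a single htod transfer and yields the same quantized values.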

Jun 29 '24 22:06 EricLBuehler
