Is it advisable to avoid variable shadowing when using Candle?
Discussed in https://github.com/huggingface/candle/discussions/2272
Originally posted by chenwanqq on June 19, 2024

Considering the following facts:
- Rust doesn't have a garbage collector.
- Variable shadowing does not release memory (drop the shadowed value) within the same scope; see the sketch after the links below.
- When track_op() is true, applying an operator to a tensor creates a result tensor whose backprop op holds a clone of the original tensor(s), which keeps their storage alive.
https://github.com/huggingface/candle/blob/2b10aaa05d3752186899bd5b5364d92164edc7ef/candle-core/src/tensor.rs#L675-L682
https://github.com/huggingface/candle/blob/2b10aaa05d3752186899bd5b5364d92164edc7ef/candle-core/src/op.rs#L877-L884
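To make the drop-timing point concrete, here is a minimal, self-contained Rust sketch (no Candle involved). `Big` and `fake_op` are made-up stand-ins for a large tensor and a tensor op; the Drop impl prints when the backing value is released:

```rust
struct Big(String);

impl Drop for Big {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

// Stand-in for a tensor op: reads the input, returns a fresh value.
fn fake_op(input: &Big) -> Big {
    Big(format!("result derived from '{}'", input.0))
}

fn main() {
    println!("-- reassignment --");
    {
        let mut x = Big("original".to_string());
        x = fake_op(&x); // "dropping original" is printed right here
        println!("still in scope with: {}", x.0);
    } // the result is dropped when this block ends

    println!("-- shadowing --");
    {
        let x = Big("original".to_string());
        let x = fake_op(&x); // nothing is dropped here; the original is only shadowed
        println!("still in scope with: {}", x.0);
    } // the original AND the result are both dropped only when this block ends
}
```

Running this prints "dropping original" before the final println in the reassignment case, but only after it (at the end of the block) in the shadowing case.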
Many current model implementations therefore look 'suspicious' with respect to memory usage. Compare the following two pieces of pseudocode:
Code 1:
let mut x = some_big_tensor;
x = x.some_op(x)?; // The original x will be dropped here.
Code 2:
let x = some_big_tensor;
let x = x.some_op(x)?; // The original x is shadowed, but not dropped.
If my understanding is correct (although I haven't measured the actual memory usage), Code 2 keeps both the original x and the new x in memory, because the original x is only shadowed and is not dropped until the end of the scope. It would be even worse if x is marked as a variable (a trainable Var) in a training scenario. In essence, variable shadowing could lead to a significant increase in peak memory usage.
Models like LLaMA, Mistral, and LLaVA (which I have implemented) often rely on variable shadowing. If this issue is indeed valid, I believe numerous code segments should be revised, and a warning about this should be issued.
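Purely as an illustration of what such a revision might look like, here is a small Candle-style sketch; the forward function, the tensor shape, and the ops (relu, sqr) are arbitrary placeholders and are not taken from any of the model implementations mentioned above:

```rust
use candle_core::{DType, Device, Result, Tensor};

// Illustrative only: a made-up forward pass showing the reassignment pattern.
fn forward(big: Tensor) -> Result<Tensor> {
    // Shadowing style would keep every intermediate alive until the end of
    // this function:
    //   let x = big.relu()?;
    //   let x = x.sqr()?;
    //
    // Reassignment drops the previous binding at each assignment, so at most
    // one intermediate (plus whatever a tracked op still references) stays
    // alive at a time.
    let mut x = big;
    x = x.relu()?;
    x = x.sqr()?;
    Ok(x)
}

fn main() -> Result<()> {
    let device = Device::Cpu;
    let t = Tensor::ones((1024, 1024), DType::F32, &device)?;
    let out = forward(t)?;
    println!("{:?}", out.shape());
    Ok(())
}
```

An alternative that keeps the shadowing style is to call drop() on the old binding explicitly, or to group operations in inner blocks so that intermediates go out of scope sooner.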
I'm also curious to know the answer, and whether there is a good way to measure the effects at runtime given the parallelism involved.
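As a starting point for measurement on the CPU backend (where ops run synchronously), one could compare resident memory around each pattern. The sketch below is only an assumption about how one might do this: rss_kib is a hypothetical, Linux-only helper that parses VmRSS from /proc/self/status, and the shape is arbitrary. For CUDA, the device would additionally have to be synchronized before each measurement, and I am not claiming a specific Candle API for that here.

```rust
use candle_core::{DType, Device, Result, Tensor};

// Hypothetical Linux-only helper: resident set size in kiB, parsed from
// /proc/self/status. Returns None on other platforms or on parse failure.
fn rss_kib() -> Option<u64> {
    let status = std::fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|l| l.starts_with("VmRSS:"))?
        .split_whitespace()
        .nth(1)?
        .parse()
        .ok()
}

fn main() -> Result<()> {
    let device = Device::Cpu;

    // ~64 MiB of f32 so the difference stands out from allocator noise.
    let x = Tensor::ones((4096, 4096), DType::F32, &device)?;
    println!("after allocation:  {:?} kiB", rss_kib());

    // Shadowing: the original x stays resident here alongside the result.
    let x = x.sqr()?;
    println!("after shadowed op: {:?} kiB", rss_kib());

    drop(x);
    println!("after drop:        {:?} kiB", rss_kib());
    Ok(())
}
```

On the CPU backend the numbers should show the shadowed original staying resident until the final drop; running a reassignment variant of the same program would give the other data point for comparison.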