Is it advisable to avoid variable shadowing when using Candle?
Discussed in https://github.com/huggingface/candle/discussions/2272
Originally posted by chenwanqq on June 19, 2024

Considering the following facts:
- Rust doesn't have a garbage collector.
- Variable shadowing does not release memory (drop the shadowed value) within the same scope; see the sketch after the links below.
- When track_op() is true, applying an operator to a tensor creates a result tensor whose backprop op holds a clone of the original tensor(s), which keeps their storage alive.
https://github.com/huggingface/candle/blob/2b10aaa05d3752186899bd5b5364d92164edc7ef/candle-core/src/tensor.rs#L675-L682
https://github.com/huggingface/candle/blob/2b10aaa05d3752186899bd5b5364d92164edc7ef/candle-core/src/op.rs#L877-L884
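To make the drop-timing point concrete, here is a minimal, self-contained Rust sketch (no Candle involved). `Big` and `fake_op` are made-up stand-ins for a large tensor and a tensor op; the Drop impl prints when the backing value is released:

```rust
struct Big(String);

impl Drop for Big {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

// Stand-in for a tensor op: reads the input, returns a fresh value.
fn fake_op(input: &Big) -> Big {
    Big(format!("result derived from '{}'", input.0))
}

fn main() {
    println!("-- reassignment --");
    {
        let mut x = Big("original".to_string());
        x = fake_op(&x); // "dropping original" is printed right here
        println!("still in scope with: {}", x.0);
    } // the result is dropped when this block ends

    println!("-- shadowing --");
    {
        let x = Big("original".to_string());
        let x = fake_op(&x); // nothing is dropped here; the original is only shadowed
        println!("still in scope with: {}", x.0);
    } // the original AND the result are both dropped only when this block ends
}
```

Running this prints "dropping original" before the final println in the reassignment case, but only after it (at the end of the block) in the shadowing case.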
Many current model implementations therefore look 'suspicious' with respect to memory usage. Compare the following two pieces of pseudocode:
Code 1:
let mut x = some_big_tensor;
x = x.some_op(x)?; // The original x will be dropped here.
Code 2:
let x = some_big_tensor;
let x = x.some_op(x)?; // The original x is shadowed, but not dropped.
If my understanding is correct (although I haven't measured the actual memory usage), Code 2 keeps both the original x and the new x in memory, because the original x is only shadowed and is not dropped until the end of the scope. It would be even worse if x is marked as a variable (a trainable Var) in a training scenario. In essence, variable shadowing could lead to a significant increase in peak memory usage.
Models like LLaMA, Mistral, and LLaVA (which I have implemented) often rely on variable shadowing. If this issue is indeed valid, I believe numerous code segments should be revised, and a warning about this should be issued.
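Purely as an illustration of what such a revision might look like, here is a small Candle-style sketch; the forward function, the tensor shape, and the ops (relu, sqr) are arbitrary placeholders and are not taken from any of the model implementations mentioned above:

```rust
use candle_core::{DType, Device, Result, Tensor};

// Illustrative only: a made-up forward pass showing the reassignment pattern.
fn forward(big: Tensor) -> Result<Tensor> {
    // Shadowing style would keep every intermediate alive until the end of
    // this function:
    //   let x = big.relu()?;
    //   let x = x.sqr()?;
    //
    // Reassignment drops the previous binding at each assignment, so at most
    // one intermediate (plus whatever a tracked op still references) stays
    // alive at a time.
    let mut x = big;
    x = x.relu()?;
    x = x.sqr()?;
    Ok(x)
}

fn main() -> Result<()> {
    let device = Device::Cpu;
    let t = Tensor::ones((1024, 1024), DType::F32, &device)?;
    let out = forward(t)?;
    println!("{:?}", out.shape());
    Ok(())
}
```

An alternative that keeps the shadowing style is to call drop() on the old binding explicitly, or to group operations in inner blocks so that intermediates go out of scope sooner.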
I'm also curious to know the answer, and whether there is a good way to measure the effects at runtime given the parallelism involved.
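As a starting point for measurement on the CPU backend (where ops run synchronously), one could compare resident memory around each pattern. The sketch below is only an assumption about how one might do this: rss_kib is a hypothetical, Linux-only helper that parses VmRSS from /proc/self/status, and the shape is arbitrary. For CUDA, the device would additionally have to be synchronized before each measurement, and I am not claiming a specific Candle API for that here.

```rust
use candle_core::{DType, Device, Result, Tensor};

// Hypothetical Linux-only helper: resident set size in kiB, parsed from
// /proc/self/status. Returns None on other platforms or on parse failure.
fn rss_kib() -> Option<u64> {
    let status = std::fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|l| l.starts_with("VmRSS:"))?
        .split_whitespace()
        .nth(1)?
        .parse()
        .ok()
}

fn main() -> Result<()> {
    let device = Device::Cpu;

    // ~64 MiB of f32 so the difference stands out from allocator noise.
    let x = Tensor::ones((4096, 4096), DType::F32, &device)?;
    println!("after allocation:  {:?} kiB", rss_kib());

    // Shadowing: the original x stays resident here alongside the result.
    let x = x.sqr()?;
    println!("after shadowed op: {:?} kiB", rss_kib());

    drop(x);
    println!("after drop:        {:?} kiB", rss_kib());
    Ok(())
}
```

On the CPU backend the numbers should show the shadowed original staying resident until the final drop; running a reassignment variant of the same program would give the other data point for comparison.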