Memory Leak During Prediction with `{training: true}`
**System information**
- TensorFlow.js version: 4.10.0
**Describe the current behavior**

There has been a lot of discussion about running model prediction with the correct batchnorm statistics (https://github.com/tensorflow/tfjs/issues/3152), which comes down to calling

```js
model.apply(tensor, {training: true})
```

instead of

```js
model.apply(tensor)
```

In my specific model, this leaks 2 tensors.
**Describe the expected behavior**

No memory leaks, regardless of training mode.
**Standalone code to reproduce the issue**

Fully standalone: https://drive.google.com/file/d/1DB9UzPDpZ8umTwsA8dvlWElCkU2ykLyw/view?usp=sharing (to reproduce, I served the directory with `http-server .`)
```js
const model = await tf.loadLayersModel('model.json');
const tensor = tf.zeros([1, 1, 256, 256, 3]).toFloat();
tensor.print();

// Must apply the model in training=true mode to avoid using aggregated norm statistics
const beforeTensors = tf.memory().numTensors;
const pred = tf.tidy(() => model.apply(tensor, {training: true}));
console.log('leaking', tf.memory().numTensors - beforeTensors - 1, 'tensors');
// leaking 2 tensors

const beforeTensors2 = tf.memory().numTensors;
const pred2 = tf.tidy(() => model.apply(tensor));
console.log('leaking', tf.memory().numTensors - beforeTensors2 - 1, 'tensors');
// leaking 0 tensors
```
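As a side note, the before/after bookkeeping above can be factored into a small helper. This is just a sketch; the `countLeakedTensors` name and the `numReturned` parameter are mine, not from tfjs:

```js
// Counts how many tensors `fn` leaks: compares tf.memory().numTensors
// before and after running `fn` inside tf.tidy, minus the tensors the
// function intentionally returns.
function countLeakedTensors(fn, numReturned = 1) {
  const before = tf.memory().numTensors;
  const result = tf.tidy(fn);
  const leaked = tf.memory().numTensors - before - numReturned;
  tf.dispose(result); // release the returned tensor(s)
  return leaked;
}

// Mirrors the repro above:
// countLeakedTensors(() => model.apply(tensor, {training: true})) -> 2
// countLeakedTensors(() => model.apply(tensor))                   -> 0
```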
Hi, @AmitMY,

Thank you for bringing this issue to our attention. I tried to replicate it on my end, and I'm getting the same result you mentioned above. I also tried with previous versions @tensorflow/[email protected] and @tensorflow/[email protected], but the issue still exists, so we'll have to dig more into it and will update you soon. Thank you!

I have added a screenshot below for reference.
There has always been a memory leak. I have a training cycle that runs the same thing over and over again; each time, the program's memory footprint grows, and so does the execution time. Take the simplest model training, any model at all, put it in a loop for 12 hours, and everything will be visible.
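For illustration, a training step that does not accumulate tensors typically disposes the loss it gets back each iteration. A minimal sketch, assuming a single-output `LayersModel` named `model` and placeholder `xs`/`ys` tensors (none of these names come from this thread):

```js
const optimizer = tf.train.sgd(0.01);

function trainStep(xs, ys) {
  // With returnCost=true, minimize() returns the loss as a scalar
  // tensor that the caller owns and must dispose.
  const loss = optimizer.minimize(
    () => tf.losses.meanSquaredError(ys, model.apply(xs, {training: true})),
    /* returnCost= */ true
  );
  const lossValue = loss.dataSync()[0]; // read the value before disposing
  loss.dispose(); // dispose each step so tf.memory().numTensors stays flat
  return lossValue;
}
```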
Hi, @AmitMY, @borodadada,

I apologize for the delayed response. As per my current understanding, when `model.apply()` is called with `{training: true}`, it internally creates new tensors that are not disposed of within the `tidy` scope. This leads to a memory leak because the intermediate tensors created by `model.apply()` with `{training: true}` are not cleaned up properly. As a result, the number of tensors in memory increases, leading to memory exhaustion over time if this pattern is repeated. To avoid the memory leak, you can modify the code to explicitly dispose of the tensors created by `model.apply()` with `{training: true}` using the `tf.dispose()` function.

If I have missed something here, please let me know. Thank you.
```js
async function main() {
  const model = await tf.loadLayersModel('web_model/model.json');
  const tensor = tf.zeros([1, 1, 256, 256, 3]).toFloat();
  tensor.print();

  const beforeTensors = tf.memory().numTensors;
  const pred = tf.tidy(() => {
    const prediction = model.apply(tensor, { training: true });
    tf.dispose([tensor, prediction]); // explicitly dispose of the input and the prediction
    return prediction;
  });
  console.log('leaking', tf.memory().numTensors - beforeTensors - 1, 'tensors');
  // leaking 0 tensors
}

main();
```
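One caveat with the snippet above: `tf.dispose([tensor, prediction])` runs before `prediction` is returned, so `pred` refers to an already-disposed tensor and cannot be used afterwards. If the prediction is still needed downstream, a variation along these lines (a sketch under the same assumptions) defers the disposal until after use:

```js
// Keep the prediction alive until it has been consumed,
// then release the input and output explicitly.
const pred = tf.tidy(() => model.apply(tensor, { training: true }));
pred.print();               // ...use the prediction here...
tf.dispose([tensor, pred]); // release input and output when done
```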
While I understand your solution, I think that addressing this explicitly will cause problems for other users (it will be fixed for me, but others might hit invisible memory leaks until they happen to find this specific issue).

Is there no way to fix it in the core?
Why can't you add an initialization function that runs every time after training is completed, without any input data, just a function? Or am I not understanding something?
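For what it's worth, something in that spirit can be approximated today with the engine's manual scoping API. A sketch, not an official recommendation; the loop bounds and `runOneTrainingStep` are placeholders:

```js
for (let step = 0; step < numSteps; step++) {
  tf.engine().startScope();     // begin tracking newly created tensors
  try {
    await runOneTrainingStep(); // hypothetical async training step
  } finally {
    tf.engine().endScope();     // dispose everything tracked since startScope()
  }
}
```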