[DistEnv] All outputs loaded in memory before bulk write to database
When the model predicts over, say, 1 million data points, the `predict` method accumulates the outputs for all 1 million data points in a single `outputs` list, which will OOM once it exceeds available memory (refer: `superduper/components/model.py`, `predict` method).

The same thing happens with the model inputs: all inputs are loaded into memory before being passed to the model, e.g. packed into a `DataLoader` (refer: `ext/torch/model.py`, `_predict` method). A schematic of this pattern is sketched below.
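The following toy example illustrates the shape of the problem only; it is not the actual superduper code, and the names are placeholders. Both the inputs and the outputs are fully materialised in memory before anything could be written out:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy schematic of the pattern described above (illustrative only,
# not the actual superduper code).
inputs = torch.randn(1_000_000, 16)                  # all inputs loaded up front
loader = DataLoader(TensorDataset(inputs), batch_size=256)

model = torch.nn.Linear(16, 4)
outputs = []
with torch.no_grad():
    for (batch,) in loader:
        outputs.append(model(batch))                 # every batch result is kept
outputs = torch.cat(outputs)                         # all 1M outputs held in RAM
# ...only after this point would a single bulk write to the database happen
```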
- [x] #1627
We need to chunk the model inputs at the database level, iterate over the chunks, and pass each chunk to the model for prediction, writing each chunk's outputs back to the database before loading the next one. A hedged sketch of this chunked flow follows.
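A minimal sketch of the proposed flow, not superduper's API: `input_iter`, `write_outputs`, and the per-item `model(x)` call are placeholder assumptions standing in for the database cursor, the datalayer insert, and the model's predict call respectively. Only one chunk of inputs and outputs is ever held in memory at a time:

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def predict_in_chunks(model, input_iter, write_outputs, chunk_size=10_000):
    """Run prediction chunk by chunk and flush each chunk's outputs
    immediately, so memory usage is bounded by `chunk_size`."""
    for chunk in chunked(input_iter, chunk_size):
        outputs = [model(x) for x in chunk]   # or a single batched predict call
        write_outputs(outputs)                # flush this chunk to the database
        # `chunk` and `outputs` are dropped here and can be garbage collected
```

With this shape, `input_iter` could be the database cursor itself so that inputs are streamed rather than pre-loaded, and `write_outputs` could wrap the existing bulk-insert path so each chunk is persisted as soon as it is computed.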