Auto-PyTorch
[memo] High memory consumption and suspected sources
I am writing down the current memory usage as a memo, in case we encounter memory leak issues in the future. This post is based on the current implementation.
When we run a dataset with a size of 300B, Auto-PyTorch consumes ~1.5 GB, and the following are the major sources of the memory consumption:
Source | Consumption [GB] |
---|---|
Import modules | 0.35 |
Dask Client | 0.35 |
Logger (Thread safe) | 0.4 |
Running of context.Process in multiprocessing module | 0.4 |
Model | 0 ~ inf |
Total | 1.5 ~ inf |
When we run a dataset with a size of 300 MB (400,000 instances x 80 features), such as Albert, Auto-PyTorch consumes ~2.5 GB, and the following are the major sources of the memory consumption:
Source | Consumption [GB] |
---|---|
Import modules | 0.35 |
Dask Client | 0.35 |
Logger (Thread safe) | 0.4 |
Dataset itself | 0.3 |
self.categories in InputValidator | 0.3 |
Running of context.Process in multiprocessing module | 0.4 |
Model (e.g. LightGBM) | 0.4 ~ inf |
Total | 2.5 ~ inf |
All the information was obtained by:
```
$ mprof run --include-children python -m examples.tabular.20_basics.example_tabular_classification
```
and by a logger that I set up for debugging. Note that I also added `time.sleep(0.5)` before and after each line of interest to rule out influence from other elements, and checked each line in detail.
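For reference, a minimal sketch of the kind of per-line check described above, using `psutil` (which `memory_profiler`/`mprof` can also rely on); the helper and the placeholder workload are illustrative, not code from the repository:
```python
import time
import psutil


def rss_gb() -> float:
    """Resident set size of the current process in GB."""
    return psutil.Process().memory_info().rss / 1024 ** 3


if __name__ == '__main__':
    time.sleep(0.5)            # settle before the line of interest
    before = rss_gb()

    data = [0] * 10_000_000    # placeholder for the line of interest

    after = rss_gb()
    time.sleep(0.5)            # settle afterwards
    print(f'delta: {after - before:.2f} GB')
```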
Interesting :) I think the analysis should also be extended to the following datasets in the future:
https://archive.ics.uci.edu/ml/datasets/covertype
https://archive.ics.uci.edu/ml/datasets/HIGGS
https://archive.ics.uci.edu/ml/datasets/Poker+Hand
They proved tricky.
FYI, when we use Optuna with a tiny model, we consume only around 150 MB. This module is also thread safe.
```python
import optuna


def objective(trial):
    x0 = trial.suggest_uniform('x0', -10, 10)
    x1 = trial.suggest_uniform('x1', -10, 10)
    return x0 ** 2 + x1 ** 2


if __name__ == '__main__':
    study = optuna.create_study()
    study.optimize(objective, n_trials=5000, n_jobs=4)
```
I tested the memory usage for the following datasets:
Dataset name | # of features | # of instances | Approx. data size [MB] |
---|---|---|---|
Covertype | 55 | 581012 | 60 ~ 240 |
Higgs | 29 | 98050 | 5 ~ 20 |
Poker-hands | 11 | 1025009 | 22 ~ 90 |
The details of the memory usage are as follows:
Source | Consumption in Covertype [GB] | Consumption in Higgs [GB] | Consumption in Poker-hands [GB] |
---|---|---|---|
Import modules | 0.35 | 0.35 | 0.35 |
Dask Client | 0.35 | 0.35 | 0.35 |
Logger (Thread safe) | 0.35 | 0.35 | 0.35 |
Dataset itself | 0.1 | 0.05 | 0.1 |
self.categories in InputValidator | 0 | 0 | 0.02 |
Running of context.Process in multiprocessing module | 0.4 | 0.4 | 0.4 |
LightGBM | 0.6 | 0.1 | 0.3 |
CatBoost | 0.8 | 0.1 | 0.6 |
Random Forest | 1.2 | 0.5 | 1.0 |
Extra Trees | 1.2 | 0.2 | 1.1 |
SVM | 0.9 | 0.2 | 0.6 |
KNN | 0.8 | - | 0.4 |
Total | 2.0 ~ | 1.5 ~ | 1.7 ~ |
Note that KNN failed on Higgs, and some training runs for each dataset were canceled because of out-of-memory errors.
This time I set `memory_limit = 4096`, but I somehow got out-of-memory errors at lower values such as 2.5 ~ 3.0 GB.
It is probably better to check whether this also works well on the latest branch.
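If I recall correctly, `memory_limit` is enforced by wrapping the evaluation in pynisher. A minimal sketch of that kind of limit, assuming the pre-1.0 `pynisher.enforce_limits` API (the `allocate` function and the numbers are only placeholders):
```python
import numpy as np
import pynisher  # assumes pynisher < 1.0, which provides enforce_limits


def allocate(n_floats: int) -> int:
    """Placeholder target function that allocates memory."""
    return np.zeros(n_floats).nbytes


if __name__ == '__main__':
    # Abort the call if it needs more than 4096 MB of memory or 30 s of wall time.
    limited = pynisher.enforce_limits(mem_in_mb=4096, wall_time_in_s=30)(allocate)
    result = limited(10 ** 8)
    # The wrapper returns None when a limit was hit.
    print('result:', result)
```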
This is from #259 by @franchuterivera.
- [x] We should not let the datamanager actively reside in memory when we are not using it. For example, there is no need to have a datamanager in smbo.
- [x] Also, after search has saved the datamanager to disk, we can delete and garbage collect it.
- [x] We should also garbage collect the datamanager in the evaluator and challenge whether it is needed there at all.
- [ ] We should improve the cross-validation handling of the out-of-fold (OOF) predictions. Rather than having a list that contains the OOF predictions here, we should have a fixed array of `n_samples` created once at the beginning. OOF predictions from the k-fold model should be added smartly to this pre-existing array, something like `self.Y_optimization[test_indices] = opt_pred`. This way the predictions are sorted and can be used directly by ensemble selection without the need to save this list (see the sketch after this list).
- [ ] Calculating the train loss should be optional, not done by default here. We should avoid calling predict when it is not strictly needed.
- [ ] As already reported by @nabenabe0928, the biggest contribution comes from imports. In particular, just doing `import torch` consumes 2 GB of peak virtual memory, and most of the time this happens only for mypy typing. We should encapsulate these calls under `typing.TYPE_CHECKING` and only import the strictly needed classes from PyTorch (see the sketch below).
- [ ] Check if we can use a generator instead of an `np.ndarray`.
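A minimal sketch of the pre-allocated OOF array idea; apart from `Y_optimization`, `test_indices`, and `opt_pred`, the names, model, and shapes are illustrative rather than the actual evaluator code:
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
n_classes = len(np.unique(y))

# One fixed array of shape (n_samples, n_classes), created once at the beginning,
# instead of a growing Python list of per-fold prediction arrays.
Y_optimization = np.full((len(y), n_classes), np.nan)

for train_indices, test_indices in KFold(n_splits=5).split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_indices], y[train_indices])
    opt_pred = model.predict_proba(X[test_indices])
    # Write each fold's OOF predictions into their original row positions.
    Y_optimization[test_indices] = opt_pred

# Rows are already aligned with y, so ensemble selection can use the array directly.
assert not np.isnan(Y_optimization).any()
```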
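And a minimal sketch of the `typing.TYPE_CHECKING` guard mentioned above (the function and the annotation are illustrative):
```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by mypy / type checkers, so importing torch does not
    # cost its ~2 GB of peak virtual memory at runtime.
    import torch


def count_parameters(model: 'torch.nn.Module') -> int:
    # The annotation stays a string, so torch is never imported at runtime.
    return sum(p.numel() for p in model.parameters())
```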