Auto-PyTorch
Make better use of data by devising subsampling
When we use a certain `memory_allocation`[^1] in subsampling, we reduce the number of samples until we reach the memory limit.
However, we need an appropriate value for it: if set too high, training fails with a memory error, while if set too low, we waste memory.
For now, we circumvent this issue by measuring the memory consumption when using the default config.
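The reduce-until-it-fits idea above can be sketched as follows. This is a minimal illustration, not the Auto-PyTorch implementation; the function name `subsample_until_fits` and the halving strategy are assumptions for the sketch.

```python
import numpy as np

def subsample_until_fits(X, y, memory_allocation_mb, rng=None):
    """Halve the number of rows until the dataset fits into the budget.

    X, y: numpy arrays; memory_allocation_mb: absolute budget in MB.
    """
    rng = rng or np.random.default_rng(0)
    # Keep subsampling while the array is over budget and rows remain.
    while X.nbytes / 1e6 > memory_allocation_mb and len(X) > 1:
        keep = rng.choice(len(X), size=len(X) // 2, replace=False)
        X, y = X[keep], y[keep]
    return X, y
```

The trade-off described above is visible here: a too-small `memory_allocation_mb` discards samples unnecessarily, while a too-large one never triggers the reduction and training can still run out of memory.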
[^1]: `memory_allocation` is defined as absolute memory in MB, e.g. `"memory_allocation": 10` means 10 MB. The memory used by the dataset is checked after each reduction method runs; if the dataset fits into the allocated memory, any further methods listed in `"methods"` are not performed.
The following measurements are based on runs on the dev branch; all of them passed when run with the tabular classification example.
```python
mem_limit, N, D = 3000,   500, 10000  # 3GB
mem_limit, N, D = 4000,  2500, 10000  # 4GB
mem_limit, N, D = 5000,  4500, 10000  # 5GB
mem_limit, N, D = 6000,  6500, 10000  # 6GB
mem_limit, N, D = 7000,  8500, 10000  # 7GB
mem_limit, N, D = 8000, 11500, 10000  # 8GB
mem_limit, N, D = 9000, 15000, 10000  # 9GB
```
Note that since neural networks grow with the input size, I used a large fixed `D`; this `D` is determined from the feature sizes in automlbenchmark.
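Assuming the tabular data is stored densely as float64 (8 bytes per value; an assumption, not stated above), the measured `N` values translate directly into the MB figures used in the fitted formula below:

```python
def dataset_mb(n_samples, n_features, bytes_per_value=8):
    """Rough dataset footprint in MB, assuming dense float64 storage."""
    return n_samples * n_features * bytes_per_value / 1e6

# At D = 10000, the 3GB baseline of N = 500 corresponds to 40 MB,
# and the roughly 2000 extra samples per GB correspond to 160 MB.
print(dataset_mb(500, 10000))
print(dataset_mb(2000, 10000))
```

Under this assumption the 40 MB offset and the 160 MB-per-GB slope of the formula below fall straight out of the table.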
Since APT does not run with `memory_limit=2000`, we use the following equation to calculate the `memory_allocation`, and raise an error when `memory_allocation < 0`:

```python
memory_allocation = (memory_limit - 3000) / 1000.0 * 160 + 40
```
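As a self-contained sketch, the equation plus the error case could look like this (the function name `compute_memory_allocation` is hypothetical):

```python
def compute_memory_allocation(memory_limit):
    """Fit from the dev-branch runs: 40 MB at a 3 GB limit,
    plus 160 MB per additional GB."""
    memory_allocation = (memory_limit - 3000) / 1000.0 * 160 + 40
    if memory_allocation < 0:
        # e.g. memory_limit=2000 gives -120, where APT does not run anyway.
        raise ValueError(
            f"memory_limit={memory_limit} MB is too small for subsampling"
        )
    return memory_allocation
```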
The following measurements are based on runs on the common modification branch.
```python
mem_limit, N, D = 3000,   500, 10000  # 3GB
mem_limit, N, D = 4000,  3000, 10000  # 4GB
mem_limit, N, D = 5000,  5500, 10000  # 5GB
mem_limit, N, D = 6000,  8500, 10000  # 6GB
mem_limit, N, D = 7000, 11500, 10000  # 7GB
mem_limit, N, D = 8000, 15500, 10000  # 8GB
```
The training ratio was 0.75, so we might need to take 75% of those values: the 200 MB-per-GB increments then become 150 MB increments, and the 40 MB offset at 3 GB becomes 30 MB.

```python
memory_allocation = (memory_limit - 3000) / 1000.0 * 150 + 30
```
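The 0.75 scaling can be sanity-checked against the common-modification-branch table, again assuming dense float64 storage (8 bytes per value); the function name `compute_memory_allocation_v2` is hypothetical:

```python
def compute_memory_allocation_v2(memory_limit):
    """Revised fit: the branch's ~200 MB/GB slope and 40 MB offset,
    both scaled by the 0.75 training ratio (-> 150 and 30)."""
    allocation = (memory_limit - 3000) / 1000.0 * 150 + 30
    if allocation < 0:
        raise ValueError(f"memory_limit={memory_limit} MB is too small")
    return allocation

# Cross-check at 4 GB: the table gives N = 3000 at D = 10000,
# i.e. 240 MB of data, of which 75% (the training split) is 180 MB.
train_mb = 0.75 * 3000 * 10000 * 8 / 1e6
print(train_mb, compute_memory_allocation_v2(4000))
```

At the 4 GB point the fit matches the scaled measurement exactly; at larger limits it stays below the measured values, which errs on the safe side for the memory-error failure mode described above.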