Auto-PyTorch
Make better use of data by devising subsampling
When we use a certain `memory_allocation`[^1] in subsampling, we reduce the number of samples until we reach the memory limit.
However, we need an appropriate value for it: if set too high, training fails with a memory error, while if set too low, we waste memory.
For now, we circumvent this issue by measuring the memory consumption when using the default config.
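The reduce-until-it-fits idea above can be sketched as follows. This is a minimal illustration, not the Auto-PyTorch implementation; the function name `subsample_until_fits` and the halving strategy are assumptions for the sketch.

```python
import numpy as np

def subsample_until_fits(X, y, memory_allocation_mb, rng=None):
    """Halve the number of rows until the dataset fits into the budget.

    X, y: numpy arrays; memory_allocation_mb: absolute budget in MB.
    """
    rng = rng or np.random.default_rng(0)
    # Keep subsampling while the array is over budget and rows remain.
    while X.nbytes / 1e6 > memory_allocation_mb and len(X) > 1:
        keep = rng.choice(len(X), size=len(X) // 2, replace=False)
        X, y = X[keep], y[keep]
    return X, y
```

The trade-off described above is visible here: a too-small `memory_allocation_mb` discards samples unnecessarily, while a too-large one never triggers the reduction and training can still run out of memory.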
[^1]: `memory_allocation` is defined as absolute memory in MB, e.g. `"memory_allocation": 10` means 10 MB. The memory used by the dataset is checked after each reduction method runs; if the dataset fits into the allocated memory, any further methods listed in `"methods"` are not performed.
The following measurements are based on runs on the dev branch; all of them passed when run with the tabular classification example.
```python
mem_limit, N, D = 3000,   500, 10000  # 3GB
mem_limit, N, D = 4000,  2500, 10000  # 4GB
mem_limit, N, D = 5000,  4500, 10000  # 5GB
mem_limit, N, D = 6000,  6500, 10000  # 6GB
mem_limit, N, D = 7000,  8500, 10000  # 7GB
mem_limit, N, D = 8000, 11500, 10000  # 8GB
mem_limit, N, D = 9000, 15000, 10000  # 9GB
```
Note that since neural networks grow with the input size, I used a large fixed `D`; this `D` is determined from the feature sizes in automlbenchmark.
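Assuming the tabular data is stored densely as float64 (8 bytes per value; an assumption, not stated above), the measured `N` values translate directly into the MB figures used in the fitted formula below:

```python
def dataset_mb(n_samples, n_features, bytes_per_value=8):
    """Rough dataset footprint in MB, assuming dense float64 storage."""
    return n_samples * n_features * bytes_per_value / 1e6

# At D = 10000, the 3GB baseline of N = 500 corresponds to 40 MB,
# and the roughly 2000 extra samples per GB correspond to 160 MB.
print(dataset_mb(500, 10000))
print(dataset_mb(2000, 10000))
```

Under this assumption the 40 MB offset and the 160 MB-per-GB slope of the formula below fall straight out of the table.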
Since APT does not run with `memory_limit=2000`, we use the following equation to calculate the `memory_allocation`, and raise an error when `memory_allocation < 0`:

```python
memory_allocation = (memory_limit - 3000) / 1000.0 * 160 + 40
```
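As a self-contained sketch, the equation plus the error case could look like this (the function name `compute_memory_allocation` is hypothetical):

```python
def compute_memory_allocation(memory_limit):
    """Fit from the dev-branch runs: 40 MB at a 3 GB limit,
    plus 160 MB per additional GB."""
    memory_allocation = (memory_limit - 3000) / 1000.0 * 160 + 40
    if memory_allocation < 0:
        # e.g. memory_limit=2000 gives -120, where APT does not run anyway.
        raise ValueError(
            f"memory_limit={memory_limit} MB is too small for subsampling"
        )
    return memory_allocation
```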
The following measurements are based on runs on the common modification branch.
```python
mem_limit, N, D = 3000,   500, 10000  # 3GB
mem_limit, N, D = 4000,  3000, 10000  # 4GB
mem_limit, N, D = 5000,  5500, 10000  # 5GB
mem_limit, N, D = 6000,  8500, 10000  # 6GB
mem_limit, N, D = 7000, 11500, 10000  # 7GB
mem_limit, N, D = 8000, 15500, 10000  # 8GB
```
The training ratio was 0.75, so we might need to take 75% of those values: the 200 MB-per-GB increments then become 150 MB increments, and the 40 MB offset at 3 GB becomes 30 MB.

```python
memory_allocation = (memory_limit - 3000) / 1000.0 * 150 + 30
```
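The 0.75 scaling can be sanity-checked against the common-modification-branch table, again assuming dense float64 storage (8 bytes per value); the function name `compute_memory_allocation_v2` is hypothetical:

```python
def compute_memory_allocation_v2(memory_limit):
    """Revised fit: the branch's ~200 MB/GB slope and 40 MB offset,
    both scaled by the 0.75 training ratio (-> 150 and 30)."""
    allocation = (memory_limit - 3000) / 1000.0 * 150 + 30
    if allocation < 0:
        raise ValueError(f"memory_limit={memory_limit} MB is too small")
    return allocation

# Cross-check at 4 GB: the table gives N = 3000 at D = 10000,
# i.e. 240 MB of data, of which 75% (the training split) is 180 MB.
train_mb = 0.75 * 3000 * 10000 * 8 / 1e6
print(train_mb, compute_memory_allocation_v2(4000))
```

At the 4 GB point the fit matches the scaled measurement exactly; at larger limits it stays below the measured values, which errs on the safe side for the memory-error failure mode described above.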