🐛[BUG]: Failed to allocate memory for requested buffer of size 1851310080
Version
source - main
On which installation method(s) does this occur?
Pip
Describe the issue
I ran the 02_model_comparison example:
print("Running Pangu inference")
pangu_ds = inference_ensemble.run_basic_inference(
    pangu_inference_model,
    n=24,  # Note we run 24 steps here because Pangu is at 6 hour dt (6 day forecast)
    data_source=pangu_data_source,
    time=time,
)
pangu_ds.to_netcdf(f"{output_dir}/pangu_inference_out.nc")
print(pangu_ds)
RuntimeError Traceback (most recent call last)
5 frames
/usr/local/lib/python3.10/dist-packages/earth2mip/inference_ensemble.py in run_basic_inference(model, n, data_source, time)
    284     arrays = []
    285     times = []
--> 286     for k, (time, data, _) in enumerate(model(time, x)):
    287         arrays.append(data.cpu().numpy())
    288         times.append(time)

/usr/local/lib/python3.10/dist-packages/earth2mip/networks/pangu.py in __call__(self, time, x, normalize, restart)
    247             dt = torch.tensor(self.time_step.total_seconds())
    248             x1 += self.source(x1, time1) * dt
--> 249             x1 = self.model_6(x1)
    250             yield time1, x1, restart_data
    251

/usr/local/lib/python3.10/dist-packages/earth2mip/networks/pangu.py in __call__(self, x)
    142
    143     def __call__(self, x):
--> 144         return self.forward(x)
    145
    146     def to(self):

/usr/local/lib/python3.10/dist-packages/earth2mip/networks/pangu.py in forward(self, x)
    156         pl = pl.resize(*pl_shape)
    157         sl = surface[0]
--> 158         plo, slo = self.model(pl, sl)
    159         return torch.cat(
    160             [

/usr/local/lib/python3.10/dist-packages/earth2mip/networks/pangu.py in __call__(self, fields_pl, fields_sfc)
    122         output = bind_output("output", like=fields_pl)
    123         output_sfc = bind_output("output_surface", like=fields_sfc)
--> 124         self.ort_session.run_with_iobinding(binding)
    125         return output, output_sfc
    126

/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in run_with_iobinding(self, iobinding, run_options)
    329         :param run_options: See :class:onnxruntime.RunOptions.
    330         """
--> 331         self._sess.run_with_iobinding(iobinding._iobinding, run_options)
    332
    333     def get_tuning_results(self):
RuntimeError: Error in execution: Non-zero status code returned while running BiasSoftmax node. Name:'BiasSoftmax' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 1851310080
I don't know what went wrong. In the same environment, loading pangu_weather_6.onnx directly and running inference with it works normally.
Environment details
Kaggle, GPU T4 x 2
!pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/
$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   49C    P0             26W /   70W |   13623MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                       Off |   00000000:00:05.0 Off |                    0 |
| N/A   38C    P8              9W /   70W |       3MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
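As a rough sanity check, the requested buffer size from the error can be compared with the memory left on GPU 0 in the nvidia-smi output above. This is only a sketch: it assumes the nvidia-smi snapshot reflects the state at the time of the failed allocation, which the report does not confirm, and the variable names are illustrative.

```python
# Figures copied from the error message and the nvidia-smi output in this report.
REQUESTED_BYTES = 1_851_310_080  # buffer size in the BFCArena error
GPU0_TOTAL_MIB = 15_360          # GPU 0 total memory
GPU0_USED_MIB = 13_623           # GPU 0 memory in use
MIB = 1024 ** 2

requested_mib = REQUESTED_BYTES / MIB
free_mib = GPU0_TOTAL_MIB - GPU0_USED_MIB

print(f"requested buffer: {requested_mib:.1f} MiB")
print(f"free on GPU 0:    {free_mib} MiB")
print(f"request fits:     {requested_mib <= free_mib}")
```

Under that assumption the request (about 1765.5 MiB) is slightly larger than the 1737 MiB still free on GPU 0, while GPU 1 is almost idle at 3 MiB used.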
Sorry, it's not a bug. This is what I changed:
- Installed the optional dependencies for Pangu weather: $ pip install .[pangu]
- Changed n from 24 to 12
- Loaded only pangu_weather_6.onnx, using pangu.load_6(package)
With those changes, the example runs.
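Note that halving n also shortens the forecast. A small sketch of the arithmetic, assuming the 6 hour time step mentioned in the example's comment still applies when only pangu_weather_6.onnx is loaded (forecast_horizon is a hypothetical helper, not part of earth2mip):

```python
from datetime import timedelta


def forecast_horizon(n_steps: int, dt: timedelta) -> timedelta:
    """Total lead time covered by n_steps model steps of length dt."""
    return n_steps * dt


six_hours = timedelta(hours=6)  # assumed time step of the 6 hour Pangu model

original = forecast_horizon(24, six_hours)  # n=24 as in the example
reduced = forecast_horizon(12, six_hours)   # n=12 as in the workaround

print(f"n=24: {original.days} days, n=12: {reduced.days} days")
```

So the workaround produces a 3 day forecast instead of the 6 day forecast in the original example.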
Thanks for the update. I'll close this then.