alpa
IndexError: `InlinedVector::at(size_type) const` failed bounds check
Please describe the bug
Hello Alpa team, I ran the benchmark on my system with 8 GPUs. When I run the command `python benchmark --suite gpt.grid_search_auto`, I hit the error shown in the screenshot. According to the printed output, the error occurs while compiling every stage during profiling for submesh (1, 4). Profiling for submesh (1, 8) completes without errors.
System information and environment
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04, docker): Linux Ubuntu 16.04 with 8 GPUs.
- Python version: 3.7.12
- CUDA version: cuda 11.1
- NCCL version: 2.8.4
- cupy version: cupy-cuda111 11.0.0
- GPU model and memory: A100 80GB
- Alpa version: 1.0.0.dev0
- TensorFlow version: 2.9.1
- JAX version: 0.3.5
To Reproduce
Steps to reproduce the behavior:
- python gen_prof_database.py --max-comm-size-intra-node 32 --max-comm-size-inter-node 29
- python benchmark --suite gpt.grid_search_auto
- See error
Screenshots
Could you please help me resolve this? Thanks a lot.
Have you solved this? I am running into the same error.