erichtho

5 comments by erichtho

Sorry, I can't share my code; it's part of a big project. I'm trying to simplify it, but I can't reproduce the error with the simplified code (still trying). The ONNX model is transformed...

Yes, it's related to request concurrency. It seems to appear more often when there are many requests with shapes close to the maximum. I checked with top, dmesg,...
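Since the failure seems tied to many concurrent requests with near-maximum shapes, one way to try reproducing it outside the big project is a small stress harness. This is a minimal sketch under stated assumptions: `make_shapes`, `stress` and `fake_infer` are hypothetical names, the 0.9 "near max" ratio is an arbitrary choice, and `fake_infer` is only a stand-in that you would replace with the real model call.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Shape = Tuple[int, ...]


def make_shapes(max_shape: Shape, n: int, near_max_ratio: float = 0.9, seed: int = 0) -> List[Shape]:
    """Build n request shapes; each dim is drawn between near_max_ratio * max and max."""
    rng = random.Random(seed)  # fixed seed so a crashing run can be replayed
    shapes = []
    for _ in range(n):
        shape = tuple(rng.randint(max(1, int(d * near_max_ratio)), d) for d in max_shape)
        shapes.append(shape)
    return shapes


def stress(infer: Callable[[Shape], object], shapes: Sequence[Shape], workers: int) -> List[object]:
    """Submit all requests to a thread pool; results are returned in input order.

    pool.map re-raises a worker's Python exception when its result is consumed;
    a native segfault, however, terminates the whole process.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(infer, shapes))


def fake_infer(shape: Shape) -> int:
    """Hypothetical stand-in for the real inference call: returns the element count."""
    count = 1
    for d in shape:
        count *= d
    return count
```

Raising `workers` and moving `near_max_ratio` toward 1.0 approximates the "many near-maximum requests" condition described above.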

Same error, and the CUDA version from `nvcc --version` matches the output of `python -c 'import torch; print(torch.version.cuda);'` ![Screenshot 2024-04-17 11 16 13](https://github.com/hpcaitech/Open-Sora/assets/30315656/bd08054f-cf40-4c99-b84a-250287f7742c)
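The same comparison can be scripted so it can be rerun on each machine. This is a sketch, assuming `nvcc` prints its usual `release X.Y` line (e.g. `Cuda compilation tools, release 12.1, V12.1.105`); the helper names are hypothetical. Note that `torch.version.cuda` is `None` on CPU-only PyTorch builds.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'major.minor' CUDA release from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def nvcc_release() -> Optional[str]:
    """Run `nvcc --version` if nvcc is on PATH; return its release or None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


def torch_cuda_version() -> Optional[str]:
    """Return torch.version.cuda, or None if torch is missing or CPU-only."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


def cuda_versions_match(nvcc: Optional[str], torch_cuda: Optional[str]) -> bool:
    """Compare major.minor only; an unknown version on either side counts as no match."""
    if nvcc is None or torch_cuda is None:
        return False
    return nvcc.split(".")[:2] == torch_cuda.split(".")[:2]


if __name__ == "__main__":
    nvcc, torch_cuda = nvcc_release(), torch_cuda_version()
    print(f"nvcc: {nvcc}, torch.version.cuda: {torch_cuda}")
    print("match" if cuda_versions_match(nvcc, torch_cuda) else "mismatch or unknown")
```

As the comment shows, a matching version does not rule out the crash, so this only eliminates one suspect.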

> Same error, and the CUDA version from `nvcc --version` matches the output of `python -c 'import torch; print(torch.version.cuda);'`

Also found with dmesg:

Hi, I downgraded torch to 2.1.2 and that resolved the problem (I also changed the xformers version to [v0.0.23.post1](https://github.com/facebookresearch/xformers/releases/tag/v0.0.23.post1)). Here is how I located the problem: 1. Debugged with pdb and found that torch.nn.Conv3d raises a segmentation fault...
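The comment located the crash by stepping through with pdb. A different technique, not the one used here, is Python's standard `faulthandler` module, which prints the Python traceback when the process receives a fatal signal such as SIGSEGV. That way the crashing call shows up even in a concurrent server without stepping through by hand. This is a sketch: `run_conv3d_probe` and its channel counts and shapes are hypothetical, and on a machine without CUDA it only exercises the CPU path.

```python
import faulthandler
import sys

# Print the Python traceback of every thread if the process is killed by
# SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
faulthandler.enable(file=sys.stderr, all_threads=True)


def run_conv3d_probe(shape=(1, 4, 8, 32, 32)):
    """Hypothetical probe: a single Conv3d forward pass on a random input."""
    import torch  # imported lazily so the module loads without torch installed

    device = "cuda" if torch.cuda.is_available() else "cpu"
    conv = torch.nn.Conv3d(in_channels=shape[1], out_channels=8, kernel_size=3, padding=1).to(device)
    x = torch.randn(*shape, device=device)
    with torch.no_grad():
        y = conv(x)
    return tuple(y.shape)


if __name__ == "__main__":
    try:
        import torch  # noqa: F401
    except ImportError:
        print("torch is not installed; nothing to probe")
    else:
        print("Conv3d output shape:", run_conv3d_probe())
```

Without editing any code, running the existing entry point as `python -X faulthandler your_script.py` (or setting `PYTHONFAULTHANDLER=1`) enables the same handler.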