Logan Adams
Good to know, I'll post an update here shortly.
@aciborowska - what model were you trying to train when you first hit this? @StevenArzt thanks, starting work on this now.
I'll prioritize this work, thanks @dblakely and @PaulScotti for your feedback
@gary-young and @chongxiaoc - work is continuing on this [here](https://github.com/microsoft/DeepSpeed/pull/4707/files/707e37d6b7bc596df83329d05d453fdc3e9c6fd5..da81c3080f0ba1186a6d0a84eab38be4aaabed89), please see that for status and to test the work.
@0781532 - I'd recommend starting a new issue to share your error code and a simple repro case if possible.
This same issue was detected in DeepSpeed; however, since this only uses internal files, we've determined we do not need this.
Hi @RodriMora - that error looks to be unrelated to DeepSpeed, and if this is the full error: ``` /home/ubuntuai/axolotl/.venv/lib/python3.10/site-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory 12 |...
Hi @summer-silence - you will need to follow the directions from that error and also install one of the packages that is listed there, libc6 for example.
Hi @AbhayGoyal - closing this issue for now, since we've recommended the right way to follow up, and there are also CI tests linked. If you have other questions, please...
Hi @allanj - do you know what signal is being sent to your processes? We just added support to clean up gracefully for SIGINT and SIGTERM. Could you try with...