Shenggui Li

Results 142 comments of Shenggui Li

I think this is a version mismatch issue. Can you try it with torch 1.10 or 1.11?

> ⚠️ Keep in mind: multiple processes may access the same file, you should make sure the JSON file is consistent

Each process should keep its own JSON file.
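One way to read the "one JSON file per process" suggestion, as a minimal sketch rather than anything from the library itself: the helper names (`rank_log_path`, `write_metrics`, `merge_metrics`) and the `metrics_rank{N}.json` naming scheme are assumptions made up for illustration. The rank is passed in explicitly; under a launcher such as `torchrun` it could be read from the `RANK` environment variable.

```python
import json
import os
from pathlib import Path


def rank_log_path(log_dir, rank):
    """Return a JSON path unique to one process (hypothetical naming scheme)."""
    return Path(log_dir) / f"metrics_rank{rank}.json"


def write_metrics(log_dir, rank, metrics):
    """Write metrics to this process's own file, so no two processes share a file."""
    path = rank_log_path(log_dir, rank)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first and rename it afterwards, so a reader
    # never observes a half-written JSON document.
    tmp = path.with_suffix(".json.tmp")
    with open(tmp, "w") as f:
        json.dump(metrics, f)
    os.replace(tmp, path)
    return path


def merge_metrics(log_dir):
    """Collect all per-rank files into one dict keyed by file stem."""
    files = sorted(Path(log_dir).glob("metrics_rank*.json"))
    return {p.stem: json.loads(p.read_text()) for p in files}
```

Because each rank writes only to its own path, no locking is needed; a single process can call `merge_metrics` afterwards to combine the results.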

Thanks for the feedback, will add this soon.

Good point, was considering adding this tutorial as someone mentioned in the discussion post as well. I will write a tutorial to cover this next week.

I have assigned this issue to myself, will close this issue next week upon completion.

Hi, I believe there is an arithmetic error somewhere. Let's investigate this problem 🔥

Data fetching, forward pass and back prop are implemented in the schedule. Thus, I don't think they are trainer hooks. Is there any use case for such hooks?

I do agree that this is not supported by Colossal-AI. I found that these use cases are indeed not related to the schedule if we are adding hooks to the schedule. Splitting the batch...

I think if we can abstract this part, it will provide some flexibility and extensibility to the schedule class. For example, there is a `batch_data_process_func` parameter to allow some customization...
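To illustrate the kind of customization a `batch_data_process_func` hook allows, here is a rough sketch. `ToySchedule`, `load_batch`, `default_process`, and `dict_process` are hypothetical names invented for this example; they are not Colossal-AI's actual schedule class or its signature. Only the parameter name comes from the comment above.

```python
def default_process(batch):
    """Assume the batch is already a (data, label) pair."""
    data, label = batch
    return data, label


class ToySchedule:
    """Hypothetical stand-in for a schedule; not Colossal-AI's real class."""

    def __init__(self, batch_data_process_func=default_process):
        # The callback decides how a raw batch is split into data and label.
        self.batch_data_process_func = batch_data_process_func

    def load_batch(self, data_iter):
        """Fetch the next raw batch and hand it to the user-supplied callback."""
        batch = next(data_iter)
        return self.batch_data_process_func(batch)


def dict_process(batch):
    """Custom callback for datasets that yield dicts instead of tuples."""
    return batch["input"], batch["target"]
```

With this shape, a dataset that yields dicts can be supported by passing `ToySchedule(batch_data_process_func=dict_process)` without changing the schedule itself.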

This usually occurs because of CUDA out-of-memory.