NanoCode012
To add more info from the Discord discussion: the problem stems from the eval_table code, which was written quite some time ago and hasn't been actively maintained. At this point, I'm...
> [@winglian](https://github.com/winglian) that is good to know. What about the triton 3.2.0 issue that throws the PY_SSIZE_T_CLEAN error? Do you have the stack trace for that?
Hey! Thanks for the report. Let's see what upstream trl does first.
Hey! Thanks for checking back. In this case, you could override those two dataloader functions to return your custom `RepeatSampler` class. I looked a bit more, and `curriculum_sampling` seems to...
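For reference, here is a minimal sketch of such a sampler. It assumes the goal is to keep curriculum (dataset) order while still repeating each prompt once per generation. The class name `SequentialRepeatSampler` and the `repeat_count` parameter are hypothetical. This is not TRL's implementation: it drops shuffling and covers only per-index repetition. Which two dataloader functions you override depends on your trainer version.

```python
from typing import Iterator, Sized


class SequentialRepeatSampler:
    """Hypothetical sampler: yields each dataset index `repeat_count` times,
    in dataset order (no shuffling), so curriculum ordering is preserved."""

    def __init__(self, data_source: Sized, repeat_count: int) -> None:
        if repeat_count < 1:
            raise ValueError("repeat_count must be >= 1")
        self.data_source = data_source
        self.repeat_count = repeat_count

    def __iter__(self) -> Iterator[int]:
        # Walk the dataset front to back, emitting each index repeat_count times.
        for idx in range(len(self.data_source)):
            for _ in range(self.repeat_count):
                yield idx

    def __len__(self) -> int:
        return len(self.data_source) * self.repeat_count
```

`torch.utils.data.DataLoader` accepts any iterable with `__len__` as its `sampler`, so an overridden sampler hook could return an instance of this class. With a 3-item dataset and `repeat_count=2` it yields `0, 0, 1, 1, 2, 2`.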
Which GPUs are you using? I used `CUDA_VISIBLE_DEVICES` just yesterday and didn't run into this issue.
Hello, sorry I missed your earlier reply, @zhanghanxing2022. I ran your config (changing base_model + dataset) on 2xH200 SXM GPUs on RunPod using our Docker cloud image with `CUDA_VISIBLE_DEVICES='0,1'...
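If you want to reproduce the same restricted-GPU setup from a script, a minimal sketch is below. It sets `CUDA_VISIBLE_DEVICES` in the child process environment, so it is in place before CUDA initializes. The `axolotl train <config>` command and the helper names are assumptions for illustration; use whatever launch command you normally run.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def build_restricted_env(
    gpu_ids: Sequence[int], base_env: Optional[Mapping[str, str]] = None
) -> dict:
    """Hypothetical helper: copy the environment and expose only `gpu_ids`."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def launch_training(config_path: str, gpu_ids: Sequence[int]) -> int:
    """Hypothetical launcher: assumes the `axolotl train <config>` CLI is on PATH."""
    cmd = ["axolotl", "train", config_path]
    result = subprocess.run(cmd, env=build_restricted_env(gpu_ids), check=False)
    return result.returncode
```

For example, `launch_training("config.yml", [0, 1])` would mirror the `'0,1'` run above. The parent environment is copied rather than mutated, so the current process's GPU visibility is unaffected.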
Closing as stale
Yeah, I think this could be a quick callback to add, though I haven't verified that `flos` refers to the FLOPS count.
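As a rough idea of what such a callback would compute, here is a small framework-free helper. It turns successive cumulative `total_flos` readings into an approximate FLOPs-per-second figure per logging interval. The class name `FlosRateTracker` is hypothetical, and the result inherits any inaccuracy in `total_flos` itself.

```python
import time
from typing import Optional


class FlosRateTracker:
    """Hypothetical helper: converts cumulative `total_flos` readings into
    an approximate FLOPs/s value for each interval between readings."""

    def __init__(self) -> None:
        self._last_flos: Optional[float] = None
        self._last_time: Optional[float] = None

    def update(self, total_flos: float, now: Optional[float] = None) -> Optional[float]:
        """Record a reading; return FLOPs/s since the previous reading,
        or None on the first call or when no time has elapsed."""
        now = time.monotonic() if now is None else now
        rate = None
        if self._last_flos is not None and now > self._last_time:
            rate = (total_flos - self._last_flos) / (now - self._last_time)
        self._last_flos, self._last_time = total_flos, now
        return rate
```

In a `transformers.TrainerCallback`, the natural hook is `on_log(self, args, state, control, logs=None, **kwargs)`, which would call `tracker.update(state.total_flos)` and attach the returned rate to `logs` when it is not `None`.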
I checked, and `total_flos` is the FLOPS count; however, the number may be off (there's a GH issue about miscounting for embedding layers). Given that it may be incorrect, I'm...
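For context on why the number can be off: as far as I know, `total_flos` is an estimate rather than a measurement. It is accumulated from the model's `floating_point_ops`, which uses roughly 6 × tokens × parameters and by default excludes embedding parameters. The sketch below assumes that formula. The function and parameter names are hypothetical, and it shows only how embedding handling shifts the estimate.

```python
def estimate_train_flos(
    num_tokens: int,
    total_params: int,
    embedding_params: int = 0,
    exclude_embeddings: bool = True,
) -> int:
    """Hypothetical back-of-the-envelope estimate: 6 * tokens * params
    (about 2x for the forward pass and 4x for the backward pass)."""
    params = total_params - embedding_params if exclude_embeddings else total_params
    return 6 * num_tokens * params
```

For example, with 1,000 tokens, 1,000 parameters, and 200 of those in embeddings, the estimate is 4,800,000 with embeddings excluded and 6,000,000 with them included. The embedding treatment matters most for models with large vocabularies.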
I forget exactly where this was thrown, but it's likely here: https://github.com/axolotl-ai-cloud/axolotl/blob/80304c26a70e21ed8522fdbd53bcb290f9c6b7d3/src/axolotl/train.py#L246