Casper
@tianyu-l I am mainly interested in a model architecture implementation. The remaining details, like FP8 training and the various forms of parallelism, are already implemented in TorchTitan, which should be reused....
@blankanswer @braisedpork1964 @lcolok I would really appreciate it if you could try to reproduce and fix the error. I tried again today and there is no way around this error, it...
Hi @braisedpork1964, I followed your instructions but I still get the same error after adding fewshot examples to InterpreterParser and PluginParser. Is it possible to figure out a permanent fix...
@EmilyQian2001 No, I didn't solve the problem. This bug is breaking the whole interaction, rendering MindSearch useless. I love the concept of MindSearch, but its current state is that it...
> Do you have an example of a public dataset that we can repro this on?

Unfortunately, I don't.
Launching preprocessing in distributed mode is the main problem. You can probably create a dummy dataset of 1 million samples with 64k tokens each and try, but I cannot for...
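A dummy dataset along those lines could be sketched roughly as below. This is only an illustration under assumptions: the JSONL layout with a `text` field, the helper name `write_dummy_dataset`, and the use of whitespace-separated words as a stand-in for tokens are all hypothetical, and a real tokenizer will produce different token counts.

```python
import json
import os
import random
import tempfile


def write_dummy_dataset(num_samples, tokens_per_sample, path=None, seed=0):
    """Write a JSONL file of random text samples and return its path.

    Hypothetical helper: each line is {"text": "..."} with
    `tokens_per_sample` whitespace-separated pseudo-words.
    """
    if path is None:
        # Assumption: a temporary file is good enough for a repro attempt.
        fd, path = tempfile.mkstemp(suffix=".jsonl")
        os.close(fd)
    rng = random.Random(seed)
    vocab = [f"tok{i}" for i in range(1000)]
    with open(path, "w") as f:
        for _ in range(num_samples):
            text = " ".join(rng.choice(vocab) for _ in range(tokens_per_sample))
            f.write(json.dumps({"text": text}) + "\n")
    return path


# Small sizes for a quick check; scale towards 1_000_000 x 64_000 for a real repro.
dummy_path = write_dummy_dataset(num_samples=3, tokens_per_sample=16)
```

At the full size mentioned above the file would be very large, so it may be worth starting small and increasing both numbers until the problem shows up.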
Maybe the `axolotl preprocess` CLI should not launch with accelerate? What do you think @winglian?
I used `axolotl train`, triggered the error, then pivoted to `axolotl preprocess` and found the same error. I will need to check the commands again, but I'm pretty sure I...
This does the trick. Though, I would recommend using something other than `with zero_first(is_local_main_process())` in general. This lowers QoL when using axolotl and could be replaced with a simpler FileLock...
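To illustrate the file-lock idea, here is a minimal sketch, not axolotl's actual code. It assumes a POSIX system and uses the standard library's `fcntl.flock` in place of a dedicated FileLock package; the names `file_lock`, `prepare_once`, and the `.lock` / `.done` marker files are hypothetical.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def file_lock(path):
    """Hold an exclusive advisory lock on `path` while the block runs (POSIX only)."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until any other holder releases
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def prepare_once(cache_dir, build):
    """Run build(cache_dir) in whichever process gets the lock first.

    Later callers wait on the lock, see the marker file, and skip the work.
    Returns True if this call did the build, False if it reused the result.
    """
    os.makedirs(cache_dir, exist_ok=True)
    done_marker = os.path.join(cache_dir, ".done")
    with file_lock(os.path.join(cache_dir, ".lock")):
        if not os.path.exists(done_marker):
            build(cache_dir)
            with open(done_marker, "w") as f:
                f.write("ok")
            return True
    return False
```

With this pattern, no process needs to know its rank: a single-process run simply acquires the lock immediately, so the same code path works with or without a distributed launcher.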
> [@casper-hansen](https://github.com/casper-hansen) agreed, feel free to make a PR! Or, I'll probably do so later.

I probably won't be creating the PR, but let's leave this issue open until a...