hohoCode

23 comments by hohoCode

I realized that if the scripts are rerun, the inference will run on 8 GPUs. However, if I put the training function and the inference function together in one...

Could you please also test bigger models like Flan-t5-XXL or XL? Currently it seems weird with DeepSpeed on, and it looks like a perfect candidate for the DeepSpeed trainer. Thanks.

> > Could you please also test bigger models like Flan-t5-XXL or XL? Currently it seems weird with DeepSpeed on, and it looks like a perfect candidate for the DeepSpeed trainer....

BTW, I tested running the code with 'deepspeed_stage_3_offload', using Flan-t5-xl and "bf16" as the data type. DeepSpeed reports a "must have the same dtype" error: File deepspeed/runtime/zero/linear.py,...

Also, as you have probably noticed, the new SOTA tabular DL model seems to be "Trompt" (https://arxiv.org/pdf/2305.18446.pdf). One existing implementation of this model in PyTorch is here: https://github.com/pyg-team/pytorch-frame?tab=readme-ov-file#benchmark If there...

What is your pyg version? Mine is the latest version.

Thanks for the work. Looking forward to this great new feature!

Great work and cannot wait to try it when ready. Just a general comment: can you make sure o1 is also fully supported with SELA? Because currently it seems o1...