Gopal Trital
@shizhediao I know your reply above sort of already says 'no', but just in case something's changed, do you think it is practically possible to do RAFT on {fine-tuned Falcon-7b}...
Btw, in my latest run, the code just stops without any error. Such a silent heartbreak!
Hi @shizhediao, thanks for your reply. I began with a single GPU (~24 GB); it stopped here: I then switched to AWS g5.12xlarge, which has 4 GPUs with ~24 GB each. It...
Hi @shizhediao, thanks for your reply. Btw, I'm testing this with raft_batch_size = 8. For the following: I first used ds_config_zero2.json. The process stops with CUDA out of memory...
I did a quick re-run (with ZeRO-3) to capture the instantaneous CPU/RAM usage when the error occurs. Here's the resource use (closer to when we get the error: **but hadn't...
@shizhediao Today, I tried running the program on an instance that has more RAM (but with the same number/size of GPUs). I got pretty similar results. I also tried running the...
You saved my day. I only had to pip install setuptools==65.5.0. But I later got an "ERROR: Could not find a version that satisfies the requirement pywin32==227". Will post the update...
> To start, it looks like the checkpoint for your weights includes a wrapper `base_model.model.` in front of each parameter name, so PyTorch can't find the parameters it needs. I...
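For anyone hitting the same mismatch, here is a minimal sketch of the renaming idea from the quote above: drop the `base_model.model.` wrapper from each parameter name before loading. The helper name `strip_prefix` and the toy keys are hypothetical, and plain integers stand in for tensors so the snippet runs without PyTorch. It is an illustration of the idea, not the exact fix that was applied in this thread.

```python
def strip_prefix(state_dict, prefix="base_model.model."):
    """Return a copy of state_dict with `prefix` removed from keys that start with it.

    Keys that do not start with the prefix are kept unchanged.
    """
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Toy stand-in for a checkpoint; the key names are made up for illustration,
# and real values would be tensors rather than integers.
wrapped = {
    "base_model.model.transformer.h.0.weight": 1,
    "base_model.model.lm_head.weight": 2,
    "step": 3,
}

cleaned = strip_prefix(wrapped)
print(sorted(cleaned))
```

With a real checkpoint, the cleaned dictionary would then be passed to the model's usual loading call, assuming the remaining names match what the model expects.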
Ah, are you suggesting I need to do this for the reference model as well?
Okay, I added this, and it seems to have worked! (Will update if I get through the whole process.) Another quick question: I saw in one of the posts...