Gopal Trital comments

Results: 15 comments of Gopal Trital

Multi-GPU operation seems to be problematic

@shizhediao I know your reply above sort of already says, 'no' but just in case something's changed, do you think it is practically possible to do RAFT on {fine-tuned Falcon-7b}...

Multi-GPU operation seems to be problematic

Btw, in my latest run, the code just stops without any error. Such a silent heartbreak! ![image](https://github.com/OptimalScale/LMFlow/assets/51286679/fbb6d10d-0d3a-47dc-aa8c-b50192bc333d)

Multi-GPU operation seems to be problematic

Hi @shizhediao, thanks for your reply. I began with a single GPU (~24 GB); it stopped here: ![image](https://github.com/OptimalScale/LMFlow/assets/51286679/44879a54-b257-4552-8525-1258c81889e0) I then switched to an AWS g5.12xlarge, which has 4 GPUs with ~24 GB each. It...

Multi-GPU operation seems to be problematic

Hi @shizhediao, thanks for your reply. Btw, I'm testing this with raft_batch_size = 8. For the following: I first used ds_config_zero2.json. The process stops with CUDA out of memory...

Multi-GPU operation seems to be problematic

I did a quick re-run (with ZeRO-3) to capture the instantaneous CPU/RAM usage when the error occurs. Here's the resource use (closer to when we get the error: **but hadn't...

Multi-GPU operation seems to be problematic

@shizhediao Today, I tried running the program on an instance with more RAM (but the same number and size of GPUs). I got pretty similar results. I also tried running the...

Pip install error with gym and torch

You saved my day. I only had to pip install setuptools==65.5.0. But later on I got an "ERROR: Could not find a version that satisfies the requirement pywin32==227". Will post the update...

RuntimeError: Error(s) in loading state_dict on a custom model

> To start, it looks like the checkpoint for your weights includes a wrapper `base_model.model.` in front of each parameter name, so PyTorch can't find the parameters it needs. I...
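For anyone landing here with the same error, this is a minimal sketch of the prefix fix the quoted reply describes, assuming the only mismatch is the leading `base_model.model.` on each key. The helper name `strip_prefix` and the toy keys are made up for illustration; they are not from the repo, and a real checkpoint may have other differences this does not handle.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="base_model.model."):
    """Return a copy of state_dict with `prefix` removed from matching keys.

    Hypothetical helper: keys that do not start with `prefix` are kept as-is.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Toy stand-in for a checkpoint; real values would be tensors.
wrapped = {
    "base_model.model.layer.weight": 1,
    "base_model.model.layer.bias": 2,
    "other.weight": 3,
}
cleaned = strip_prefix(wrapped)
print(list(cleaned))
```

With a real PyTorch checkpoint, the cleaned dict would then go to `model.load_state_dict(cleaned)`; whether that loads without missing or unexpected keys depends on how the checkpoint was saved.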

RuntimeError: Error(s) in loading state_dict on a custom model

Ah, are you suggesting I need to do this for the reference model as well? ![image](https://github.com/eric-mitchell/direct-preference-optimization/assets/51286679/79f3fa25-67c5-440b-b353-cba09203c643)

RuntimeError: Error(s) in loading state_dict on a custom model

Okay, I added this, and it seems to have worked! (Will update if I get through the whole process). ![image](https://github.com/eric-mitchell/direct-preference-optimization/assets/51286679/8622546a-d262-4f20-959c-aaa8aa5d540f) Another quick question: I saw in one of the posts...