Daniel Han
@its5Q That's very weird :( For me it seems to work perfectly. I have an example if you can run this:

```python
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(...
```
Also @its5Q you need to use `padding_side = "left"`, or else the results will be wrong
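For context on why `padding_side = "left"` matters for batched generation, here is a minimal pure-Python sketch (not Unsloth/transformers code; the pad id of 0 is an assumption): generation appends tokens after the last position, so with right padding a short sequence ends in pad tokens and the model continues from a pad instead of the real last token.

```python
PAD = 0  # hypothetical pad token id, for illustration only

def pad_batch(seqs, side="left"):
    """Pad variable-length token-id lists to equal length on one side."""
    width = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        pads = [PAD] * (width - len(s))
        out.append(pads + s if side == "left" else s + pads)
    return out

batch = [[5, 6], [7, 8, 9]]

left = pad_batch(batch, side="left")    # [[0, 5, 6], [7, 8, 9]]
right = pad_batch(batch, side="right")  # [[5, 6, 0], [7, 8, 9]]

# Generation appends after the last column: with left padding the last
# column holds real tokens; with right padding the short sequence ends
# in PAD, so the model would continue from a pad token.
print([row[-1] for row in left])   # [6, 9] -> real tokens
print([row[-1] for row in right])  # [0, 9] -> pad leaks into generation
```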
@its5Q I'm thinking about whether I can somehow default it to left, since people have said this is an ongoing issue!
@JIBSIL Oh, if you select `do_sample = False` there is no randomness involved. On the `left` issue - the complication is for training: defaulting to left padding makes training more complex, and Unsloth...
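To illustrate the `do_sample = False` point: with sampling off, decoding is greedy - it just takes the argmax at each step, so two runs over the same scores always agree. A toy sketch (the per-step score lists stand in for model logits and are made up, not a real LM):

```python
def greedy_decode(step_logits):
    """Pick the highest-scoring token id at each step (do_sample=False)."""
    return [max(range(len(logits)), key=lambda i: logits[i])
            for logits in step_logits]

# Toy per-step vocabulary scores standing in for model logits.
logits = [[0.1, 2.0, 0.3],
          [1.5, 0.2, 0.9],
          [0.4, 0.4, 3.1]]

# Greedy decoding is deterministic: identical output on every call.
print(greedy_decode(logits))                           # [1, 0, 2]
print(greedy_decode(logits) == greedy_decode(logits))  # True
```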
@its5Q Whoops, you're correct! I decided to just run the notebook - it's 100% finally fixed now, oh lord, so sorry!!! The perils of supporting multiple models :(
@armsp LoRA and QLoRA for reward models, PPO, DPO, etc. are all supported - i.e. anything TRL does, we can do :) But it just needs to be LoRA /...
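Since DPO came up: for reference, a minimal pure-Python sketch of the DPO objective that TRL's `DPOTrainer` optimizes, for a single preference pair. The function and parameter names here are illustrative, not TRL's actual API; inputs are sequence log-probs under the policy and the frozen reference model.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l)))
    where inputs are sequence log-probabilities."""
    chosen_ratio = pi_chosen - ref_chosen      # log pi(y_w|x) - log ref(y_w|x)
    rejected_ratio = pi_rejected - ref_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Illustrative log-probs: the policy prefers the chosen answer more
# than the reference does, so the loss is below log(2) (~0.693).
loss = dpo_loss(pi_chosen=-10.0, pi_rejected=-15.0,
                ref_chosen=-12.0, ref_rejected=-11.0)
print(round(loss, 4))  # 0.4375
```

The loss shrinks as the policy's preference margin over the reference grows, which is what pushes the model toward the chosen completions.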
@armsp Sadly I don't - I have DPO, but for the rest you'll have to read the TRL docs
Fantastic!
@armsp Oh no :( I'll check again and get back to you - sorry on the issue!
Extreme apologies - I've been extremely busy on my end, so apologies again that I didn't have time to look at this :(