Charles O. Goddard

Results: 151 comments of Charles O. Goddard

Thanks for reporting this! This is a silly bug in how I have the parameters hooked up to the tokenizer merging method - it's expecting a `weight` parameter even though...

A total anywhere in the 0-1.2 range will almost certainly be fine. You can probably get a lot weirder than that, but I'm still experimenting myself. The paper this method...
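As a quick sanity check, totaling the per-model weights and confirming they land in that range is easy to do by hand. The weights below are made up for illustration, not taken from any real merge:

```python
# Hypothetical per-model weights for a linear-style merge.
weights = {"model-a": 0.5, "model-b": 0.4, "model-c": 0.2}

# Total the weights and check they fall in the suggested 0-1.2 range.
total = sum(weights.values())
print(round(total, 2))      # → 1.1
print(0 <= total <= 1.2)    # → True
```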

That looks like a tokenizer mismatch issue to me. Did you maybe copy in the tokenizer for Tess-M-v1.2? The added tokens not present in the base model could cause that...
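One quick way to spot that kind of mismatch is to diff the two vocabularies. The vocab dicts below are hypothetical stand-ins; with real models you would get them from the tokenizers themselves (e.g. `get_vocab()` in `transformers`):

```python
# Hypothetical vocabularies (token -> id): the base model's tokenizer
# and the copied-in tokenizer with extra chat tokens.
base_vocab = {"<s>": 0, "</s>": 1, "hello": 2, "world": 3}
copied_vocab = {"<s>": 0, "</s>": 1, "hello": 2, "world": 3,
                "<|im_start|>": 4, "<|im_end|>": 5}

# Tokens present in the copied tokenizer but absent from the base
# vocabulary have no embedding rows in the base model, which is
# exactly the mismatch described above.
added = set(copied_vocab) - set(base_vocab)
print(sorted(added))  # → ['<|im_end|>', '<|im_start|>']
```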

It looks like `stablelm-zephyr-3b` and `stablecode-instruct-alpha-3b` are entirely different architectures. `stablecode-instruct-alpha-3b` is a GPT-NeoX-based model, while `stablelm-zephyr-3b` uses Stability's Llama-like StableLM architecture. Unfortunately these are too different to...

Sorry for not getting to this sooner! You can definitely do this. You just need to consider what you want to use as a base model and be careful about...

Glad you've found it useful! In principle `tokenizer_source: union` should be doing what you want here. It is a pretty experimental feature and I wouldn't be surprised if you've hit...
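For reference, a rough sketch of where that option sits in a merge config. The model names and weights here are placeholders, and the field layout is written from memory of mergekit's YAML format, so treat it as approximate:

```yaml
merge_method: linear
models:
  - model: model-a
    parameters:
      weight: 0.5
  - model: model-b
    parameters:
      weight: 0.5
tokenizer_source: union
```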

My reasoning for that was pretty straightforward. We want the gate vector to maximize its dot product with the positive hidden states and minimize the dot product with the negative...
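That idea can be sketched in a few lines of numpy. The arrays here are illustrative and tiny, not mergekit's actual data path: "positive" hidden states come from prompts that should route to the expert, "negative" ones from prompts that should not:

```python
import numpy as np

# Hypothetical hidden states, shape (num_examples, hidden_dim).
positive_hidden = np.array([[1.0, 0.0, 2.0],
                            [0.5, 0.5, 1.5]])
negative_hidden = np.array([[-1.0, 0.2, -0.5],
                            [-0.5, 0.1, -1.0]])

# Up to a norm constraint, a gate vector that maximizes the dot
# product with the positive states while minimizing it with the
# negative states is the difference of the class means.
gate = positive_hidden.mean(axis=0) - negative_hidden.mean(axis=0)

# The gate scores positive examples above negative ones.
print(gate @ positive_hidden.mean(axis=0) > gate @ negative_hidden.mean(axis=0))  # → True
```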

Could you give some more details please? I suspect this is #50, so it would be helpful to know how the model was configured.

I'll look into it and see if things can be massaged into the right format. Probably won't have time to get to it for a while though.

After looking into this, I don't think a straightforward llamaization is possible - Phi includes bias terms on most of its linear layers, which Llama does not have. Something like CausalLM did with dropping...