Charles O. Goddard

Results: 151 comments of Charles O. Goddard

Thanks for reporting this! This is a silly bug in how I have the parameters hooked up to the tokenizer merging method - it's expecting a `weight` parameter even though...

A total anywhere in the 0-1.2 range will almost certainly be fine. You can probably get a lot weirder than that, but I'm still experimenting myself. The paper this method...
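As a quick sanity check, totaling the per-model weights and confirming they land in that range is easy to do by hand. The weights below are made up for illustration, not taken from any real merge:

```python
# Hypothetical per-model weights for a linear-style merge.
weights = {"model-a": 0.5, "model-b": 0.4, "model-c": 0.2}

# Total the weights and check they fall in the suggested 0-1.2 range.
total = sum(weights.values())
print(round(total, 2))      # → 1.1
print(0 <= total <= 1.2)    # → True
```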

That looks like a tokenizer mismatch issue to me. Did you maybe copy in the tokenizer for Tess-M-v1.2? The added tokens not present in the base model could cause that...
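One quick way to spot that kind of mismatch is to diff the two vocabularies. The vocab dicts below are hypothetical stand-ins; with real models you would get them from the tokenizers themselves (e.g. `get_vocab()` in `transformers`):

```python
# Hypothetical vocabularies (token -> id): the base model's tokenizer
# and the copied-in tokenizer with extra chat tokens.
base_vocab = {"<s>": 0, "</s>": 1, "hello": 2, "world": 3}
copied_vocab = {"<s>": 0, "</s>": 1, "hello": 2, "world": 3,
                "<|im_start|>": 4, "<|im_end|>": 5}

# Tokens present in the copied tokenizer but absent from the base
# vocabulary have no embedding rows in the base model, which is
# exactly the mismatch described above.
added = set(copied_vocab) - set(base_vocab)
print(sorted(added))  # → ['<|im_end|>', '<|im_start|>']
```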

It looks like `stablelm-zephyr-3b` and `stablecode-instruct-alpha-3b` are entirely different architectures. `stablecode-instruct-alpha-3b` is a GPT-NeoX-based model, while `stablelm-zephyr-3b` uses Stability's Llama-like StableLM architecture. Unfortunately these are too different to...

Sorry for not getting to this sooner! You can definitely do this. You just need to consider what you want to use as a base model and be careful about...

Glad you've found it useful! In principle `tokenizer_source: union` should be doing what you want here. It is a pretty experimental feature and I wouldn't be surprised if you've hit...
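For reference, a rough sketch of where that option sits in a merge config. The model names and weights here are placeholders, and the field layout is written from memory of mergekit's YAML format, so treat it as approximate:

```yaml
merge_method: linear
models:
  - model: model-a
    parameters:
      weight: 0.5
  - model: model-b
    parameters:
      weight: 0.5
tokenizer_source: union
```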

My reasoning for that was pretty straightforward. We want the gate vector to maximize its dot product with the positive hidden states and minimize the dot product with the negative...
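That idea can be sketched in a few lines of numpy. The arrays here are illustrative and tiny, not mergekit's actual data path: "positive" hidden states come from prompts that should route to the expert, "negative" ones from prompts that should not:

```python
import numpy as np

# Hypothetical hidden states, shape (num_examples, hidden_dim).
positive_hidden = np.array([[1.0, 0.0, 2.0],
                            [0.5, 0.5, 1.5]])
negative_hidden = np.array([[-1.0, 0.2, -0.5],
                            [-0.5, 0.1, -1.0]])

# Up to a norm constraint, a gate vector that maximizes the dot
# product with the positive states while minimizing it with the
# negative states is the difference of the class means.
gate = positive_hidden.mean(axis=0) - negative_hidden.mean(axis=0)

# The gate scores positive examples above negative ones.
print(gate @ positive_hidden.mean(axis=0) > gate @ negative_hidden.mean(axis=0))  # → True
```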

Could you give some more details please? I suspect this is #50, so it would be helpful to know how the model was configured.

I'll look into it and see if things can be massaged into the right format. Probably won't have time to get to it for a while though.

After looking into this, I don't think a straightforward llamaization is possible - Phi includes bias terms on most of its linear layers, which Llama does not have. Something like CausalLM did with dropping...