Xuan Son Nguyen

73 comments of Xuan Son Nguyen

@dnhkng Yeah, in fact I had a typo in `0-5,7,8-12`; it should be `0-6,7,8-12`. This PR only aims to merge the weights linearly, meaning it does not add or...
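A minimal sketch of what "merge the weights linearly" means here, assuming both tensors are already in `float`; the function name and layout are illustrative, not the PR's actual code:

```cpp
#include <cstddef>

// Hypothetical helper: linearly blend two weight tensors of the same shape.
// out[i] = scale_a * a[i] + scale_b * b[i]
// With scale_a + scale_b = 1.0f this is a plain linear interpolation;
// no new parameters are created, only existing weights are mixed.
void merge_weights_linear(const float * a, const float * b, float * out,
                          size_t n, float scale_a, float scale_b) {
    for (size_t i = 0; i < n; i++) {
        out[i] = scale_a * a[i] + scale_b * b[i];
    }
}
```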

> Yeah, in fact I had a typo in `0-5,7,8-12`; it should be `0-6,7,8-12`

It's true that the logic for my CONFIG argument is not correct. In fact, it...

Thanks for the explanation.

> This is why Frankenmerge models are larger than base models.

According to discussion #4718, the GGUF format may benefit from pointing 2 weights to the same data in the metadata...
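To illustrate the idea (this is not the actual GGUF on-disk layout, just a sketch): two tensor-info entries could point at the same data offset, so a repeated layer costs only a metadata entry instead of a full copy of the weights:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical tensor-info table; field names are illustrative only.
struct tensor_info {
    std::string name;
    uint64_t    offset; // byte offset into the shared data blob
    uint64_t    size;   // size in bytes
};

int main() {
    // Layers 7 and 8 reuse the same attention weights: two metadata
    // entries, one copy of the data on disk.
    std::vector<tensor_info> infos = {
        { "blk.7.attn_q.weight", 0x1000, 0x8000 },
        { "blk.8.attn_q.weight", 0x1000, 0x8000 }, // same offset -> shared data
    };
    (void) infos;
    return 0;
}
```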

Thanks for the input, I'll need to rework this PR in the next few days. Regarding the format, I still think having the ability to specify the weights of a and b separately can...

@dnhkng I updated my PR to have the ability to:

- Merge multiple models at once (not just 2 models)
- Use the CSV format that we discussed

To simplify...
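As an illustration only (the column layout here is a guess, not the format settled in the discussion), a per-layer CSV config might give each output layer one row, with a source layer and scale per input model:

```
model_a_layer,scale_a,model_b_layer,scale_b
0,0.5,0,0.5
1,0.7,1,0.3
```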

FYI, I was also thinking of adding the ability to merge quantized models, but at this stage it's quite tricky: I must dequantize them, do the calculations in `float`, then re-quantize the result again...
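The tricky path described above, as a hedged sketch. `dequantize_row` and `quantize_row` are stand-ins for whichever per-type (de)quantization routines the library provides, not actual ggml entry points; here they just copy floats so the sketch is self-contained:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins; real code would dispatch on the tensor's
// quantization type (q4_K, q8_0, ...).
static void dequantize_row(const void * src, float * dst, size_t n) {
    std::memcpy(dst, src, n * sizeof(float));
}
static void quantize_row(const float * src, void * dst, size_t n) {
    std::memcpy(dst, src, n * sizeof(float));
}

// Merge one row of two quantized tensors: dequantize both to float,
// blend linearly, then re-quantize the result. The re-quantization is
// lossy, which is part of what makes this path tricky.
static void merge_quantized_row(const void * a, const void * b, void * out,
                                size_t n, float scale_a, float scale_b) {
    std::vector<float> fa(n), fb(n);
    dequantize_row(a, fa.data(), n);
    dequantize_row(b, fb.data(), n);
    for (size_t i = 0; i < n; i++) {
        fa[i] = scale_a * fa[i] + scale_b * fb[i];
    }
    quantize_row(fa.data(), out, n);
}
```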

I'll try quantized models later. At least, loading a q4_K model and then outputting it as f16 is not too complicated; only the re-quantization part is too tricky for me. Also, just...

> Reusing layers makes sense, but the caching is tricky.

Personally, I think a shared cache among layers is not technically possible. While the weights are the same, the KV is...
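To make the point concrete (a toy illustration, not llama.cpp code): even if two layers share the same projection weights, the hidden state changes as it flows through the stack, so each repeated layer produces different K/V entries and needs its own cache slots:

```cpp
#include <cstdio>

int main() {
    // Toy 1-dimensional "K projection": same weight for both layers.
    const float w_k = 0.5f;

    // The repeated layer sees a different input than the original one,
    // because earlier layers have already transformed the hidden state.
    const float h_layer7 = 1.0f;  // input to layer 7
    const float h_layer8 = 1.7f;  // input to the reused copy at layer 8

    // Same weights, different inputs -> different K entries to cache.
    std::printf("k7 = %f, k8 = %f\n", w_k * h_layer7, w_k * h_layer8);
    return 0;
}
```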

> Yes, you can't share cache, it would get overwritten on the higher layer processing...

But it still works! The results are worse, but that's not unexpected. The fact...

@dnhkng We're now accepting quantized models as input, but only outputting a non-quantized FP16 model (you can re-quantize it using the `./quantize` tool). Can you give it a try? Thanks!
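For reference, re-quantizing the merged FP16 output would look something like `./quantize merged-f16.gguf merged-Q4_K_M.gguf Q4_K_M`, where the file names are placeholders and the last argument is whichever quantization type you want.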