
The DARE-TIES experiment.


I just wanted to pass on some "lab" results using DARE-TIES and Mistral Nemo.

I created a triple DARE-TIES merge of three pass-through instruct/fine-tune models.

Each instruct/fine-tune uses the same merge format:

```yaml
slices:
  - sources:
      - model: g:/11b/Mistral-Nemo-Instruct-2407-12B
        layer_range: [0, 14]
  - sources:
      - model: G:/11B/Rocinante-12B-v1.1
        layer_range: [8, 24]
        parameters:
          scale:
            - filter: o_proj
              value: 1
            - filter: down_proj
              value: 1
            - value: 1
  - sources:
      - model: g:/11b/Mistral-Nemo-Instruct-2407-12B
        layer_range: [14, 22]
        parameters:
          scale:
            - filter: o_proj
              value: 0.5
            - filter: down_proj
              value: 0.5
            - value: 1
  - sources:
      - model: g:/11b/Mistral-Nemo-Instruct-2407-12B
        layer_range: [22, 31]
        parameters:
          scale:
            - filter: o_proj
              value: 0.75
            - filter: down_proj
              value: 0.75
            - value: 1
  - sources:
      - model: G:/11B/Rocinante-12B-v1.1
        layer_range: [24, 40]
        parameters:
          scale:
            - filter: o_proj
              value: 1
            - filter: down_proj
              value: 1
            - value: 1
merge_method: passthrough
dtype: bfloat16
```

THE DARE-TIES:

```yaml
models:
  - model: E:/MN-Rocinante-12B-v1.1-Instruct
  - model: E:/MN-magnum-v2.5-12b-kto-Instruct
    parameters:
      weight: 0.6
      density: 0.8
  - model: E:/MN-12B-Celeste-V1.9-Instruct
    parameters:
      weight: 0.38
      density: 0.6
merge_method: dare_ties
tokenizer_source: union
base_model: E:/MN-Rocinante-12B-v1.1-Instruct
dtype: bfloat16
```

What is interesting here is that EACH TIME I run the DARE-TIES merge it creates a slightly different or VERY DIFFERENT model, despite no changes to the models or the settings.

This shows up in PPL and "real world" tests: a PPL range of 7.7327 to 7.8024 ... and that is across just 10 generations (re-runs of the same merge).

Real-world testing shows the "core" changes -> wow. Attribute, scale, word choice, sentence structure... changes across the board.

I am not sure if this is a Mistral Nemo artifact or not.

From these 10, I merged some of them using breadcrumbs; wow is all I can say.

When everything is F32 ... they shine even brighter.
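For anyone wanting to try the same thing, here is a minimal sketch of what one such breadcrumbs pass over two of the DARE-TIES variants could look like. All model paths and parameter values are hypothetical placeholders (not the actual recipe), and the exact parameter names should be checked against the mergekit docs:

```yaml
models:
  - model: E:/MN-DARE-TIES-run-01   # hypothetical paths to two DARE-TIES variants
  - model: E:/MN-DARE-TIES-run-02
    parameters:
      weight: 0.5
      density: 0.9
      gamma: 0.01                   # fraction of the largest-magnitude deltas dropped
merge_method: breadcrumbs_ties
base_model: E:/MN-DARE-TIES-run-01
dtype: float32                      # per the F32 note above
```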

With enough generations, plus merging of the "best DNA", one could create truly legendary model(s).

Just saying - job well done and then some!!!

NOTE: Models for "fine/instruct" and "DARE-TIES" supermerges are posted at my repo.

David-AU-github avatar Aug 29 '24 00:08 David-AU-github

If DARE-Ties gives dramatically different results each time, maybe I don't understand it correctly, but that sounds less like a good thing and more like a bad thing.

CasualDev242 avatar Aug 30 '24 15:08 CasualDev242

> If DARE-Ties gives dramatically different results each time, maybe I don't understand it correctly, but that sounds less like a good thing and more like a bad thing.

This all depends... in my first case it was bad, because I deleted the source and found out the hard way... and it was a great version. That being said, after creating 10+ versions, the "DNA" of each model can be mapped, and these can be combined to create stronger models with specific attributes while reducing the negative ones.

One of the open questions is: does this apply to other archs too? Llama 2? 3? 3.1? ... And some of the other mergekit methods also involve this same type of "random pruning" too. I mapped these out after looking at the source code to verify the operations.

A more interesting method or change might be pruning controls for DARE-TIES that limit the range of the random pruning.

David-AU-github avatar Aug 31 '24 01:08 David-AU-github

Thanks for sharing your results here!

DARE-TIES does have a randomized element, yeah - it's part of the algorithm by design. If you want more reproducible merges you can set a random seed by passing --random-seed <N> on the command line. I usually do when I'm iterating on a recipe that involves DARE.
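For intuition, here is a minimal sketch of the random step DARE applies to each task vector. This is a toy illustration, not mergekit's actual code, and the tensor size and density value are arbitrary: each delta entry is kept with probability `density` and the survivors are rescaled by `1/density`, so every unseeded run draws a fresh mask and produces a slightly different merge.

```python
import torch

def dare_prune(delta: torch.Tensor, density: float, gen: torch.Generator) -> torch.Tensor:
    # Keep each entry with probability `density`, zero out the rest, then
    # rescale the survivors so the expected value matches the original delta.
    mask = torch.bernoulli(torch.full_like(delta, density), generator=gen)
    return delta * mask / density

# Toy task vector (fine-tuned weights minus base weights) for one tensor.
delta = torch.randn(1024, 1024)

# Two different seeds -> two different masks -> two different merged models.
a = dare_prune(delta, 0.8, torch.Generator().manual_seed(1))
b = dare_prune(delta, 0.8, torch.Generator().manual_seed(2))
print((a != b).float().mean().item())  # nonzero: the two runs disagree

# Reusing a seed reproduces the result exactly (what --random-seed enables).
c = dare_prune(delta, 0.8, torch.Generator().manual_seed(1))
print(torch.equal(a, c))               # True
```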

cg123 avatar Aug 31 '24 21:08 cg123

> Thanks for sharing your results here!
>
> DARE-TIES does have a randomized element, yeah - it's part of the algorithm by design. If you want more reproducible merges you can set a random seed by passing --random-seed <N> on the command line. I usually do when I'm iterating on a recipe that involves DARE.

Thank you; that was one of the questions I had. Thanks again... I think there is so much untapped potential in mergekit yet to be discovered.

David-AU-github avatar Sep 02 '24 00:09 David-AU-github

I am currently trying to do a dare_ties merge and I'm a bit confused about two things:

1. Why can we pass a list of densities? The original paper used a single density, so how do these lists of densities work?
2. What do weights mean in DARE? I suppose we somehow weight the rescaled weights?

I would appreciate any experience-based suggestions for the merge parameters. For reference, the kind of config I mean is sketched below.
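Model names and values here are placeholders, purely to illustrate the config shape I am asking about:

```yaml
models:
  - model: org/model-a        # placeholder model name
    parameters:
      weight: 0.5             # the per-model weight question 2 asks about
      density: [0.9, 0.5]     # the list of densities question 1 asks about
merge_method: dare_ties
base_model: org/base-model    # placeholder base model
dtype: bfloat16
```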

entfane avatar Nov 24 '25 22:11 entfane