Recommended ROME layers and v_loss_layer for Qwen3 series (0.6B, 1.7B, 4B, 8B)
Hi EasyEdit Team,
First, thank you for this fantastic library!
I am currently working on adding support for the new Qwen3 model series (0.6B, 1.7B, 4B, and 8B) to use with the ROME algorithm. I've been creating the necessary .yaml config files, but I'm unsure about the correct hyperparameters to set for layers and v_loss_layer.
I understand from the ROME paper that the optimal layers parameter is found using Causal Tracing. To avoid having to run this analysis myself, I was hoping you might have these "official" values from your own testing, similar to how Qwen2.5-7B-Instruct is set to layers: [5].
Here is a summary of the model architectures as I understand them. Could you please help me fill in or confirm the recommended values for the layers parameter?
| Model | Total Layers | v_loss_layer (My Guess) | Recommended layers |
|---|---|---|---|
| Qwen2.5-7B-Instruct | 28 | 27 | [5] (from repo) |
| Qwen3-0.6B | 28 | 27 | ? |
| Qwen3-1.7B | 28 | 27 | ? |
| Qwen3-4B | 36 | 35 | ? |
| Qwen3-8B | 36 | 35 | ? |
For my Qwen3-8B.yaml file, I set v_loss_layer: 35 (since it has 36 layers) and took a guess with layers: [18] (the 50% midpoint). However, I'm not sure if this is optimal, or if I should follow the Qwen2.5 heuristic (layer 5/28 ≈ 18% depth) and choose a layer like [6] or [7] instead.
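For reference, here is a sketch of my draft Qwen3-8B.yaml, modeled on the repo's Qwen2.5-7B-Instruct ROME config. The `layers` value and the module-template names are my assumptions (I'm guessing Qwen3 keeps the same LLaMA-style module naming as Qwen2.5), so please correct anything that is wrong:

```yaml
# Draft Qwen3-8B.yaml for ROME -- values marked "guess" are my assumptions,
# not confirmed by the EasyEdit team.
alg_name: "ROME"
model_name: "Qwen/Qwen3-8B"
device: 0
layers: [18]            # guess: 50% midpoint of 36 layers; unconfirmed
v_loss_layer: 35        # guess: last layer (36 layers, 0-indexed)
# Module templates, assuming Qwen3 uses the same module names as Qwen2.5:
rewrite_module_tmp: "model.layers.{}.mlp.down_proj"
layer_module_tmp: "model.layers.{}"
mlp_module_tmp: "model.layers.{}.mlp"
attn_module_tmp: "model.layers.{}.self_attn"
ln_f_module: "model.norm"
lm_head_module: "lm_head"
```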
Could you please provide the recommended layers to use for these Qwen3 models for ROME?
Thank you for your help!
Same here. I'm working with the MEMIT and ROME methods on the Qwen3 series and am also hoping to get official hparams settings.
We will work on this. It's currently EMNLP season, so please bear with us.
Thank you for your response! Another issue: it seems the environment cannot recognize Qwen3 models (the pinned transformers version is too old). I hope that can be fixed as well.
Sorry for the late reply. Here are some solutions for the time being:
- Use `nnsight` or `transformer_lens` to conduct causal tracing to decide the layer for updating. These are empirical results, so feel free to test.
- You can update the transformers version on your own. It may conflict with the `qformer` section, but you can comment out that code if you do not need the multimodal part.
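If you are unsure whether your local transformers is new enough, a quick stdlib-only version check like the one below can help. The 4.51.0 cutoff is my understanding of when the Qwen3 architecture was added; please verify it against the transformers release notes.

```python
# Check whether the installed transformers version meets the (assumed)
# minimum for Qwen3 support. 4.51.0 is an assumption, not a confirmed pin;
# verify against the transformers changelog.
MIN_QWEN3_VERSION = (4, 51, 0)

def supports_qwen3(version_string: str) -> bool:
    """Compare a dotted version string against the assumed Qwen3 minimum."""
    parts = tuple(int(p) for p in version_string.split(".")[:3])
    return parts >= MIN_QWEN3_VERSION

print(supports_qwen3("4.46.0"))  # False: too old, Qwen3 configs won't load
print(supports_qwen3("4.51.3"))  # True
```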
Quick update:
- I updated the requirements, and the current code now supports the Qwen3 series.
- For Qwen3's suggested layer, I wrote a causal tracing script here: https://colab.research.google.com/drive/1ZFAtbDzSW3eK4tMhBwUyXtCPJG5K2FuH?usp=sharing
You can test this on your own data and average the per-layer differences across runs to determine the target layer.
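As a minimal sketch of that averaging step (with illustrative placeholder numbers, not real measurements): average the per-layer indirect-effect scores across prompts and take the layer with the largest mean restoration effect.

```python
# Sketch: given per-layer effect scores from causal tracing runs on several
# prompts, average across prompts and pick the layer with the largest mean
# effect. The numbers below are illustrative, not real measurements.
def best_layer(per_prompt_effects: list[list[float]]) -> int:
    """Average effects across prompts; return the argmax layer index."""
    n_layers = len(per_prompt_effects[0])
    means = [
        sum(run[layer] for run in per_prompt_effects) / len(per_prompt_effects)
        for layer in range(n_layers)
    ]
    return max(range(n_layers), key=means.__getitem__)

# Three hypothetical tracing runs over a 6-layer toy model:
effects = [
    [0.1, 0.3, 0.7, 0.4, 0.2, 0.1],
    [0.2, 0.4, 0.6, 0.5, 0.1, 0.0],
    [0.1, 0.2, 0.8, 0.3, 0.2, 0.1],
]
print(best_layer(effects))  # layer 2 has the highest mean effect (0.7)
```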