
Recommended ROME layers and v_loss_layer for Qwen3 series (0.6B, 1.7B, 4B, 8B)

Open Salehoof opened this issue 2 months ago • 4 comments

Hi EasyEdit Team,

First, thank you for this fantastic library!

I am currently working on adding support for the new Qwen3 model series (0.6B, 1.7B, 4B, and 8B) to use with the ROME algorithm. I've been creating the necessary .yaml config files, but I'm unsure about the correct hyperparameters to set for layers and v_loss_layer.

I understand from the ROME paper that the optimal layers parameter is found using Causal Tracing. To avoid having to run this analysis myself, I was hoping you might have these "official" values from your own testing, similar to how Qwen2.5-7B-Instruct is set to layers: [5].

Here is a summary of the model architectures as I understand them. Could you please help me fill in or confirm the recommended values for the layers parameter?

| Model | Total Layers | v_loss_layer (my guess) | Recommended `layers` |
|---|---|---|---|
| Qwen2.5-7B-Instruct | 28 | 27 | `[5]` (from repo) |
| Qwen3-0.6B | 28 | 27 | ? |
| Qwen3-1.7B | 28 | 27 | ? |
| Qwen3-4B | 36 | 35 | ? |
| Qwen3-8B | 36 | 35 | ? |

For my Qwen3-8B.yaml file, I set v_loss_layer: 35 (since it has 36 layers) and took a guess with layers: [18] (the 50% midpoint). However, I'm not sure if this is optimal, or if I should follow the Qwen2.5 heuristic (layer 5/28 ≈ 18% depth) and choose a layer like [6] or [7] instead.
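For reference, the two heuristics mentioned above can be sketched as plain arithmetic (this is just my illustration of the candidate layer choices, not anything from the EasyEdit codebase):

```python
def heuristic_layers(total_layers: int) -> dict:
    """Candidate ROME edit layers under two simple depth heuristics."""
    return {
        # Qwen2.5-7B-Instruct uses layers: [5] out of 28, i.e. ~18% depth
        "18%_depth": round(0.18 * total_layers),
        # naive 50% midpoint guess
        "midpoint": total_layers // 2,
    }

for n in (28, 36):
    print(n, heuristic_layers(n))
# 28 -> {'18%_depth': 5, 'midpoint': 14}
# 36 -> {'18%_depth': 6, 'midpoint': 18}
```

So for the 36-layer models the 18%-depth heuristic points at layer 6, and the midpoint guess at layer 18, which is exactly the choice in question.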

Could you please provide the recommended layers to use for these Qwen3 models for ROME?

Thank you for your help!

Salehoof avatar Oct 30 '25 08:10 Salehoof

Same here: I'm working with the MEMIT & ROME methods on the Qwen3 series and hoping for official hparams settings as well.

gxx27 avatar Nov 04 '25 05:11 gxx27

We will work on this; it's EMNLP season at the moment.

zxlzr avatar Nov 04 '25 05:11 zxlzr

Thank you for your response! Another issue: it seems the environment cannot recognize Qwen3 models (the pinned transformers version is too old). Hoping that can be fixed together.

gxx27 avatar Nov 04 '25 06:11 gxx27

Sorry for the late reply. Here are some solutions for the time being.

  1. Use nnsight or transformer_lens to run causal tracing and decide which layer to update. The published values are empirical results, so feel free to test for yourself.
  2. You can upgrade the transformers version on your own. It may conflict with the qformer section, but you can comment out that code if you do not need the multimodal part.

littlefive5 avatar Nov 04 '25 07:11 littlefive5

Quick update:

  1. I updated the requirements, and the current code now supports the Qwen3 series.
  2. For Qwen3's suggested layer, I wrote a causal-tracing notebook here: https://colab.research.google.com/drive/1ZFAtbDzSW3eK4tMhBwUyXtCPJG5K2FuH?usp=sharing

You can run it on your own data and average the effect differences across prompts to determine the target layer.
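The averaging step can be sketched like this (a minimal illustration with made-up effect scores, not the notebook's actual code; the array shape and values are assumptions):

```python
import numpy as np

# Hypothetical per-prompt causal-tracing scores: rows = prompts, cols = layers.
# Each entry is the restoration effect at that layer for that prompt.
effects = np.array([
    [0.01, 0.04, 0.30, 0.22, 0.05],   # prompt 1
    [0.02, 0.06, 0.25, 0.28, 0.04],   # prompt 2
    [0.00, 0.05, 0.33, 0.20, 0.06],   # prompt 3
])

mean_effect = effects.mean(axis=0)        # average over prompts
target_layer = int(mean_effect.argmax())  # layer with the strongest mean effect
print(target_layer)  # -> 2
```

With real data you would run the notebook's tracing over a batch of your own prompts and take the argmax of the averaged curve in the same way.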

littlefive5 avatar Nov 18 '25 09:11 littlefive5