
`AutoLigerKernelForCausalLM.from_config` support

Open · sfc-gh-sbekman opened this issue 1 month ago · 0 comments

🚀 The feature, motivation and pitch

When creating a new model from scratch, we would ideally use `AutoLigerKernelForCausalLM.from_config` rather than `AutoLigerKernelForCausalLM.from_pretrained`, but `from_config` does not appear to handle Liger-Kernel's custom kwargs. For example, I'd expect this to work:

    from liger_kernel.transformers import AutoLigerKernelForCausalLM

    swiglu = False
    if self.using_random_model:
        # skip weight loading for a faster startup when running with a randomly initialized model
        return AutoLigerKernelForCausalLM.from_config(
            model_config,
            dtype=self.config.dtype.value,
            swiglu=swiglu,
        )
    else:
        # load pretrained weights; the Liger-specific swiglu kwarg is accepted here
        return AutoLigerKernelForCausalLM.from_pretrained(
            name_or_path,
            config=model_config,
            dtype=self.config.dtype.value,
            swiglu=swiglu,
        )

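but `from_config` appears to be inherited unchanged from the original auto class, so the Liger-specific kwargs are forwarded straight to the model constructor and we end up with: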

[rank2]:   File "/home/yak/miniconda3/envs/dev/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 456, in from_config
[rank2]:     return model_class._from_config(config, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/yak/miniconda3/envs/dev/lib/python3.12/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/yak/miniconda3/envs/dev/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2311, in _from_config
[rank2]:     model = cls(config, **kwargs)
[rank2]:             ^^^^^^^^^^^^^^^^^^^^^
[rank2]: TypeError: Qwen3MoeForCausalLM.__init__() got an unexpected keyword argument 'swiglu'
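
The traceback shows `from_config` passing every extra keyword argument through `_from_config` to the model constructor, which rejects `swiglu`. Below is a minimal sketch of the requested behavior, using toy stand-ins rather than the real transformers or Liger-Kernel classes: split off the kwargs that a patch function accepts, apply the patches, and forward only the remainder to the constructor. `ToyModel`, `toy_apply_liger_patches`, `split_liger_kwargs`, and `toy_from_config` are all hypothetical names.

```python
import inspect


class ToyModel:
    """Toy stand-in for a model class such as Qwen3MoeForCausalLM (hypothetical)."""

    def __init__(self, config, dtype=None):
        self.config = config
        self.dtype = dtype


def toy_apply_liger_patches(swiglu=True, rms_norm=True):
    """Hypothetical patch function: records which Liger options were requested."""
    return {"swiglu": swiglu, "rms_norm": rms_norm}


def split_liger_kwargs(patch_fn, kwargs):
    """Split kwargs into those accepted by patch_fn and those meant for the model."""
    patch_params = inspect.signature(patch_fn).parameters
    liger_kwargs = {k: v for k, v in kwargs.items() if k in patch_params}
    model_kwargs = {k: v for k, v in kwargs.items() if k not in patch_params}
    return liger_kwargs, model_kwargs


def toy_from_config(config, **kwargs):
    """Hypothetical Liger-aware from_config: patch first, then build the model."""
    liger_kwargs, model_kwargs = split_liger_kwargs(toy_apply_liger_patches, kwargs)
    applied = toy_apply_liger_patches(**liger_kwargs)
    # Only non-Liger kwargs (e.g. dtype) reach the constructor, so no TypeError.
    model = ToyModel(config, **model_kwargs)
    return model, applied


# Passing the Liger option straight to the constructor reproduces the error.
try:
    ToyModel({}, swiglu=False)
except TypeError as err:
    print("TypeError:", err)

model, applied = toy_from_config({}, dtype="bfloat16", swiglu=False)
print(model.dtype, applied)
```

Deriving the Liger option names from the patch function's signature, rather than hard-coding them, keeps the filter in sync when new options are added.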

In fact, I'm not sure whether this route applies the Liger-Kernel patches at all when it creates the model object. Does it? If it doesn't, it should assert.
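
One way to make the "should assert" suggestion concrete is to check, after construction, that the model actually contains the Liger replacement layers and to raise if it does not. The sketch below uses toy classes; `toy_modeling`, `ToyRMSNorm`, `ToyLigerRMSNorm`, `toy_apply_liger`, and `check_liger_applied` are hypothetical and do not reproduce the real Liger-Kernel internals. It assumes, for illustration, that patching works by swapping a module-level layer class before the model is built.

```python
import types


class ToyRMSNorm:
    """Stand-in for an original transformers layer (hypothetical)."""


class ToyLigerRMSNorm:
    """Stand-in for a Liger-Kernel replacement layer (hypothetical)."""


# Toy "modeling module": the model looks the layer class up at construction time,
# so swapping the attribute (monkey-patching) only affects models built afterwards.
toy_modeling = types.SimpleNamespace(RMSNorm=ToyRMSNorm)


class ToyModel:
    def __init__(self, config):
        self.config = config
        self.norm = toy_modeling.RMSNorm()


def toy_apply_liger():
    """Hypothetical patch step: replace the module-level layer class."""
    toy_modeling.RMSNorm = ToyLigerRMSNorm


def check_liger_applied(model):
    """Raise instead of silently returning a model built without Liger layers."""
    if not isinstance(model.norm, ToyLigerRMSNorm):
        raise RuntimeError("model was created without Liger-Kernel patches applied")


unpatched = ToyModel({})  # built before patching: the check should fail
try:
    check_liger_applied(unpatched)
except RuntimeError as err:
    print("caught:", err)

toy_apply_liger()
patched = ToyModel({})  # built after patching: the check passes
check_liger_applied(patched)
```

Raising a `RuntimeError` rather than using a bare `assert` keeps the check active even when Python runs with `-O`.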

Thank you.

sfc-gh-sbekman, Nov 18 '25 00:11