Fast-LLM icon indicating copy to clipboard operation
Fast-LLM copied to clipboard

[feat] Generalize dynamic config classes

Open jlamypoirier opened this issue 10 months ago • 5 comments

🧐 Problem Description

#104 introduced a mechanism tor selecting config classes dynamically. This can be made useful elsewhere, especially for user-made plugins.

💡 Proposed Solution

  • Generalize the config mechanism to all configs classes by moving it to the base Config class
  • Add a mechanism for loading modules dynamically so plugins can be added to the registry (argument to the fast-llm command, or together with the config class?)

🔄 Alternatives Considered

Having tons of forks for custom features is not practical in the long term.

📈 Potential Benefits

More flexible configuration, facilitate user contributions outside the main branch

jlamypoirier avatar Jan 22 '25 19:01 jlamypoirier

I think this can work well with Python's implicit namespace packages, https://peps.python.org/pep-0420/ In a nutshell, we can leave the fast_llm namespace open, so that third-party packages can add to it. As long as these packages are in the python path, python will find them and create the full fast_llm namespace dynamically. All that's needed is removing the init.py files in modules that should be open and their parents.

tscholak avatar Jan 22 '25 20:01 tscholak

Not sure how much this helps though, since we still need to add a way to auto-import. We could import them all, but it would add to the startup time and the config/implementation separation complicates things a bit. I was thinking about somethings that makes the imports explicit, some options:

  1. In the fast-llm command .Maybe not ideal since it's too separate from the config. fast-llm --plugins my_package.my_module other_package.other_module train ....

2 Somewhere near the config root. All in one place but distant from the classes that use them

training: ...
plugins: 
  - my_package.my_module
  - other_package.other_module
...
  1. Next to the class name. Fully modular, but a bit scattered over the place.
data:
  datasets:
    Training:
      type: my_dataset
      module: my_package.my_module
...

jlamypoirier avatar Jan 22 '25 20:01 jlamypoirier

Probably 2. is best.

Do we really need this though, especially now? I'd rather focus on making the codebase more friendly to work with for people who need or want to directly contribute to Fast-LLM's development.

tscholak avatar Jan 23 '25 17:01 tscholak

I'm not planning to implement plugins now, I think it should come once we have 2-3 places with dynamic classes and/or other teams working on this that would benefit from it.

jlamypoirier avatar Jan 23 '25 19:01 jlamypoirier

Short-term use case: components for testing, ex GPTTestSlowDatasetConfig from #129

jlamypoirier avatar Jan 24 '25 14:01 jlamypoirier