Add `config` and `model` parameters to `extract()`
Hey @mariano, creating an issue about the feature you proposed - I think it's a great idea and was something I had in mind, so please feel free to send a PR if you're interested!
Following up from #99, the idea is to let extract() accept either a ModelConfig or a pre-instantiated model directly for explicit provider selection and model reuse.
The API would look like:
# Pass a config
config = factory.ModelConfig(
    model_id="gemini-2.5-flash",
    provider="GeminiLanguageModel",
    provider_kwargs={"api_key": "...", "temperature": 0.7},
)
result = extract(text_or_documents=content, config=config, ...)
# Or pass a model directly
model = factory.create_model(config)
result = extract(text_or_documents=content, model=model, ...)
Precedence would be: model > config > model_id+kwargs > language_model_type (deprecated).
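To make the precedence order concrete, here is a minimal sketch of how the resolution could work. Everything in it is hypothetical: `resolve_model_source` is an illustrative helper name, and the small `ModelConfig` dataclass is a stand-in that only mirrors the three fields shown in the example above, not the real factory class.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Tuple


@dataclass
class ModelConfig:
    """Hypothetical stand-in mirroring the fields used in the example above."""
    model_id: Optional[str] = None
    provider: Optional[str] = None
    provider_kwargs: dict = field(default_factory=dict)


def resolve_model_source(
    model: Any = None,
    config: Optional[ModelConfig] = None,
    model_id: Optional[str] = None,
    provider_kwargs: Optional[dict] = None,
    language_model_type: Any = None,
) -> Tuple[str, Any]:
    """Return (source_name, value) for the highest-precedence argument supplied.

    Order: model > config > model_id+kwargs > language_model_type (deprecated).
    """
    if model is not None:
        return "model", model
    if config is not None:
        return "config", config
    if model_id is not None:
        # Fold the loose arguments into a config so later code has one shape.
        built = ModelConfig(model_id=model_id,
                            provider_kwargs=dict(provider_kwargs or {}))
        return "model_id", built
    if language_model_type is not None:
        return "language_model_type", language_model_type
    raise ValueError("No model, config, model_id, or language_model_type given")
```

A precedence test could then assert, for example, that passing both `model` and `config` yields the `"model"` branch, and that `config` wins over `model_id`.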
We'd need tests for precedence, factory feature preservation, and backward compat. This would complement the factory infrastructure from #97 in a nice way.
Could also be interesting to support loading configs from JSON/YAML files down the road - would make it easy to manage different model configurations.
@aksg87 Perfect. I'll create the PR tomorrow.
Sounds great, thanks!
@aksg87 Regarding the configuration file approach, it sounds great. TOML feels better for future-proofing the configuration in a text-friendly way, but YAML and JSON are definitely more popular and less Pythonic 🤓. It opens the door for some fun use cases where the extraction settings come from LLM-generated TOML, specific to the job at hand.
Hi @mariano, I agree, tomllib is a great option, especially since it's part of the standard library. You make a good point that while the current configs are simple, using a dedicated file format is a great way to plan for future complexity.
On a related note, I've seen a few issues where users can't pass specific parameters to providers like Ollama. A very useful contribution would be to implement a generalized fix that allows any extra parameters in the config to be passed through to the underlying model. Let me know if that's something you'd be interested in exploring!
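One possible shape for that pass-through, sketched with a made-up provider class: named parameters are handled explicitly, and anything else is collected via `**extra_kwargs` and forwarded untouched. `SketchProvider`, `build_request`, and the request dict layout are all hypothetical and not taken from the actual provider code; `num_ctx` is used only as an example of an extra option.

```python
from typing import Any


class SketchProvider:
    """Hypothetical provider showing kwargs pass-through; not a real class."""

    def __init__(self, model_id: str, temperature: float = 0.0,
                 **extra_kwargs: Any) -> None:
        self.model_id = model_id
        self.temperature = temperature
        # Keep unrecognised parameters instead of dropping or rejecting them.
        self.extra_kwargs = dict(extra_kwargs)

    def build_request(self, prompt: str) -> dict:
        """Assemble a request payload, forwarding extras as model options."""
        options = {"temperature": self.temperature}
        options.update(self.extra_kwargs)
        return {"model": self.model_id, "prompt": prompt, "options": options}


# Extras from provider_kwargs flow straight through to the payload.
provider_kwargs = {"temperature": 0.2, "num_ctx": 4096}
provider = SketchProvider("some-local-model", **provider_kwargs)
request = provider.build_request("hello")
```

The trade-off with this design is that typos in parameter names are no longer caught locally; they would only surface if the underlying model rejects them.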