
Plugin support for custom LLM providers now available

Open aksg87 opened this issue 4 months ago • 35 comments

Hi all,

LangExtract now supports third-party model providers through a new plugin and registry infrastructure. You can integrate custom LLM backends (Azure OpenAI, AWS Bedrock, custom inference servers, etc.) without modifying core LangExtract code.

Please check out the example and documentation: https://github.com/google/langextract/tree/main/examples/custom_provider_plugin
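For orientation, here is a rough sketch of the shape the linked example follows. The @lx.providers.registry.register decorator is the actual registration hook; the base class, the infer signature, and ScoredOutput are paraphrased from the example and worth double-checking against the example code itself:

import langextract as lx


@lx.providers.registry.register(
    r'^my-model',  # model_id pattern this provider should handle (argument syntax assumed)
)
class MyCustomProvider(lx.inference.BaseLanguageModel):  # base class name assumed
    """Minimal provider sketch; see examples/custom_provider_plugin for the real thing."""

    def __init__(self, model_id: str, api_key: str | None = None, **kwargs):
        super().__init__()
        self.model_id = model_id
        self.api_key = api_key

    def infer(self, batch_prompts, **kwargs):  # signature assumed
        # Call your backend here and yield one list of scored outputs per prompt.
        for prompt in batch_prompts:
            response_text = f"echo: {prompt}"  # placeholder for a real backend call
            yield [lx.inference.ScoredOutput(score=1.0, output=response_text)]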

List of providers available: https://github.com/google/langextract/blob/main/COMMUNITY_PROVIDERS.md

Feel free to provide feedback or report any issues. Thank you!

@YunfanGoForIt @mariano @zc1175 @praneeth999 @JustStas @jkitaok

aksg87 avatar Aug 08 '25 12:08 aksg87

@aksg87 thanks for the feature! Do you think it would make sense to include some of the more popular providers (azure, bedrock) as core providers? Happy to provide a PR in the new format. Especially considering Azure OpenAI will be quite similar to the already supported OpenAI

JustStas avatar Aug 08 '25 14:08 JustStas

@aksg87 I have updated #38 with the new approach and switched to inheriting the bulk of the functionality from the openai.py provider. I think this way it doesn't generate much additional effort for long-term support while covering a recurrent need for many users (especially in corporate projects).

Otherwise, we could also add a LiteLLM core provider to cover many LLMs with minimum long-term support overhead - please tell me if you think this makes sense.

JustStas avatar Aug 08 '25 15:08 JustStas

@aksg87 Looks great. Question: currently extract() builds the model config and instance from params. Shouldn't it also be possible to pass either a ModelConfig or a factory-instantiated model? So instead of:

result = extract(
    prompt_description=settings.specs.header.prompt,
    examples=settings.specs.header.examples,
    text_or_documents=content,
    language_model_type=OpenAILanguageModel,
    language_model_params={
        "base_url": "http://localhost:9001/v1",
        "api_key": environ["OPENAI_API_KEY"]
    },
    model_id="gpt-4o-mini",
    use_schema_constraints=False,
    fence_output=True,
    debug=False
)

We could do:

model_config = ModelConfig(
    model_id="gpt-4o-mini",
    provider=OpenAILanguageModel,
    provider_kwargs={
        "base_url": "http://localhost:9001/v1",
        "api_key": environ["OPENAI_API_KEY"]
    }
)
result = extract(
    prompt_description=settings.specs.header.prompt,
    examples=settings.specs.header.examples,
    text_or_documents=content,
    config=model_config,
    use_schema_constraints=False,
    fence_output=True,
    debug=False
)

Or even:

model_config = ModelConfig(
    model_id="gpt-4o-mini",
    provider=OpenAILanguageModel,
    provider_kwargs={
        "base_url": "http://localhost:9001/v1",
        "api_key": environ["OPENAI_API_KEY"]
    }
)
model = factory.create_model(model_config)
result = extract(
    prompt_description=settings.specs.header.prompt,
    examples=settings.specs.header.examples,
    text_or_documents=content,
    model=model,
    use_schema_constraints=False,
    fence_output=True,
    debug=False
)

Thoughts? I can create a PR for this if it sounds good.

mariano avatar Aug 08 '25 15:08 mariano

Hey @JustStas, thanks for the PR offer!

Given that we just landed a provider registry + plugin system (#97), I think the ideal path for Azure OpenAI and Bedrock would be as external provider plugins. We actually added OpenAI as a stopgap given high demand, but even that might move to a plugin eventually - right now I want to keep the core focused on extraction features rather than managing the provider ecosystem.

Would you be up for creating langextract-azure-openai and/or langextract-bedrock as separate packages? Others like LiteLLM would also work great as a plugin.

Once you publish and a few folks validate it works well, we'll add a community providers section to feature it. This way the community can iterate quickly on provider support while we focus on improving the core extraction capabilities.

@mariano great idea on the config and model parameters - moved it to #106!

aksg87 avatar Aug 09 '25 06:08 aksg87

I'd like to confirm whether there is a bug in the custom LLM support. First, I tried it by writing my own custom LLM class and ran into errors. So I downloaded the source code again and ran it the way https://github.com/google/langextract/tree/main/examples/custom_provider_plugin does, but still got an error.

python .\test_example_provider.py
Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\langextract-main\examples\custom_provider_plugin\test_example_provider.py", line 22, in <module>
    from langextract_provider_example import CustomGeminiProvider  # noqa: F401
  File "C:\Users\Administrator\Desktop\langextract-main\examples\custom_provider_plugin\langextract_provider_example\__init__.py", line 17, in <module>
    from langextract_provider_example.provider import CustomGeminiProvider
  File "C:\Users\Administrator\Desktop\langextract-main\examples\custom_provider_plugin\langextract_provider_example\provider.py", line 24, in <module>
    @lx.providers.registry.register(
AttributeError: module 'langextract' has no attribute 'providers'

===========================================================================

Meanwhile, there are other errors. For example, the following code:

import langextract as lx


def main():
    """Test the custom provider."""
    api_key = "eesd"
    config = lx.factory.ModelConfig(
        model_id="gemini-2.5-flash",
        provider="CustomGeminiProvider",
        provider_kwargs={"api_key": api_key},
    )


if __name__ == "__main__":
    main()

===========================================================================

It's just these few lines of code, and it still reports an error.

python .\test_example_provider.py
Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\langextract-main\examples\custom_provider_plugin\test_example_provider.py", line 9, in <module>
    main()
  File "C:\Users\Administrator\Desktop\langextract-main\examples\custom_provider_plugin\test_example_provider.py", line 5, in main
    config = lx.factory.ModelConfig(model_id="gemini-2.5-flash", provider="CustomGeminiProvider", provider_kwargs={"api_key": api_key},)
AttributeError: module 'langextract' has no attribute 'factory'

============================================================================

Here is the langextract package info:

pip show langextract
Name: langextract
Version: 1.0.5
Summary: LangExtract: A library for extracting structured data from language models
Home-page: https://github.com/google/langextract
Author:
Author-email: Akshay Goel [email protected]
License-Expression: Apache-2.0
Location: D:\ProgramData\miniconda3\envs\demo\Lib\site-packages
Requires: absl-py, aiohttp, async_timeout, exceptiongroup, google-genai, ml-collections, more-itertools, numpy, openai, pandas, pydantic, python-dotenv, PyYAML, requests, tqdm, typing-extensions
Required-by:

woshiwanlei1 avatar Aug 10 '25 02:08 woshiwanlei1

Update: The plugin system is now in main but not yet on PyPI (v1.0.5 was released ~10 hours before the PR merged).

To use it now, install from source:

git clone https://github.com/google/langextract.git
cd langextract
pip install -e .

Then check out examples/custom_provider_plugin/ for the implementation pattern. The next PyPI release will include the plugin system.
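With the editable install in place, the usage pattern in test_example_provider.py is roughly the following. lx.factory.ModelConfig matches the snippets quoted above; the create_model call and the provider string are paraphrased from the example and should be confirmed against the example script:

import langextract as lx
from langextract_provider_example import CustomGeminiProvider  # noqa: F401  (importing registers the provider)

config = lx.factory.ModelConfig(
    model_id="gemini-2.5-flash",
    provider="CustomGeminiProvider",
    provider_kwargs={"api_key": "your-api-key"},
)
model = lx.factory.create_model(config)  # factory function name assumed; see the example script

From there, the example script runs a small extraction with the created model; see test_example_provider.py for the exact call.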

aksg87 avatar Aug 10 '25 21:08 aksg87

Accidentally removed a reference to a related plugin request. Reposting:

https://github.com/google/langextract/issues/109

aksg87 avatar Aug 11 '25 05:08 aksg87

Hi @JustStas, any interest in converting your PR into a plugin? I wanted to start a page on model plugins by the community so if you do, please let me know. Thanks!

aksg87 avatar Aug 11 '25 20:08 aksg87

@aksg87 sure, happy to build the plugin. Probably makes sense to implement both openai and azure openai in the same plugin (and later deprecate the built-in openai module)?

JustStas avatar Aug 11 '25 23:08 JustStas

Hi @JustStas, that sounds great! Plugins are very flexible so you're welcome to implement what you think fits best. When a plugin becomes popular, we can add it to a central reference area.

Also, I just added support for plugins to define their own schemas for controlled generation (see #130), so you should be able to implement nearly the entire surface area of the model-to-LangExtract interaction.

aksg87 avatar Aug 13 '25 10:08 aksg87

I'd really appreciate it if someone in the community could make a llamacpp-langextract provider. I have been testing with ollama personally, but I'd really prefer llama.cpp, because then it can be bundled automatically.

Kishlay-notabot avatar Aug 13 '25 10:08 Kishlay-notabot

A LiteLLM core provider would be fantastic, especially with OpenAI-compatible API endpoint support.

Outlines support would be a tremendous value-add alongside this: https://github.com/google/langextract/issues/101

torchss avatar Aug 14 '25 01:08 torchss

Hi all,

There's now a one-step script that generates a complete plugin template with all the boilerplate:

python scripts/create_provider_plugin.py MyProvider --with-schema

This should make it much easier to create custom providers. See PR #144 for details.

aksg87 avatar Aug 14 '25 11:08 aksg87

I've created a langextract-bedrock plugin (initial MR, issue) following the new custom provider guidance. Please leave feedback, or let me know if I should be approaching this differently. Thank you!

andyxhadji avatar Aug 14 '25 17:08 andyxhadji

Thanks @andyxhadji! Great to see langextract-bedrock as a working example of the plugin system. This will help others creating their own providers. Looking forward to feedback from anyone who tries it out.

aksg87 avatar Aug 14 '25 18:08 aksg87

@aksg87 @andyxhadji Is there a plan to add LiteLLM as a package? Since LiteLLM technically supports ALL providers, how do we manage the provider-specific configs, such as the Gemini schema or the OpenAI fenced outputs? Do we need to track this somehow? @aksg87 it would help if you could provide some insight, perhaps building on the great open example from @andyxhadji.

EliasLumer avatar Aug 19 '25 12:08 EliasLumer

@aksg87 I have the openai plugin up and running (covers openai + azureopenai): https://pypi.org/project/langextract-openai/ https://github.com/JustStas/langextract-openai

Please tell me if there is anything there that is not in line with what you expect community plugins to look like.

I can try to make the LiteLLM one work as well - will report back on that.

JustStas avatar Aug 19 '25 13:08 JustStas

> @aksg87 I have the openai plugin up and running: https://pypi.org/project/langextract-openai/ https://github.com/JustStas/langextract-openai

Does this mean I can now comfortably serve any LLM from Hugging Face with vLLM and use it through this openai plugin?

torchss avatar Aug 19 '25 13:08 torchss

@aksg87 Also created a plugin for litellm - https://github.com/JustStas/langextract-litellm @torchss I would advise testing this one out - it should work with Hugging Face models provided they are covered by LiteLLM. @EliasLumer please check it out and share feedback :)

JustStas avatar Aug 19 '25 16:08 JustStas

@JustStas This is phenomenal!

If I use vLLM, SGLang, and llama.cpp to serve OpenAI-compatible API endpoints, are you suggesting I try out your litellm plugin over your openai one? If so:

  1. Why? (For example: are you passing through "more options" to litellm, and is your openai plugin literally for OpenAI only rather than for any OpenAI-compatible API server?)
  2. I will be using the structured generation backends in vLLM and SGLang, which are both custom options passed to the LLM - would your answer change?

Again, Thank You for doing all this!

torchss avatar Aug 19 '25 17:08 torchss

@torchss

  • the openai plugin is indeed designed only for openai + azure openai
  • tbh, if the litellm one works well, I don't see any reason to use the separate openai one (maybe it will be worth deprecating it to avoid maintaining multiple packages). I would expect the functionality and performance to be identical. IMHO, the litellm plugin could be included in the main library, but that's above my paygrade and for @aksg87 to decide :) For your OpenAI-compatible endpoints (vLLM, SGLang, llama.cpp server), a rough sanity-check sketch is below.
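As a quick way to sanity-check that LiteLLM itself can reach an OpenAI-compatible server, independent of langextract, something along these lines should work. The served model name is hypothetical, and the langextract-litellm plugin presumably forwards similar kwargs to LiteLLM - check its README for the exact parameter names it expects:

import litellm

# Point LiteLLM at a local OpenAI-compatible server (e.g. vLLM on port 8000).
# The "openai/" prefix tells LiteLLM to use its OpenAI-compatible code path.
response = litellm.completion(
    model="openai/meta-llama/Llama-3.1-8B-Instruct",  # hypothetical served model name
    api_base="http://localhost:8000/v1",
    api_key="not-needed-for-local-servers",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)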

JustStas avatar Aug 19 '25 22:08 JustStas

@aksg87 would it make sense to feature a list of community plugins on the core Readme?

JustStas avatar Aug 22 '25 09:08 JustStas

Hi @JustStas, you read my mind :)

I’m working on a template and setup for collecting community plugins that will be linked in the README. I’ll try to set that up very soon, and I'm excited to centralize this in an organized way.

aksg87 avatar Aug 22 '25 13:08 aksg87

> @torchss
>
>   • the openai plugin is indeed designed only for openai + azure openai
>   • tbh, if the litellm one works well, I don't see any reason to use the separate openai one (maybe it will be worth deprecating it to avoid maintaining multiple packages). I would expect the functionality and performance to be identical. IMHO, the litellm plugin could be included in the main library, but that's above my paygrade and for @aksg87 to decide :)

Hey, will definitely look into this soon! Thanks for the suggestions

aksg87 avatar Aug 22 '25 13:08 aksg87

@aksg87 thanks for adding support for Azure OpenAI. When I run with this config:

# Extract with Azure OpenAI
result = lx.extract(
    text_or_documents=input_text,
    model_id=deployment_id,
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    prompt_description=prompt,
)

I'm receiving this error:

TypeError: extract() got an unexpected keyword argument 'azure_endpoint'

All variables are defined and pulled from env variables.

Beenjamming avatar Aug 22 '25 22:08 Beenjamming

@Beenjamming

  1. I believe if you have queries regarding one of the community plugins, it makes sense to ask them on the respective plugin's repo, not here.
  2. Please check the examples and documentation in either the openai or litellm plugin - both should enable you to use Azure OpenAI models. You will also notice that both of them use a different format from what you entered.

JustStas avatar Aug 23 '25 18:08 JustStas

> @aksg87 would it make sense to feature a list of community plugins on the core Readme?

This is done now in #182 - please take a look and let me know if you think the table and documentation work well for adding references to community plugins. Also, please feel free to add yours to the table. Thanks! @JustStas

aksg87 avatar Aug 23 '25 23:08 aksg87

> @Beenjamming
>
>   1. I believe if you have queries regarding one of the community plugins, it makes sense to ask them on the respective plugin's repo, not here.
>   2. Please check the examples and documentation in either the openai or litellm plugin - both should enable you to use Azure OpenAI models. You will also notice that both of them use a different format from what you entered.

Thank you, will do!

Beenjamming avatar Aug 24 '25 15:08 Beenjamming

@aksg87 Added in #186. I suppose it doesn't make sense to add providers that duplicate LiteLLM's functionality, to avoid user confusion? I think I will skip adding my openai provider there for now.

JustStas avatar Aug 24 '25 18:08 JustStas

Thanks! See my response here: https://github.com/JustStas/langextract-litellm/issues/1


EliasLumer avatar Aug 25 '25 11:08 EliasLumer