feat(datasets): Add limited `langchain` support for Anthropic, Cohere, and OpenAI models

Open ianwhale opened this issue 2 years ago • 3 comments

Description

Adds limited support for langchain models.

This PR is a rough starting point for loading langchain API-based models.

The big issue here is langchain's model catalog. See the list here (just for chat models).

There's no way anyone could implement and maintain all of these.

Even if that were desirable, the CohereDataset example shows that plenty of details along the way will make this task difficult.

Would love to see what the team thinks and if this is worth pushing forward!

Development notes

Adds four datasets for interacting with langchain models.

Checklist

  • [x] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • [ ] Updated the documentation to reflect the code changes
  • [ ] Added a description of this change in the relevant RELEASE.md file
  • [ ] Added tests to cover my changes

ianwhale avatar Nov 16 '23 22:11 ianwhale

Hey again @astrojuanlu! Excuse my slow reply, I was out for Thanksgiving.

I did use this in a PoC. However, I only ever used the YAML API.

I'll push some Python API examples.

ianwhale avatar Nov 27 '23 15:11 ianwhale

Depending on #629, this may need to move to a contribution folder, but I think this is mostly ready.

noklam avatar Apr 30 '24 19:04 noklam

Hi @ianwhale, thanks so much for your patience with this PR! We're about to launch our new experimental dataset contribution model, which means you can contribute datasets that are more experimental and don't need full test coverage, etc. These live in https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets/kedro_datasets_experimental.

I think this PR with datasets would be a perfect first candidate to go into kedro_datasets_experimental. I don't think there's much else you need to do, other than move it to that directory.

merelcht avatar May 15 '24 16:05 merelcht

A couple of thoughts related to this topic that we can consider in the future:

  • When we worked with langchain, we found it convenient to work with chains, which combine an LLM and a prompt and provide a standardised interface for calling the model with run-time parameters (prompt placeholders). This lets one use different LLMs through the same interface. Example using the latest langchain API: https://python.langchain.com/v0.1/docs/integrations/chat/anthropic/
  • We also arrived at dynamic model initialisation. In our case, it lets users switch between different models (OpenAI, Cohere, Azure, etc.) with just one LangchainDataset, without needing to add extra datasets.
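To make the two ideas above concrete, here is a rough, dependency-free sketch. It is not code from this PR or from langchain: `load_class`, `DynamicModelDataset`, and `SimpleChain` are hypothetical names, and a stdlib class plus `str.upper` stand in for a real model so the snippet runs without any provider package installed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from importlib import import_module
from typing import Any, Callable


def load_class(class_path: str) -> type:
    """Resolve a dotted path such as 'package.module.ClassName' to a class."""
    module_path, _, class_name = class_path.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected a dotted path, got {class_path!r}")
    return getattr(import_module(module_path), class_name)


@dataclass
class DynamicModelDataset:
    """Hypothetical dataset: one entry type that can build any model class."""

    class_path: str
    kwargs: dict[str, Any] = field(default_factory=dict)

    def load(self) -> Any:
        # Import the class lazily and instantiate it with the configured kwargs.
        return load_class(self.class_path)(**self.kwargs)


@dataclass
class SimpleChain:
    """Hypothetical stand-in for a chain: a prompt template plus a model callable."""

    prompt: str
    model: Callable[[str], str]

    def invoke(self, **params: str) -> str:
        # Fill the prompt placeholders at run time, then call the model.
        return self.model(self.prompt.format(**params))


# Stand-in demo: a stdlib class replaces a real model class here.
# With a provider package installed, class_path would point at that
# package's chat model class instead (an assumption, not tested here).
dataset = DynamicModelDataset(
    class_path="fractions.Fraction",
    kwargs={"numerator": 1, "denominator": 2},
)
loaded = dataset.load()

# str.upper plays the role of the model; any callable taking and returning str works.
chain = SimpleChain(prompt="Summarise: {text}", model=str.upper)
print(chain.invoke(text="hello"))
```

Switching models then means changing only `class_path` and `kwargs` in the configuration, while callers keep using the same `invoke` interface.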

Thanks @ElenaKhaustova for reviewing! I really like your ideas to improve this. I'd suggest merging this version for now, and when we have some time or someone from the community can help out, we can implement the improvements.

merelcht avatar Jun 03 '24 09:06 merelcht

Docs don't show up though 😬 https://docs.kedro.org/projects/kedro-datasets/en/latest/api/kedro_datasets_experimental.html

astrojuanlu avatar Jun 03 '24 11:06 astrojuanlu

I was still planning on polishing before merging, but then it was already merged. Maybe let the assignee/author complete it next time instead of merging as reviewer?

merelcht avatar Jun 03 '24 12:06 merelcht

Maybe let the assignee/author complete it next time instead of merging as reviewer?

👍🏼

astrojuanlu avatar Jun 03 '24 13:06 astrojuanlu