
Feature mistralai api

fsndzomga opened this issue 11 months ago

I added support for Mistral AI's API. With this new feature, users can easily play with the following models: mistral-small-latest, mistral-medium-latest, and mistral-large-latest (a GPT-4-level model).

Here is an example of usage:


import dspy

# Configure Mistral as the default language model for all DSPy modules.
lm = dspy.Mistral(api_key="your-mistralai-api-key")
dspy.settings.configure(lm=lm)

# Declare a simple question-answering predictor and call it.
mod = dspy.Predict("question -> answer")
print(mod(question="Who is Emmanuel Macron?"))
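
If you want a specific model from the list above, you can pass it explicitly. A minimal sketch, assuming the client accepts a model argument like the other DSPy language model clients (the argument name is an assumption):

# Pick a specific Mistral model; the `model` keyword is assumed to mirror other DSPy clients.
lm = dspy.Mistral(model="mistral-large-latest", api_key="your-mistralai-api-key")
dspy.settings.configure(lm=lm)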

fsndzomga avatar Mar 08 '24 05:03 fsndzomga

It'd be great if you could also add an .md file to this folder: https://github.com/stanfordnlp/dspy/tree/main/docs/api/language_model_clients

insop avatar Mar 08 '24 06:03 insop

Sure, I will do that! Thanks for the feedback!

fsndzomga avatar Mar 08 '24 06:03 fsndzomga

Done @insop!

fsndzomga avatar Mar 08 '24 07:03 fsndzomga

Thank you @fsndzomga. The Python code looks good to me; however, I am not so sure about the other files.

insop avatar Mar 08 '24 07:03 insop

@fsndzomga can you change the link in the docs to api/language_model_clients/Mistral?

krypticmouse avatar Mar 08 '24 19:03 krypticmouse

@krypticmouse, I am not sure I understand. Isn't that already the case? (screenshot attached)

fsndzomga avatar Mar 08 '24 22:03 fsndzomga

@fsndzomga yes, but you just need to specify the path and remove the base URL from the link. So you can drop the https://dspy-docs.vercel.app prefix.

krypticmouse avatar Mar 09 '24 05:03 krypticmouse

Okay, I get it. Done! Thanks for the feedback.

fsndzomga avatar Mar 09 '24 11:03 fsndzomga

Hi @fsndzomga, thanks for the addition!

We want to keep all external imports/dependencies local to the files that require them.
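
Concretely, a pattern like the sketch below keeps mistralai an optional dependency; the exact import path is an assumption based on the mistralai 0.x package layout:

def _get_mistral_client(api_key):
    # Import inside the function so `mistralai` stays optional for dspy users
    # who never instantiate the Mistral client.
    try:
        from mistralai.client import MistralClient  # assumed 0.x layout
    except ImportError:
        raise ImportError("mistralai is not installed; run `pip install mistralai`.")
    return MistralClient(api_key=api_key)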

arnavsinghvi11 avatar Mar 09 '24 18:03 arnavsinghvi11

@arnavsinghvi11 @okhat, I took all the feedback into account. It should be all good now.

fsndzomga avatar Mar 09 '24 21:03 fsndzomga

Tested; I'm not sure it really works. Mistral API models are chat-completion models, so with Predict they respond by repeating the task (though ChainOfThought gives good results):

sentence = "it's a charming and often affecting journey."  # example from the SST-2 dataset.

# 1) Declare with a signature.
classify = dspy.Predict('sentence -> sentiment')

# 2) Call with input argument(s). 
response = classify(sentence=sentence)

# 3) Access the output.
print(response.sentiment)

"Sentence: it's a charming and often affecting journey.\nSentiment: Positive\n\nNote: The sentiment is inferred as positive based on the use of words such as 'charming' and 'affecting' which convey a positive emotion or experience. However, please note that sentiment analysis is not always 100% accurate and can be subjective."

axelpey avatar Mar 24 '24 15:03 axelpey

Thanks for the feedback @axelpey. To be clear, you are saying that it works, but the format of the response is sometimes not the one you expect, right? Mistral models have a tendency to change their behavior, as you mentioned: what you describe was not happening over the past few weeks while I was testing. It depends on the model you use, and even the same model can respond in different formats. I tested your code and got "Sentiment: Positive", which is more concise.

(screenshot attached)

Since DSPy does not handle response formats per se, one way to achieve your goal is to use TypedPredictors, or simply to use mistral-large, for example.
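
For instance, here is a minimal TypedPredictor sketch, assuming the class-based signature API; the class name, field names, and docstring are illustrative:

import dspy

class ClassifySentiment(dspy.Signature):
    """Classify the sentiment of a sentence."""
    sentence: str = dspy.InputField()
    sentiment: str = dspy.OutputField(desc="one word: Positive or Negative")

# TypedPredictor enforces the declared output fields instead of free-form text.
classify = dspy.TypedPredictor(ClassifySentiment)
print(classify(sentence="it's a charming and often affecting journey.").sentiment)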

A global solution would be to add a custom parser specifically for Mistral AI, but I am not sure that is in line with the DSPy approach.

fsndzomga avatar Mar 24 '24 16:03 fsndzomga

Hmm, which one did you use? I used medium. And I think this problem is not limited to Mistral; it affects all models that are ChatCompletion models rather than standard Completion ones. I think it's still very relevant to have the Mistral API available in DSPy; we just need to be mindful of this. @okhat I think you've seen these issues before with other ChatCompletion models too, right?

axelpey avatar Mar 24 '24 16:03 axelpey

Maybe I am missing something, but we do use the chat completions API from OpenAI, right? (See the attached screenshot of the gpt3.py file.)

Anyway, if you have a better idea of how to handle it, let me know, or we can just drop this feature.

fsndzomga avatar Mar 24 '24 19:03 fsndzomga

@fsndzomga @axelpey Thanks for the discussion on the limitation with ChatCompletion models. This is indeed something we've seen come up when running chat models with DSPy. There is more discussion in #662 and #420, as we have ongoing work to handle this, but this PR is good to merge for now! Thanks @fsndzomga for the contributions!

arnavsinghvi11 avatar Mar 24 '24 19:03 arnavsinghvi11

Oh, so even the OpenAI integration has the same problem... I really wasn't aware of that. Thanks for sharing the links @arnavsinghvi11, and thanks for merging the PR!

fsndzomga avatar Mar 24 '24 19:03 fsndzomga