Ollama improvements
Temperature and seed parameters should be part of 'options'
According to the Ollama API docs, temperature and seed should be passed inside the options object:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "options": {
    "seed": 101,
    "temperature": 0
  }
}'
In the current implementation, these are passed at the top level of the request body, at the same level as parameters like 'model'.
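For illustration, the parameters hash built in Langchain::LLM::Ollama#chat then ends up roughly like this (a sketch based on the description above, not the library's literal output):

# Sketch of the flat parameters hash currently sent to /api/chat:
{
  model: "llama3",
  messages: [{role: "user", content: "Hello!"}],
  seed: 101,       # should be nested under :options
  temperature: 0   # should be nested under :options
}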
Changing the code of Langchain::LLM::Ollama like this works, but it is probably not the best place to implement the fix.
def chat(messages:, model: nil, **params, &block)
  parameters = chat_parameters.to_params(params.merge(messages:, model:, stream: block.present?))

  # Move seed and temperature into the nested "options" hash expected by the Ollama API
  if parameters.key?(:seed) || parameters.key?(:temperature)
    parameters[:options] = {}
    parameters[:options][:seed] = parameters.delete(:seed) if parameters.key?(:seed)
    parameters[:options][:temperature] = parameters.delete(:temperature) if parameters.key?(:temperature)
  end
  # ...
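With that change in place, a call like the following should end up sending seed and temperature under options. This is only a usage sketch: it assumes llm is an already configured Langchain::LLM::Ollama instance and that chat_completion is the usual response accessor.

# Hypothetical usage; `llm` is assumed to be a configured Langchain::LLM::Ollama instance.
response = llm.chat(
  messages: [{role: "user", content: "Hello!"}],
  model: "llama3",
  seed: 101,       # forwarded into parameters[:options][:seed]
  temperature: 0   # forwarded into parameters[:options][:temperature]
)
puts response.chat_completion # assuming the usual response accessor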
Non-streaming response chunks should be joined before parsing?
I am using Ollama 0.1.45. When requesting a non-streaming response (i.e. not passing a block to the chat method) and the response is large (more than ~4000 characters), Ollama sends the data in multiple chunks. In the current implementation each chunk is JSON.parse'd separately. For smaller responses that fit in a single chunk this is obviously not a problem, but for multiple chunks I need to join all the chunks first and then JSON-parse the result.
Changing the code of Langchain::LLM::Ollama like this works for me.
def chat(messages:, model: nil, **params, &block)
  parameters = chat_parameters.to_params(params.merge(messages:, model:, stream: block.present?))
  responses_stream = []

  if parameters[:stream]
    # Existing code
    client.post("api/chat", parameters) do |req|
      req.options.on_data = json_responses_chunk_handler do |parsed_chunk|
        responses_stream << parsed_chunk

        block&.call(OllamaResponse.new(parsed_chunk, model: parameters[:model]))
      end
    end

    generate_final_chat_completion_response(responses_stream, parameters)
    # /Existing code
  else
    client.post("api/chat", parameters) do |req|
      req.options.on_data = proc do |chunk, _size, _env|
        puts "RECEIVED #{_size} CHARS, LAST CHAR IS: '#{chunk[-1]}'" # DEBUG
        responses_stream << chunk
      end
    end

    OllamaResponse.new(
      {
        "message" => {
          "role" => "assistant",
          "content" => JSON.parse(responses_stream.join).dig("message", "content")
        }
      },
      model: parameters[:model]
    )
  end
end
The Ollama docs say nothing about this behavior. It might be a bug in Ollama, or a feature. It happens at least with the llama3-8b-q8 and phi3-14b-q5 models. Should langchainrb code around this, by checking whether the accumulated response chunks form a complete JSON document or not?
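One way langchainrb could code around it, sketched against the non-streaming (else) branch above: buffer the raw chunks and only treat the response as complete once the accumulated string parses as JSON. The parse_when_complete helper below is hypothetical, not part of the library.

# Hypothetical helper: returns the parsed Hash once the buffer holds a complete
# JSON document, or nil while it is still incomplete.
def parse_when_complete(buffer)
  JSON.parse(buffer)
rescue JSON::ParserError
  nil
end

# Inside the non-streaming (else) branch above:
buffer = +""
final_response = nil
client.post("api/chat", parameters) do |req|
  req.options.on_data = proc do |chunk, _size, _env|
    buffer << chunk
    # Only accept the data once all chunks together form valid JSON.
    final_response = parse_when_complete(buffer) || final_response
  end
end
OllamaResponse.new(final_response, model: parameters[:model])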
Inherit from Langchain::LLM::OpenAI?
Since Ollama is compatible with OpenAI's API, wouldn't it be easier to let Langchain::LLM::Ollama inherit from Langchain::LLM::OpenAI, overriding default values where needed?
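A rough sketch of the idea, pointing the OpenAI client at Ollama's OpenAI-compatible /v1 endpoint. The constructor parameter names (api_key:, llm_options:, uri_base:, default_options:) and the default-model key are assumptions about Langchain::LLM::OpenAI and the underlying OpenAI client, not verified against the library.

module Langchain
  module LLM
    # Sketch only: Ollama as a thin subclass of the OpenAI wrapper.
    class Ollama < OpenAI
      def initialize(url: "http://localhost:11434", default_options: {}, **kwargs)
        super(
          api_key: "ollama", # Ollama's OpenAI-compatible endpoint ignores the key, but one must be supplied
          llm_options: {uri_base: "#{url}/v1"}, # point the client at Ollama's /v1 endpoint (option name assumed)
          default_options: {chat_completion_model_name: "llama3"}.merge(default_options), # key name is a guess
          **kwargs
        )
      end
    end
  end
end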