
Passing a parameter to an Ollama model

ominds opened this issue 9 months ago · 1 comment

I'm trying ollama/llama3-gradient:70b with praisonai code.

The Ollama documentation for this model says I need to pass a parameter to the model to increase the context length:

curl http://localhost:11434/api/generate -d '{ "model": "llama3-gradient", "prompt": "Why is the sky blue?", "options": { "num_ctx": 256000 } }'

How do I handle this from PraisonAI?

ominds avatar Mar 16 '25 19:03 ominds


If the Ollama service (i.e. the Ollama model files) is under your control, you can change the default setting in the model's parameter file, so that every time Ollama loads the model it uses the num_ctx value you set. If your model file does not contain such a parameter, Ollama uses the default value of 2048. The model also has a context window size fixed at training time; it is generally advised to set num_ctx no larger than that training value (I don't know the consequence otherwise).

To update or add the parameter for a model, you first need to locate its metadata file (the one containing the parameters). The following example shows how I add such a parameter ("num_ctx":4096) to a model named phi:

(base) cmic@t5810 /opt/ollama.models $ find manifests/ -type f|grep phi
manifests/registry.ollama.ai/library/phi/latest
(base) cmic@t5810 /opt/ollama.models $ cat manifests/registry.ollama.ai/library/phi/latest |grep params
{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"mediaType":"application/vnd.docker.container.image.v1+json","digest":"sha256:4ce4b16d33a334b872b8cc4f9d6929905d0bfa19bdc90c5cbed95700d22f747f","size":555},"layers":[{"mediaType":"application/vnd.ollama.image.model","digest":"sha256:04778965089b91318ad61d0995b7e44fad4b9a9f4e049d7be90932bf8812e828","size":1602461536},{"mediaType":"application/vnd.ollama.image.license","digest":"sha256:7908abcab772a6e503cfe014b6399bd58dea04576aaf79412fa66347c72bdd3f","size":1036},{"mediaType":"application/vnd.ollama.image.template","digest":"sha256:774a15e6f1e5a0ccd2a2df78c20139ab688472bd8ed5f1ed3ef6abf505e02d02","size":77},{"mediaType":"application/vnd.ollama.image.system","digest":"sha256:3188becd6bae82d66a6a3e68f5dee18484bbe19eeed33b873828dfcbbb2db5bb","size":132},{"mediaType":"application/vnd.ollama.image.params","digest":"sha256:0b8127ddf5ee8a3bf3456ad2d4bb5ddbe9927b3bdca10e639f844a12d5b09099","size":42}]}
(base) cmic@t5810 /opt/ollama.models $ cat blobs/sha256-0b8127ddf5ee8a3bf3456ad2d4bb5ddbe9927b3bdca10e639f844a12d5b09099
{"stop":["User:","Assistant:","System:"]}
(base) cmic@t5810 /opt/ollama.models $ vi blobs/sha256-0b8127ddf5ee8a3bf3456ad2d4bb5ddbe9927b3bdca10e639f844a12d5b09099
(base) cmic@t5810 /opt/ollama.models $ sudo vi blobs/sha256-0b8127ddf5ee8a3bf3456ad2d4bb5ddbe9927b3bdca10e639f844a12d5b09099
[sudo] password for cmic:
(base) cmic@t5810 /opt/ollama.models $ cat blobs/sha256-0b8127ddf5ee8a3bf3456ad2d4bb5ddbe9927b3bdca10e639f844a12d5b09099
{"stop":["User:","Assistant:","System:"],"num_ctx":4096}

To verify the parameter is actually applied, watch the Ollama log with sudo journalctl -u ollama -f while you run the model manually from the command line (ollama run phi). You will spot the related parameters in the log.
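
Another way to check is to hit the Ollama HTTP API directly and pass num_ctx per request via the options field, mirroring the curl command from the original question. This is a minimal sketch, assuming Ollama is running locally on the default port 11434 and the model has already been pulled:

import requests

# Ask Ollama directly, overriding num_ctx for this request only
# (same payload as the curl example above, against the default local endpoint).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3-gradient",
        "prompt": "Why is the sky blue?",
        "stream": False,                  # return a single JSON object instead of a stream
        "options": {"num_ctx": 256000},
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["response"])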

xiongyw avatar Mar 29 '25 10:03 xiongyw

@claude can you please review this request?

MervinPraison avatar May 31 '25 16:05 MervinPraison

Claude finished @MervinPraison's task


Ollama Parameter Passing Analysis & Recommendations

I've completed a comprehensive analysis of the request to add num_ctx parameter support for Ollama models in PraisonAI.

Task Checklist:

  • [x] Analyze the current Ollama integration in PraisonAI
  • [x] Review how model parameters are handled
  • [x] Check existing examples and documentation
  • [x] Identify the best approach for implementing this feature
  • [x] Provide recommendations and guidance

🔍 Current State Analysis

The Issue: PraisonAI currently doesn't support passing Ollama-specific parameters like num_ctx: 256000 through the LLM interface, requiring users to manually modify Ollama model files as suggested in the community workaround.

Root Cause: The LLM class accepts **extra_settings but doesn't pass them through to LiteLLM completion calls in the _build_completion_params() method in src/praisonai-agents/praisonaiagents/llm/llm.py:1479-1518.

💡 Recommended Solution

Simple Fix: Add self.extra_settings to the completion parameters:

def _build_completion_params(self, **override_params) -> Dict[str, Any]:
    params = {"model": self.model}
    
    # ... existing parameter handling ...
    
    # Add extra settings for provider-specific parameters
    if self.extra_settings:
        params.update(self.extra_settings)
    
    # Override with any provided parameters
    params.update(override_params)
    
    return params

🚀 Usage Examples After Implementation

Direct Agent Creation:

from praisonaiagents import Agent

agent = Agent(
    instructions="You are a helpful assistant",
    llm={
        "model": "ollama/llama3-gradient:70b",
        "num_ctx": 256000,  # Increase context window
        "temperature": 0.7,
        "top_k": 40,
        "top_p": 0.9
    }
)

YAML Configuration:

roles:
  researcher:
    llm:
      model: "ollama/llama3-gradient:70b"
      num_ctx: 256000
      temperature: 0.7

Environment + Code Approach:

# In addition to setting OPENAI_BASE_URL=http://localhost:11434/v1
agent = Agent(
    instructions="You are a helpful assistant",
    llm="ollama/llama3-gradient:70b",
    extra_settings={"num_ctx": 256000}
)

🔧 Implementation Details

File to Modify: src/praisonai-agents/praisonaiagents/llm/llm.py

Change Required: Update the _build_completion_params() method to include self.extra_settings in the parameters passed to LiteLLM.

Why This Works: LiteLLM automatically forwards unrecognized parameters to the underlying provider (Ollama), so num_ctx and other Ollama-specific parameters would be passed through correctly.
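
As a quick sanity check of this passthrough behaviour, a direct LiteLLM call (bypassing PraisonAI) can pass num_ctx as an extra keyword argument. This is a minimal sketch, assuming a local Ollama server on the default port and the llama3-gradient model from the issue:

import litellm

# num_ctx is not an OpenAI param, so LiteLLM forwards it to the provider (Ollama).
response = litellm.completion(
    model="ollama/llama3-gradient:70b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    api_base="http://localhost:11434",  # assumes Ollama on the default local port
    num_ctx=256000,                     # Ollama-specific context window option
)
print(response.choices[0].message.content)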

✅ Benefits

  1. Clean API: Users can pass Ollama parameters directly in the LLM configuration
  2. Backwards Compatible: Existing code continues to work unchanged
  3. Provider Agnostic: Works for all LiteLLM-supported providers with custom parameters
  4. No Model File Hacking: Eliminates need for manual Ollama model file modification

📋 Supported Ollama Parameters

With this implementation, users could pass any Ollama parameter:

  • num_ctx: Context window size (default: 2048)
  • num_keep: Number of tokens to keep from prompt
  • seed: Random number seed
  • num_predict: Maximum tokens to predict
  • top_k: Top-k sampling
  • top_p: Top-p sampling
  • tfs_z: Tail free sampling
  • typical_p: Typical p sampling
  • repeat_last_n: Last n tokens to consider for the repetition penalty
  • temperature: Model temperature
  • repeat_penalty: Penalty for repetition
  • presence_penalty: Presence penalty
  • frequency_penalty: Frequency penalty
  • mirostat: Mirostat sampling
  • mirostat_tau: Mirostat target entropy
  • mirostat_eta: Mirostat learning rate

This solution directly addresses the original issue and provides a clean, maintainable way to pass parameters to Ollama models without requiring users to modify Ollama model files manually.


claude[bot] avatar May 31 '25 16:05 claude[bot]

@claude note that we are not using the ollama package directly; instead we are using litellm, which supports Ollama indirectly.

Here are the litellm params, and we could include those params:

Input Params for the /chat/completions endpoint

Common Params: LiteLLM accepts and translates the OpenAI Chat Completion params across all providers.

Usage:

import litellm
import os

# set env variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"

# SET MAX TOKENS - via completion()
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    max_tokens=10
)

print(response)

Translated OpenAI params: Use this function to get an up-to-date list of supported openai params for any model + provider.

from litellm import get_supported_openai_params

response = get_supported_openai_params(model="anthropic.claude-3", custom_llm_provider="bedrock")

print(response) # ["max_tokens", "tools", "tool_choice", "stream"]

This is a list of openai params we translate across providers.

Use litellm.get_supported_openai_params() for an updated list of params for each model + provider

Provider support matrix: the original table lists, for each provider (Anthropic, OpenAI, Azure OpenAI, xAI, Replicate, Anyscale, Cohere, Huggingface, Openrouter, AI21, VertexAI, Bedrock, Sagemaker, TogetherAI, Sambanova, AlephAlpha, NLP Cloud, Petals, Ollama, Databricks, ClarifAI, Github, Novita AI), which of the translated params it supports: temperature, max_completion_tokens, max_tokens, top_p, stream, stream_options, stop, n, presence_penalty, frequency_penalty, functions, function_call, logit_bias, user, response_format, seed, tools, tool_choice, logprobs, top_logprobs, extra_headers. Use litellm.get_supported_openai_params() for the exact per-provider list.

Note: By default, LiteLLM raises an exception if the openai param being passed in isn't supported.

To drop the param instead, set litellm.drop_params = True or completion(..drop_params=True).

This ONLY DROPS UNSUPPORTED OPENAI PARAMS.

LiteLLM assumes any non-openai param is provider specific and passes it in as a kwarg in the request body
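
A minimal sketch of both behaviours together, assuming a local Ollama server and the model from the original question: unsupported OpenAI params are dropped instead of raising, while a non-OpenAI kwarg such as num_ctx is forwarded to the provider in the request body.

import litellm

# Option 1: drop unsupported OpenAI params globally
litellm.drop_params = True

# Option 2: drop them per call; provider-specific kwargs (like num_ctx) are
# still forwarded to the provider in the request body.
response = litellm.completion(
    model="ollama/llama3-gradient:70b",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="http://localhost:11434",  # assumes a local Ollama server
    drop_params=True,
    num_ctx=8192,                       # non-OpenAI param, passed through to Ollama
)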

Input Params

def completion(
    model: str,
    messages: List = [],
    # Optional OpenAI params
    timeout: Optional[Union[float, int]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    n: Optional[int] = None,
    stream: Optional[bool] = None,
    stream_options: Optional[dict] = None,
    stop=None,
    max_completion_tokens: Optional[int] = None,
    max_tokens: Optional[int] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,
    user: Optional[str] = None,
    # openai v1.0+ new params
    response_format: Optional[dict] = None,
    seed: Optional[int] = None,
    tools: Optional[List] = None,
    tool_choice: Optional[str] = None,
    parallel_tool_calls: Optional[bool] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    deployment_id=None,
    # soon to be deprecated params by OpenAI
    functions: Optional[List] = None,
    function_call: Optional[str] = None,
    # set api_base, api_version, api_key
    base_url: Optional[str] = None,
    api_version: Optional[str] = None,
    api_key: Optional[str] = None,
    model_list: Optional[list] = None,  # pass in a list of api_base, keys, etc.
    # Optional liteLLM function params
    **kwargs,
) -> ModelResponse:

Required Fields

model: string - ID of the model to use. Refer to the model endpoint compatibility table for details on which models work with the Chat API.

messages: array - A list of messages comprising the conversation so far.

Properties of messages

Note - Each message in the array contains the following properties:

role: string - The role of the message's author. Roles can be: system, user, assistant, function or tool.

content: string or list[dict] or null - The contents of the message. It is required for all messages, but may be null for assistant messages with function calls.

name: string (optional) - The name of the author of the message. It is required if the role is "function". The name should match the name of the function represented in the content. It can contain characters (a-z, A-Z, 0-9), and underscores, with a maximum length of 64 characters.

function_call: object (optional) - The name and arguments of a function that should be called, as generated by the model.

tool_call_id: str (optional) - Tool call that this message is responding to.

See All Message Values

Optional Fields

temperature: number or null (optional) - The sampling temperature to be used, between 0 and 2. Higher values like 0.8 produce more random outputs, while lower values like 0.2 make outputs more focused and deterministic.

top_p: number or null (optional) - An alternative to sampling with temperature. It instructs the model to consider the results of the tokens with top_p probability. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered.

n: integer or null (optional) - The number of chat completion choices to generate for each input message.

stream: boolean or null (optional) - If set to true, it sends partial message deltas. Tokens will be sent as they become available, with the stream terminated by a [DONE] message.

stream_options: dict or null (optional) - Options for streaming response. Only set this when you set stream: true

include_usage: boolean (optional) - If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array. All other chunks will also include a usage field, but with a null value.

stop: string / array / null (optional) - Up to 4 sequences where the API will stop generating further tokens.

max_completion_tokens: integer (optional) - An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens: integer (optional) - The maximum number of tokens to generate in the chat completion.

presence_penalty: number or null (optional) - It is used to penalize new tokens based on their existence in the text so far.

response_format: object (optional) - An object specifying the format that the model must output.

Setting to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON. Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length. seed: integer or null (optional) - This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.

tools: array (optional) - A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.

type: string - The type of the tool. Currently, only function is supported.

function: object - Required.

tool_choice: string or object (optional) - Controls which (if any) function is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. Specifying a particular function via {"type: "function", "function": {"name": "my_function"}} forces the model to call that function.

none is the default when no functions are present. auto is the default if functions are present.

parallel_tool_calls: boolean (optional) - Whether to enable parallel function calling during tool use. OpenAI default is true.

frequency_penalty: number or null (optional) - It is used to penalize new tokens based on their frequency in the text so far.

logit_bias: map (optional) - Used to modify the probability of specific tokens appearing in the completion.

user: string (optional) - A unique identifier representing your end-user. This can help OpenAI to monitor and detect abuse.

timeout: int (optional) - Timeout in seconds for completion requests (Defaults to 600 seconds)

logprobs: bool (optional) - Whether to return log probabilities of the output tokens or not. If true returns the log probabilities of each output token returned in the content of message

top_logprobs: int (optional) - An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

headers: dict (optional) - A dictionary of headers to be sent with the request.

extra_headers: dict (optional) - Alternative to headers, used to send extra headers in LLM API request.
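
As a concrete illustration of two of the optional fields above (response_format for JSON mode plus seed for best-effort determinism), here is a minimal sketch; the model name is only an example, and JSON-mode support varies by model and provider:

import litellm

# JSON mode: response_format plus an explicit instruction to answer in JSON,
# as required above. seed makes sampling best-effort deterministic.
response = litellm.completion(
    model="gpt-4o-mini",  # example model; JSON-mode support varies by provider
    messages=[
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": "Give me a colour and its hex code."},
    ],
    response_format={"type": "json_object"},
    seed=42,
    max_tokens=100,
)

print(response.choices[0].message.content)   # a JSON string
print(response.choices[0].finish_reason)     # check it wasn't cut off at "length"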

Deprecated Params

functions: array - A list of functions that the model may use to generate JSON inputs. Each function should have the following properties:

name: string - The name of the function to be called. It should contain a-z, A-Z, 0-9, underscores and dashes, with a maximum length of 64 characters.

description: string (optional) - A description explaining what the function does. It helps the model to decide when and how to call the function.

parameters: object - The parameters that the function accepts, described as a JSON Schema object.

function_call: string or object (optional) - Controls how the model responds to function calls.

litellm-specific params

api_base: string (optional) - The api endpoint you want to call the model with

api_version: string (optional) - (Azure-specific) the api version for the call

num_retries: int (optional) - The number of times to retry the API call if an APIError, TimeoutError or ServiceUnavailableError occurs

context_window_fallback_dict: dict (optional) - A mapping of model to use if call fails due to context window error

fallbacks: list (optional) - A list of model names + params to be used, in case the initial call fails

metadata: dict (optional) - Any additional data you want to be logged when the call is made (sent to logging integrations, eg. promptlayer and accessible via custom callback function)
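
A short sketch exercising a few of these litellm-specific params; the endpoint, retry count, and metadata values are illustrative assumptions, not values from this thread:

import litellm

# api_base points the call at a self-hosted endpoint (here: a local Ollama server),
# num_retries retries transient API errors, and metadata is forwarded to any
# configured logging integrations / custom callbacks.
response = litellm.completion(
    model="ollama/llama3-gradient:70b",
    messages=[{"role": "user", "content": "Summarise why num_ctx matters."}],
    api_base="http://localhost:11434",               # illustrative local endpoint
    num_retries=2,                                   # retry transient failures
    metadata={"issue": "praisonai-ollama-num_ctx"},  # logged alongside the call
)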

CUSTOM MODEL COST

input_cost_per_token: float (optional) - The cost per input token for the completion call

output_cost_per_token: float (optional) - The cost per output token for the completion call

CUSTOM PROMPT TEMPLATE (See prompt formatting for more info)

initial_prompt_value: string (optional) - Initial string applied at the start of the input messages

roles: dict (optional) - Dictionary specifying how to format the prompt based on the role + message passed in via messages.

final_prompt_value: string (optional) - Final string applied at the end of the input messages

bos_token: string (optional) - Initial string applied at the start of a sequence

eos_token: string (optional) - String applied at the end of a sequence

hf_model_name: string (optional) - [Sagemaker Only] The corresponding huggingface name of the model, used to pull the right chat template for the model.

Here is the output:

Output Format

Here's the exact json output and type you can expect from all litellm completion calls for all models:

{
  'choices': [
    {
      'finish_reason': str,  # String: 'stop'
      'index': int,          # Integer: 0
      'message': {           # Dictionary [str, str]
        'role': str,         # String: 'assistant'
        'content': str       # String: "default message"
      }
    }
  ],
  'created': str,  # String: None
  'model': str,    # String: None
  'usage': {       # Dictionary [str, int]
    'prompt_tokens': int,      # Integer
    'completion_tokens': int,  # Integer
    'total_tokens': int        # Integer
  }
}

You can access the response as a dictionary or as a class object, just as OpenAI allows you

print(response.choices[0].message.content)
print(response['choices'][0]['message']['content'])

Here's what an example response looks like

{
  'choices': [
    {
      'finish_reason': 'stop',
      'index': 0,
      'message': {
        'role': 'assistant',
        'content': " I'm doing well, thank you for asking. I am Claude, an AI assistant created by Anthropic."
      }
    }
  ],
  'created': 1691429984.3852863,
  'model': 'claude-instant-1',
  'usage': {'prompt_tokens': 18, 'completion_tokens': 23, 'total_tokens': 41}
}

Additional Attributes: You can also access information like latency.

from litellm import completion
import os

os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

messages=[{"role": "user", "content": "Hey!"}]

response = completion(model="claude-2", messages=messages)

print(response.response_ms)  # 616.25

/completions endpoint

Usage (LiteLLM Python SDK):

from litellm import text_completion

response = text_completion(
    model="gpt-3.5-turbo-instruct",
    prompt="Say this is a test",
    max_tokens=7
)

Input Params: LiteLLM accepts and translates the OpenAI Text Completion params across all supported providers.

Required Fields

model: string - ID of the model to use
prompt: string or array - The prompt(s) to generate completions for

Optional Fields

best_of: integer - Generates best_of completions server-side and returns the "best" one
echo: boolean - Echo back the prompt in addition to the completion.
frequency_penalty: number - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency.
logit_bias: map - Modify the likelihood of specified tokens appearing in the completion
logprobs: integer - Include the log probabilities on the logprobs most likely tokens. Max value of 5
max_tokens: integer - The maximum number of tokens to generate.
n: integer - How many completions to generate for each prompt.
presence_penalty: number - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far.
seed: integer - If specified, system will attempt to make deterministic samples
stop: string or array - Up to 4 sequences where the API will stop generating tokens
stream: boolean - Whether to stream back partial progress. Defaults to false
suffix: string - The suffix that comes after a completion of inserted text
temperature: number - What sampling temperature to use, between 0 and 2.
top_p: number - An alternative to sampling with temperature, called nucleus sampling.
user: string - A unique identifier representing your end-user

Output Format

Here's the exact JSON output format you can expect from completion calls:

Follows OpenAI's output format

Non-Streaming Response:

{
  "id": "cmpl-uqkvlQyYK7bGYrRHQ0eXlWi7",
  "object": "text_completion",
  "created": 1589478378,
  "model": "gpt-3.5-turbo-instruct",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "text": "\n\nThis is indeed a test",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 7,
    "total_tokens": 12
  }
}
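
For completeness, a small sketch of reading the text completion response above, assuming the response object exposes the same attribute-style access as the chat completion response shown earlier:

from litellm import text_completion

response = text_completion(
    model="gpt-3.5-turbo-instruct",
    prompt="Say this is a test",
    max_tokens=7,
)

# Mirrors the output format shown above
print(response.choices[0].text)
print(response.usage.total_tokens)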

Make the necessary changes. Minimal code change, easy for client-side code implementation.

Mainly, first focus on whether the settings already exist in llm.py.

Also handle it when it is set via llm= in the Agent( class.

Only include changes if it's absolutely necessary.

MervinPraison avatar May 31 '25 20:05 MervinPraison

Claude encountered an error


LiteLLM Parameter Support Implementation

I'll implement support for passing LiteLLM parameters (like num_ctx for Ollama) through PraisonAI's LLM interface with minimal code changes.

Task Checklist:

  • [x] Analyze current LLM class implementation in praisonai-agents
  • [x] Review how parameters are currently handled in completion calls
  • [x] Identify where to add LiteLLM parameter support
  • [x] Implement minimal changes to support additional parameters
  • [x] Test the implementation approach
  • [ ] Commit changes to branch

Implementation Details:

The Fix: Added 3 lines of code to src/praisonai-agents/praisonaiagents/llm/llm.py:1515-1517:

# Add extra settings for provider-specific parameters (e.g., num_ctx for Ollama)
if self.extra_settings:
    params.update(self.extra_settings)

This ensures that any extra parameters passed to the LLM constructor are forwarded to LiteLLM completion calls.



claude[bot] avatar May 31 '25 20:05 claude[bot]

Claude finished @MervinPraison's task (branch: claude/issue-421-20250531_210257)


LiteLLM Parameter Support Implementation

I've successfully implemented support for passing LiteLLM parameters (like num_ctx for Ollama) through PraisonAI's LLM interface with minimal code changes.

Task Checklist:

  • [x] Analyze current LLM class implementation in praisonai-agents
  • [x] Review how parameters are currently handled in completion calls
  • [x] Identify where to add LiteLLM parameter support
  • [x] Implement minimal changes to support additional parameters
  • [x] Test the implementation approach
  • [x] Commit changes to branch

Implementation Details:

The Fix: Added 3 lines of code to src/praisonai-agents/praisonaiagents/llm/llm.py:1515-1517:

# Add extra settings for provider-specific parameters (e.g., num_ctx for Ollama)
if self.extra_settings:
    params.update(self.extra_settings)

This ensures that any extra parameters passed to the LLM constructor are forwarded to LiteLLM completion calls.

Usage Examples:

1. Direct Agent with LLM dict:

from praisonaiagents import Agent

agent = Agent(
    instructions="You are a helpful assistant",
    llm={
        "model": "ollama/llama3-gradient:70b",
        "num_ctx": 256000,  # Increase context window
        "temperature": 0.7,
        "top_k": 40,
        "top_p": 0.9
    }
)

2. Direct LLM instantiation:

from praisonaiagents.llm.llm import LLM

llm = LLM(
    model="ollama/llama3-gradient:70b",
    num_ctx=256000,
    temperature=0.7
)

3. YAML Configuration:

roles:
  researcher:
    llm:
      model: "ollama/llama3-gradient:70b"
      num_ctx: 256000
      temperature: 0.7

Supported Ollama Parameters: any Ollama option can now be passed through LiteLLM, including num_ctx, num_keep, seed, num_predict, top_k, top_p, tfs_z, typical_p, repeat_last_n, temperature, repeat_penalty, presence_penalty, frequency_penalty, mirostat, mirostat_tau, mirostat_eta, and more.

Summary:

Complete: The minimal code change enables passing any LiteLLM-supported parameters through PraisonAI's existing infrastructure. Users can now pass num_ctx: 256000 for Ollama models directly through the Agent or LLM constructor without manual model file modifications.


claude[bot] avatar May 31 '25 21:05 claude[bot]