HC Heine

Results: 21 comments by HC Heine

With hybrid search one would expect [RerankCompressor](https://github.com/open-webui/open-webui/blob/90503be2edef1a1f7ce2074286b6316d5cb8868a/backend/apps/rag/utils.py#L78) to be involved and then logged under `query_doc_with_hybrid_search:result`. As for the logs, I will try again in the Win10 setup. We...

What I originally meant is that (before we got the side-by-side view, but still with the top cutoff) the vertical cutoff is not ideal: ![image](https://github.com/open-webui/open-webui/assets/27736055/c27cc97f-2d29-4e86-8e3f-7e68604c0e6b) Now with side-by-side I wish there would...

For me it still breaks in version 0.3.16 when models use $$ without newlines. It also breaks when they use align or equation environments. ![Screenshot_20240901_020430_Vivaldi](https://github.com/user-attachments/assets/d6b2ebe9-3bad-4548-846f-3fd95663a00c) If told...
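A possible workaround on the client side is to normalize the delimiters before rendering. This is a minimal sketch, assuming the renderer only honors $$ blocks that sit on their own lines; `normalize_display_math` is a hypothetical helper, not Open WebUI's actual code:

```python
import re

def normalize_display_math(text: str) -> str:
    """Put every $$ ... $$ span on its own lines so Markdown/KaTeX
    renderers that require block-level delimiters treat it as display math."""
    return re.sub(
        r"\$\$(.+?)\$\$",
        lambda m: "\n$$\n" + m.group(1).strip() + "\n$$\n",
        text,
        flags=re.DOTALL,
    )
```

This would not help with bare align/equation environments, which lack $$ delimiters entirely and would need a separate rewrite step.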

I have read this: https://github.com/codelion/optillm/blob/193ab3c4d54f5f2e2c47525293bd7827b609675f/README.md?plain=1#L51 but it seems not everything is supported, even though it goes through Ollama?

It seems like it will not produce 3 completions:
```
def mixture_of_agents(system_prompt: str, initial_query: str, client, model: str) -> str:
    moa_completion_tokens = 0
    completions = []
    response = client.chat.completions.create(
        model=model,
        ...
```
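For comparison, here is a minimal sketch of how a three-candidate sampling step could look with an OpenAI-compatible client (`generate_candidates` is a hypothetical helper, not optillm's code; note that some backends ignore `n` and return a single choice, which would explain seeing only one completion):

```python
def generate_candidates(client, model: str, system_prompt: str,
                        initial_query: str, n: int = 3):
    """Ask an OpenAI-compatible endpoint for n candidate completions
    in a single request and return their message contents."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": initial_query},
        ],
        n=n,  # backends that ignore n will still return just one choice
    )
    return [choice.message.content for choice in response.choices]
```

If the backend ignores `n`, the fallback is to loop and issue n separate requests, which costs more round trips but works everywhere.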

Yes, I added that in the same way for my run. BTW: https://github.com/codelion/optillm/blob/41821aacdc70b9c6b65f4663eabbf3aa230cd37d/optillm/bon.py#L30 Here my LLM will not answer with only a number. I guess the user instruction is...
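One way to make the rating step tolerant of chatty models is to extract the first number from the reply instead of expecting a bare digit. A minimal sketch (`parse_rating` is a hypothetical helper, not what bon.py actually does):

```python
import re

def parse_rating(reply: str, default: float = 0.0) -> float:
    """Pull the first number out of a free-form LLM reply, since models
    often wrap the rating in prose ("I'd rate this a 7 out of 10")."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    return float(match.group()) if match else default
```

Returning a default instead of raising keeps a single malformed reply from aborting the whole best-of-n loop.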

The current workaround involves not defining the context length in Open WebUI via parameters (thus keeping it at the default), but instead specifying it in the model's blob config file. However,...

> I found a workaround! :) You need to create your own modelfile using the Admin -> Models and then use it for chat and Title generation: > > Deepseek-v2-4096...
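The modelfile route can also be done from the CLI. A minimal sketch, assuming Ollama's Modelfile syntax (the `deepseek-v2` base tag here is an assumption — use whatever tag the model was pulled under):

```
# Modelfile — hypothetical example
FROM deepseek-v2
PARAMETER num_ctx 4096
```

then roughly `ollama create deepseek-v2-4096 -f Modelfile`, and the new model should appear for selection in Open WebUI.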

Can anything be done about the lag in long chats (it appears faster with more models side-by-side)? After 30k with three models it is nearly impossible to continue.

Is there any update? With 0.3.0 I am still on:
```
offloading 79 repeating layers to GPU
llm_load_tensors: offloaded 79/81 layers to GPU
```
for qwen2:
```
llm_load_print_meta: model type...
```