
Weird behaviour of memory when calling localai

Open JackBekket opened this issue 1 year ago • 3 comments

I am trying to make a context-aware chat that works with both OpenAI and LocalAI (https://github.com/mudler/LocalAI)

I have this code:


	ctx := context.Background()

	session, err := InitializeNewChatWithContextNoLimit(token, model_name, "localai")
	if err != nil {
		log.Println(err)
	}

	memory := session.ConversationBuffer

	res1, err := ContinueChatWithContextNoLimit(session, "Hi, my name is Bekket")
	if err != nil {
		log.Println(err)
	}
	fmt.Println(res1)

	res2, err := ContinueChatWithContextNoLimit(session, "What is my name?")
	if err != nil {
		log.Println(err)
	}
	fmt.Println(res2)

	log.Println("check if it's stored in messages, printing messages:")
	history, err := memory.ChatHistory.Messages(ctx)
	if err != nil {
		log.Println(err)
	}
	//log.Println(history)

	total_turns := len(history)
	log.Println("total number of turns: ", total_turns)

	// Iterate over each message and print it
	log.Println("Printing messages:")
	for _, msg := range history {
		log.Println(msg.GetContent())
	}

	<....>
    
    
    
// InitializeNewChatWithContextNoLimit initializes a new dialog thread with the user
// with no limit on token usage (may fail, use the limited variant instead).
func InitializeNewChatWithContextNoLimit(api_token string, model_name string, base_url string) (*db.ChatSession, error) {
	if base_url == "" {
		// Default: talk to OpenAI directly.
		llm, err := openai.New(
			openai.WithToken(api_token),
			openai.WithModel(model_name),
		)
		if err != nil {
			return nil, err
		}

		memoryBuffer := memory.NewConversationBuffer()
		conversation := chains.NewConversation(llm, memoryBuffer)

		return &db.ChatSession{
			ConversationBuffer: memoryBuffer,
			DialogThread:       &conversation,
		}, nil
	} else {
		// Any non-empty base_url is treated as "use LocalAI";
		// note that the endpoint itself is hardcoded here.
		llm, err := openai.New(
			openai.WithToken(api_token),
			openai.WithModel(model_name),
			openai.WithBaseURL("http://localhost:8080"),
			openai.WithAPIVersion("v1"),
		)
		if err != nil {
			return nil, err
		}

		memoryBuffer := memory.NewConversationBuffer()
		conversation := chains.NewConversation(llm, memoryBuffer)

		return &db.ChatSession{
			ConversationBuffer: memoryBuffer,
			DialogThread:       &conversation,
		}, nil
	}
}


// ContinueChatWithContextNoLimit continues the dialog with memory included,
// so the user can chat while the context of previous messages is remembered.
func ContinueChatWithContextNoLimit(session *db.ChatSession, prompt string) (string, error) {
	ctx := context.Background()

	result, err := chains.Run(ctx, session.DialogThread, prompt)
	if err != nil {
		return "", err
	}
	return result, nil
}

When I call OpenAI it works fine and gives me the response "Your name is Bekket". When I call LocalAI (with the model wizard-uncensored-13b) it responds with "I am sorry, I don't have access to your name".

So at first I thought it might be a problem with LocalAI, but then I modified my code like this:


	ctx := context.Background()

	session, err := InitializeNewChatWithContextNoLimit(token, model_name, "localai")
	if err != nil {
		log.Println(err)
	}

	// Manually seed the conversation memory before running the chain.
	memory := session.ConversationBuffer
	memory.ChatHistory.AddUserMessage(ctx, "Hello, my name is Bekket, how are you?")
	memory.ChatHistory.AddAIMessage(ctx, "Hello Bekket, I am doing well. How are you?")

	res1, err := ContinueChatWithContextNoLimit(session, "I am working on a new project called 'Andromeda', do you like this project name?")
	if err != nil {
		log.Println(err)
	}
	fmt.Println(res1)

	res2, err := ContinueChatWithContextNoLimit(session, "What is my name and what project am I currently working on?")
	if err != nil {
		log.Println(err)
	}
	fmt.Println(res2)

This gives me the result: Your name is Bekket and you are currently working on a project called 'Andromeda'.

So it looks like memory works with LocalAI only if the first message is stored manually using memory.ChatHistory.AddUserMessage; after that, the chat becomes aware of the context of previous messages. You don't have to save every message this way, only the first one, and then it works fine. If I call OpenAI instead, both approaches work.

This is extremely weird behaviour, and I am not even sure where the problem lies or which project it belongs to. Is it a memory bug in this repository, a bug in LocalAI, or something specific to the model I use?

In any case, if anyone else is experiencing problems with memory and context-aware conversation when working with LocalAI -- a workaround is to manually save the first message of the conversation, and then it works fine.
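In code, the workaround boils down to seeding the history before the first chain call. A condensed sketch of the snippet above (ContinueChatWithContextNoLimit is a helper from my own code, not part of langchaingo):

	// Workaround: store the first user message (and optionally an AI reply)
	// in the conversation buffer manually, before the first chains.Run call.
	memory := session.ConversationBuffer
	memory.ChatHistory.AddUserMessage(ctx, "Hello, my name is Bekket.")
	memory.ChatHistory.AddAIMessage(ctx, "Hello Bekket, nice to meet you.")

	// From here on, LocalAI answers with the previous context taken into account.
	res, err := ContinueChatWithContextNoLimit(session, "What is my name?")
	if err != nil {
		log.Println(err)
	}
	fmt.Println(res)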

JackBekket avatar Mar 10 '24 15:03 JackBekket

@JackBekket look at this example

devalexandre avatar Mar 16 '24 05:03 devalexandre

@JackBekket look at this example

thanks! I will try this out and check if it works

JackBekket avatar Mar 21 '24 05:03 JackBekket

Ok, I have finally nailed down where the issue is -- it's a collision between templates.

First, some context: I use wizard-uncensored-13b with local-ai as the backend, and this is the model config file (YAML):

name: wizard-uncensored-13b
f16: false # true for GPU acceleration
cuda: false # true for GPU acceleration
gpu_layers: 10 # this model has max 40 layers; 15-20 is recommended for a half-load on an NVIDIA 4060 Ti (more layers -- more VRAM required), and 0 presumably means no GPU
parameters:
  model: wizard-uncensored-13b.gguf

template:
  chat: &template |
    Instruct: {{.Input}}
    Output:
  # Modify the prompt template here ^^^ as per your requirements
  completion: *template

As you can see, this template wraps whatever prompt it receives into an Instruct/Output form.
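For example, a bare prompt like "Hi, my name is Bekket" would reach the model roughly like this (an illustration derived from the template above, not an actual log):

Instruct: Hi, my name is Bekket
Output: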

Then the langchain conversation memory wraps the prompt into its own template as well, so the two templates collide with each other and end up producing this weird behaviour. Look at the debug output:

4:25PM DBG Request received: {"model":"wizard-uncensored-13b","language":"","n":0,"top_p":null,"top_k":null,"temperature":0,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":0,"typical_p":0,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","response_format":{},"size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":[{"text":"The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n\nCurrent conversation:\nHuman: Hello, my name is Bekket, I am working on a new project called 'Andromeda'.\nAI: Hello Bekket, seems like a great name for a project!\nHuman: What is my name(not the name of the project), I mentioned?\nAI:","type":"text"}]}],"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"backend":"","model_base_name":""}
4:25PM DBG Configuration read: &{PredictionOptions:{Model:wizard-uncensored-13b.gguf Language: N:0 TopP:0xc000a01878 TopK:0xc000a01880 Temperature:0xc000808228 Maxtokens:0xc000a01890 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0 TypicalP:0 Seed:0xc000a018c0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:wizard-uncensored-13b F16:0xc000a01848 Threads:0xc000a01868 Debug:0xc0008082f8 Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat:Instruct: {{.Input}}
Output:
ChatMessage: Completion:Instruct: {{.Input}}
Output:
Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000a018a8 MirostatTAU:0xc000a018a0 Mirostat:0xc000a01898 NGPULayers:0xc000a01850 MMap:0xc000a018b8 MMlock:0xc000a018b9 LowVRAM:0xc000a018b9 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000a01860 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:}
4:25PM DBG Parameters: &{PredictionOptions:{Model:wizard-uncensored-13b.gguf Language: N:0 TopP:0xc000a01878 TopK:0xc000a01880 Temperature:0xc000808228 Maxtokens:0xc000a01890 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0 TypicalP:0 Seed:0xc000a018c0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:wizard-uncensored-13b F16:0xc000a01848 Threads:0xc000a01868 Debug:0xc0008082f8 Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat:Instruct: {{.Input}}
Output:
ChatMessage: Completion:Instruct: {{.Input}}
Output:
Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000a018a8 MirostatTAU:0xc000a018a0 Mirostat:0xc000a01898 NGPULayers:0xc000a01850 MMap:0xc000a018b8 MMlock:0xc000a018b9 LowVRAM:0xc000a018b9 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000a01860 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:}


4:25PM DBG Prompt (before templating): The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hello, my name is Bekket, I am working on a new project called 'Andromeda'.
AI: Hello Bekket, seems like a great name for a project!
Human: What is my name(not the name of the project), I mentioned?
AI:
4:25PM DBG Template found, input modified to: Instruct: The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hello, my name is Bekket, I am working on a new project called 'Andromeda'.
AI: Hello Bekket, seems like a great name for a project!
Human: What is my name(not the name of the project), I mentioned?
AI:
Output:

4:25PM DBG Prompt (after templating): Instruct: The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hello, my name is Bekket, I am working on a new project called 'Andromeda'.
AI: Hello Bekket, seems like a great name for a project!
Human: What is my name(not the name of the project), I mentioned?
AI:
Output:

So basically my prompt is wrapped into the langchain conversation template and then wrapped again into my local template.

Question -- how can I modify my template so that it works properly with the langchain conversation chain?

I also have the following templates, but I am not sure whether either of them would work:

{{.Input}}

### Response:

and

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{{.Input}}

### Response:
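Another option I am considering is a plain pass-through template, so that only the prompt already built by the langchain conversation chain reaches the model -- an untested sketch, I have not verified it against LocalAI yet:

template:
  chat: |
    {{.Input}}
  completion: |
    {{.Input}}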

@devalexandre any suggestions?

JackBekket avatar Apr 13 '24 13:04 JackBekket