
GPT4All v3.1.1: Replies from an LLM sometimes contain framing tokens from its System Prompt or Prompt Template

SINAPSA-IC opened this issue on Jul 31, 2024 · 7 comments

Bug Report

Sometimes, framing tokens from the System Prompt or the Prompt Template of a model seep into some of its replies.

The sequence <|im_start|>:

  • inserted itself into a word
  • replaced one letter/character and inserted a blank space to its right (text direction is LTR system-wide)
  • and, of course, rendered that short part of the reply unintelligible:

evolution: Hyp<|im_start|> bradytely refers to

Also, the error somehow resolved itself further down the reply: the broken word reappeared intact, with the alien sequence replaced by the letter/character it had previously displaced:

era. The hypobradytely hypothesis suggests

(https://en.wiktionary.org/wiki/hypobradytely)
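For what it's worth, a minimal check for this kind of leakage could look like the sketch below. It is plain Python, and it assumes the framing tokens are the ChatML pair <|im_start|>/<|im_end|> used by this model's default Prompt Template:

```python
import re

# Assumption: the framing tokens are the ChatML pair used by this
# model's default Prompt Template.
FRAMING_TOKENS = re.compile(r"<\|im_(?:start|end)\|>")

def find_leaks(reply: str) -> list[tuple[int, str]]:
    """Return (offset, token) for every framing token found in a reply."""
    return [(m.start(), m.group()) for m in FRAMING_TOKENS.finditer(reply)]

# The broken fragment from above:
print(find_leaks("evolution: Hyp<|im_start|> bradytely refers to"))
# -> [(14, '<|im_start|>')]
```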

This does not seem to be reproducible at will:

  • I haven't noted down which model was in use each time it happened (this is the 3rd time I have seen it; it may have occurred more than twice before);
  • the occurrence pictured below involves Nous Hermes 2 Mistral DPO, whose System Prompt and Prompt Template were left at their defaults;
  • the only common factor across these events is the length of the reply: thousands of tokens (possibly over 3000).

Steps to Reproduce

I can't tell. This time, one LocalDocs collection was in use. A brute-force attempt at searching for the leak is sketched below.
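Since I can't give deterministic steps, something like the following sketch could be used to hunt for the leak via the gpt4all Python bindings. The model filename, the prompt, and the retry count are assumptions meant to mirror my setup:

```python
import re

from gpt4all import GPT4All  # pip install gpt4all

FRAMING = re.compile(r"<\|im_(?:start|end)\|>")

# Model filename is an assumption -- use whatever GPT4All downloaded locally.
model = GPT4All("Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf")

# A long, open-ended prompt, since the leaks only showed up in very long replies.
prompt = "Explain hypobradytely and its place in evolutionary theory in great detail."

for attempt in range(20):
    with model.chat_session():
        reply = model.generate(prompt, max_tokens=4096,
                               temp=0.1, top_k=50, top_p=0.6)
    hits = FRAMING.findall(reply)
    if hits:
        print(f"attempt {attempt}: leaked framing tokens {hits}")
        break
else:
    print("no leakage observed in 20 attempts")
```

The sampling values mirror the settings listed under Your Environment; Min P is left at the binding's default.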

Expected Behavior

Replies should not contain framing tokens (such as <|im_start|>) originating in the System Prompt and/or Prompt Template.

Your Environment

  • GPT4All version: v3.1.1
  • Operating System: Windows 10 Pro, updated as of 2024.07
  • Chat model used (if applicable): Nous Hermes 2 Mistral DPO
  • Context Length: 30720
  • Temperature: 0.1
  • Top K: 50
  • Top P: 0.6
  • Min P: 0