BERTopic
Missing function
The tutorials for LLM topic generation use textgeneration.py or openai; those classes have this function to insert topics and documents into a custom prompt:
def _create_prompt(self, docs, topic, topics):
    keywords = ", ".join(list(zip(*topics[topic]))[0])

    # Use the default prompt and replace keywords
    if self.prompt == DEFAULT_PROMPT:
        prompt = self.prompt.replace("[KEYWORDS]", keywords)

    # Use a prompt that leverages either keywords or documents in
    # a custom location
    else:
        prompt = self.prompt
        if "[KEYWORDS]" in prompt:
            prompt = prompt.replace("[KEYWORDS]", keywords)
        if "[DOCUMENTS]" in prompt:
            to_replace = ""
            for doc in docs:
                to_replace += f"- {doc}\n"
            prompt = prompt.replace("[DOCUMENTS]", to_replace)

    return prompt
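For illustration, here is what that substitution produces for a custom prompt, using made-up keywords and documents (a toy example that replays the logic above, not a call into the library):

# Toy illustration of the substitution performed by _create_prompt above;
# the prompt, keywords, and documents are made up for this example.
custom_prompt = (
    "These keywords describe the topic: [KEYWORDS]\n"
    "These documents belong to the topic:\n"
    "[DOCUMENTS]"
    "Give a short topic label."
)

docs = ["Stars form inside collapsing gas clouds.", "Black holes bend light."]
topics = {0: [("space", 0.9), ("stars", 0.8), ("gravity", 0.7)]}

keywords = ", ".join(list(zip(*topics[0]))[0])
documents = "".join(f"- {doc}\n" for doc in docs)
prompt = custom_prompt.replace("[KEYWORDS]", keywords).replace("[DOCUMENTS]", documents)

print(prompt)
# These keywords describe the topic: space, stars, gravity
# These documents belong to the topic:
# - Stars form inside collapsing gas clouds.
# - Black holes bend light.
# Give a short topic label.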
It seems like this function is missing from the LangChain wrapper, so using a LangChain pipeline will not replace the [DOCUMENTS]/[KEYWORDS] placeholders in the prompt.
I will write my own wrapper for now; I just wanted confirmation that this is the reason topics were not inserted into my prompt, or whether I am missing something crucial here compared to the other wrappers.
LangChain works a bit differently from these other methods. As you can see in the source code here, the prompt does not use the [DOCUMENTS]
tag; instead, the representative documents are given to LangChain directly:
https://github.com/MaartenGr/BERTopic/blob/7d07e1e94e69be278f79a48d73602cdc4df0885f/bertopic/representation/_langchain.py#L171-L191
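In other words, the wrapper hands the representative documents to the chain itself rather than splicing them into the prompt string. Roughly like the sketch below; this is not the exact source, and it assumes the older load_qa_chain-style interface where a chain is run with input_documents and a question:

# Simplified sketch of what the linked LangChain wrapper does per topic.
# The chain object and its run(input_documents=..., question=...) interface
# are assumptions based on the load_qa_chain-style chains used at the time.
from langchain.docstore.document import Document

def label_topic(chain, prompt, representative_docs):
    # Documents are passed to the chain directly; no [DOCUMENTS] or [KEYWORDS]
    # substitution is performed on the prompt string.
    chain_docs = [Document(page_content=doc) for doc in representative_docs]
    return chain.run(input_documents=chain_docs, question=prompt).strip()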
That does indeed mean that the documentation should be updated to properly describe this behavior.
Does that mean that, currently, the LangChain representation model doesn't give the option to put keywords in the prompt?
That is correct. It should be straightforward to implement yourself, considering the other models do have that option.
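For anyone who wants that behaviour today, here is a rough sketch of a custom representation model that does the substitution itself. It mirrors the pattern of the other wrappers, but the private helper _extract_representative_docs, its argument order, and the chain.run(...) interface are assumptions that may differ between BERTopic/LangChain versions.

# Sketch of a custom representation model that fills [KEYWORDS]/[DOCUMENTS]
# itself before calling a LangChain chain. Adapt the private helper call and
# the chain interface to your installed versions; both are assumptions here.
from langchain.docstore.document import Document
from bertopic.representation._base import BaseRepresentation


class KeywordAwareLangChain(BaseRepresentation):
    def __init__(self, chain, prompt, nr_docs=4):
        self.chain = chain
        self.prompt = prompt          # may contain [KEYWORDS] and/or [DOCUMENTS]
        self.nr_docs = nr_docs

    def extract_topics(self, topic_model, documents, c_tf_idf, topics):
        # Top-n representative documents per topic (private helper, may change)
        repr_docs, _, _, _ = topic_model._extract_representative_docs(
            c_tf_idf, documents, topics, 500, self.nr_docs
        )

        updated_topics = {}
        for topic, docs in repr_docs.items():
            keywords = ", ".join(list(zip(*topics[topic]))[0])
            doc_list = "".join(f"- {doc}\n" for doc in docs)
            prompt = self.prompt.replace("[KEYWORDS]", keywords)
            prompt = prompt.replace("[DOCUMENTS]", doc_list)

            chain_docs = [Document(page_content=doc) for doc in docs]
            label = self.chain.run(input_documents=chain_docs, question=prompt).strip()

            # Same output format as the other representation models
            updated_topics[topic] = [(label, 1)] + [("", 0) for _ in range(9)]
        return updated_topics

Such a class could then be passed as representation_model=KeywordAwareLangChain(chain, prompt) when building the BERTopic model, just like the built-in representation models.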