langchain.generate
The code here seems questionable to me:
def generate(
    self,
    prompts: list[ChatPromptTemplate],
    n: int = 1,
    temperature: float = 1e-8,
    callbacks: t.Optional[Callbacks] = None,
) -> LLMResult:
    # set temperature to 0.2 for multiple completions
    temperature = 0.2 if n > 1 else 1e-8
    if isBedrock(self.llm) and ("model_kwargs" in self.llm.__dict__):
        self.llm.model_kwargs = {"temperature": temperature}
    else:
        self.llm.temperature = temperature

    if self.llm_supports_completions(self.llm):
        return self._generate_multiple_completions(prompts, n, callbacks)
    else:  # call generate_completions n times to mimic multiple completions
        list_llmresults = run_async_tasks(
            [self.generate_completions(prompts, callbacks) for _ in range(n)]
        )

        # fill results as if the LLM supported multiple completions
        generations = []
        for i in range(len(prompts)):
            completions = []
            for result in list_llmresults:
                completions.append(result.generations[i][0])
            generations.append(completions)

        llm_output = _compute_token_usage_langchain(list_llmresults)
        return LLMResult(generations=generations, llm_output=llm_output)
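
For context, I assume run_async_tasks is a ragas helper that gathers the coroutines and runs them concurrently in one event loop, so with n > 1 several LlamaCpp generations are in flight at the same time. Roughly something like this (my guess, not the actual source):

import asyncio
import typing as t

# rough sketch of what run_async_tasks presumably does:
# run all coroutines concurrently in a single event loop
def run_async_tasks(tasks: t.List[t.Coroutine]) -> t.List[t.Any]:
    async def _gather():
        return await asyncio.gather(*tasks)

    return asyncio.run(_gather())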
I run the evaluation on CPU and system memory, and encounter an error:
here 3
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
GGML_ASSERT: /tmp/pip-install-dnbwilnk/llama-cpp-python_8aba6af1128d49b6bada62c4fe0fd870/vendor/llama.cpp/ggml.c:15149: cgraph->nodes[cgraph->n_nodes - 1] == tensor
GGML_ASSERT: /tmp/pip-install-dnbwilnk/llama-cpp-python_8aba6af1128d49b6bada62c4fe0fd870/vendor/llama.cpp/ggml.c:4039: ggml_can_mul_mat(a, b)
GGML_ASSERT: /tmp/pip-install-dnbwilnk/llama-cpp-python_8aba6af1128d49b6bada62c4fe0fd870/vendor/llama.cpp/ggml-alloc.c:453: view->view_src != NULL && view->view_src->data != NULL
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Segmentation fault (core dumped)
I checked and n is 3, so this code may be causing it:
list_llmresults = run_async_tasks(
    [self.generate_completions(prompts, callbacks) for _ in range(n)]
)
Can you help me? Thanks!
Which LLM are you using? If you are using an open-source model or embeddings, could you share how you are initializing it and the details?
I initialize my model using the following code:
from langchain.llms import LlamaCpp

model = LlamaCpp(
    model_path="model/mistral-7b-instruct-v0.1.gguf",
    temperature=0.70,
    max_tokens=2000,
    n_ctx=4096,
    top_p=1,
    verbose=True,
)
and my package versions are:
langchain==0.0.340
ragas==0.0.20
I see, this bug is due to LlamaCpp not working properly with async because of multithreading issues.
So, is there a workaround? How can I solve this?
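
One possible workaround, assuming the crash really does come from several llama.cpp calls running concurrently, is to await the completions one at a time instead of through run_async_tasks, and to set TOKENIZERS_PARALLELISM before loading anything (as the tokenizers warning in the log suggests). A minimal sketch; the run_sequentially helper is hypothetical and not part of ragas:

import os
import asyncio
import typing as t

# silence the tokenizers fork warning, as the log message suggests
os.environ["TOKENIZERS_PARALLELISM"] = "false"

def run_sequentially(coros: t.List[t.Coroutine]) -> t.List[t.Any]:
    # await the coroutines one by one so only a single llama.cpp
    # generation runs at any moment
    async def _runner():
        results = []
        for coro in coros:
            results.append(await coro)
        return results

    return asyncio.run(_runner())

# hypothetical drop-in replacement inside generate():
# list_llmresults = run_sequentially(
#     [self.generate_completions(prompts, callbacks) for _ in range(n)]
# )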