langchain.generate
The code here seems questionable to me:
def generate(
    self,
    prompts: list[ChatPromptTemplate],
    n: int = 1,
    temperature: float = 1e-8,
    callbacks: t.Optional[Callbacks] = None,
) -> LLMResult:
    # set temperature to 0.2 for multiple completions
    temperature = 0.2 if n > 1 else 1e-8
    if isBedrock(self.llm) and ("model_kwargs" in self.llm.__dict__):
        self.llm.model_kwargs = {"temperature": temperature}
    else:
        self.llm.temperature = temperature

    if self.llm_supports_completions(self.llm):
        return self._generate_multiple_completions(prompts, n, callbacks)
    else:  # call generate_completions n times to mimic multiple completions
        list_llmresults = run_async_tasks(
            [self.generate_completions(prompts, callbacks) for _ in range(n)]
        )

        # fill results as if the LLM supported multiple completions
        generations = []
        for i in range(len(prompts)):
            completions = []
            for result in list_llmresults:
                completions.append(result.generations[i][0])
            generations.append(completions)

        llm_output = _compute_token_usage_langchain(list_llmresults)
        return LLMResult(generations=generations, llm_output=llm_output)
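
For context, I assume run_async_tasks is a ragas helper that gathers the coroutines and runs them concurrently in one event loop, so with n > 1 several LlamaCpp generations are in flight at the same time. Roughly something like this (my guess, not the actual source):

import asyncio
import typing as t

# rough sketch of what run_async_tasks presumably does:
# run all coroutines concurrently in a single event loop
def run_async_tasks(tasks: t.List[t.Coroutine]) -> t.List[t.Any]:
    async def _gather():
        return await asyncio.gather(*tasks)

    return asyncio.run(_gather())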
I run the evaluation on CPU and system memory, and encounter an error:
here 3
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
GGML_ASSERT: /tmp/pip-install-dnbwilnk/llama-cpp-python_8aba6af1128d49b6bada62c4fe0fd870/vendor/llama.cpp/ggml.c:15149: cgraph->nodes[cgraph->n_nodes - 1] == tensor
GGML_ASSERT: /tmp/pip-install-dnbwilnk/llama-cpp-python_8aba6af1128d49b6bada62c4fe0fd870/vendor/llama.cpp/ggml.c:4039: ggml_can_mul_mat(a, b)
GGML_ASSERT: /tmp/pip-install-dnbwilnk/llama-cpp-python_8aba6af1128d49b6bada62c4fe0fd870/vendor/llama.cpp/ggml-alloc.c:453: view->view_src != NULL && view->view_src->data != NULL
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Segmentation fault (core dumped)
I checked and n is 3, so this code may be causing it:
list_llmresults = run_async_tasks(
    [self.generate_completions(prompts, callbacks) for _ in range(n)]
)
Can you help me? Thanks!
Which LLM are you using? If you are using an open-source model or embeddings, could you share how you are initializing it and the details?
I initialize my model using the following code:
from langchain.llms import LlamaCpp

model = LlamaCpp(
    model_path="model/mistral-7b-instruct-v0.1.gguf",
    temperature=0.70,
    max_tokens=2000,
    n_ctx=4096,
    top_p=1,
    verbose=True,
)
and my package versions are:
langchain==0.0.340
ragas==0.0.20
I see, this bug is due to LlamaCpp not working properly with async because of multithreading issues.
So, is there a workaround? How can I solve this?
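
One possible workaround, assuming the crash really does come from several llama.cpp calls running concurrently, is to await the completions one at a time instead of through run_async_tasks, and to set TOKENIZERS_PARALLELISM before loading anything (as the tokenizers warning in the log suggests). A minimal sketch; the run_sequentially helper is hypothetical and not part of ragas:

import os
import asyncio
import typing as t

# silence the tokenizers fork warning, as the log message suggests
os.environ["TOKENIZERS_PARALLELISM"] = "false"

def run_sequentially(coros: t.List[t.Coroutine]) -> t.List[t.Any]:
    # await the coroutines one by one so only a single llama.cpp
    # generation runs at any moment
    async def _runner():
        results = []
        for coro in coros:
            results.append(await coro)
        return results

    return asyncio.run(_runner())

# hypothetical drop-in replacement inside generate():
# list_llmresults = run_sequentially(
#     [self.generate_completions(prompts, callbacks) for _ in range(n)]
# )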