Baber Abbasi

51 comments of Baber Abbasi

Are you using `limit 1` for the second error? Might be because it divides by N - 1 to calculate the sample standard deviation. cc @lintangsutawika
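For context, a minimal sketch of why a single evaluated example breaks the stderr computation: the sample standard deviation divides by N - 1, which is zero when N = 1 (the exact call sites in the harness may differ; this just illustrates the failure mode):

```python
import statistics

scores = [1.0]  # metric values from a single evaluated example, e.g. with limit 1

try:
    # Sample standard deviation uses an N - 1 denominator,
    # so it is undefined for fewer than two data points.
    stderr = statistics.stdev(scores) / len(scores) ** 0.5
except statistics.StatisticsError as e:
    print(f"stderr undefined: {e}")

# With two or more samples the computation succeeds:
scores = [1.0, 0.0]
stderr = statistics.stdev(scores) / len(scores) ** 0.5
print(stderr)  # 0.5
```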

Hey. Have you tried caching the weights by running with DP=1 until they are downloaded? I found it prone to hang with DP otherwise.

> Yes, the weights are cached. The process is hanging after `llm.generate` returns results.

Hmm, it's working for me with `0.3.2`. Have you tried running in a fresh virtual environment?

> Just tried it on a separate server and a new env, still facing the same issue.

What version of ray do you have? Mine is `ray==2.10.0`. Probably the latest one....

It's probably because of #1308. So the fewshot samples used for a particular `doc_id` will vary depending on whether DP is used and the number of ranks. Best way to...
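A hypothetical sketch of the mechanism (the pool, seed, and sharding scheme here are illustrative, not the harness's actual code): when fewshot examples are drawn from a single sequentially-advancing RNG per shard, the draw a given `doc_id` receives depends on its position within that shard, which changes with the number of DP ranks:

```python
import random

# Illustrative fewshot pool, not the harness's real data.
FEWSHOT_POOL = [f"example_{i}" for i in range(100)]

def fewshot_for_shard(doc_ids, seed=1234, k=2):
    """Draw k fewshot examples per doc from one sequential RNG.

    The RNG state advances once per document *within the shard*, so the
    samples a doc_id sees depend on how many docs preceded it there.
    """
    rng = random.Random(seed)
    return {d: rng.sample(FEWSHOT_POOL, k) for d in doc_ids}

all_docs = list(range(8))

# Single process (DP=1): doc 5 is the 6th draw.
single = fewshot_for_shard(all_docs)

# Two ranks (DP=2), odd docs on rank 1: doc 5 is now the 3rd draw.
rank1 = fewshot_for_shard(all_docs[1::2])

print(single[5] == rank1[5])  # almost certainly False: different RNG positions
print(single[0] == rank1[1])  # True: both are the first draw after seeding
```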

Hi! There's [Real Toxicity](https://github.com/EleutherAI/lm-evaluation-harness/tree/big-refactor/lm_eval/tasks/realtoxicityprompts) in the big-refactor (soon to be main) branch which evaluates the generations with the Perspective API (need a key but it's free) using a custom `metric.py`....
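As a rough sketch of what the Perspective API expects (request shape per its public docs; the actual `metric.py` in the harness may structure this differently), you build a small JSON body per generation and POST it with your API key:

```python
import json

API_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def build_toxicity_request(text):
    """Build the JSON body for a Perspective API toxicity query.

    The toxicity score in the response lives under
    attributeScores.TOXICITY.summaryScore.value.
    """
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }

body = build_toxicity_request("a model generation to score")
print(json.dumps(body))
# To actually score it, POST this body to API_URL with ?key=<your free key>,
# e.g. requests.post(API_URL, params={"key": key}, json=body).
```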

Hi! @juletx should be able to confirm but I think just using `{{answer_number|string}}` without the condition should [work](https://huggingface.co/datasets/juletxara/mgsm) here. Not quite sure what we are indexing here. The COT prompts...

Thanks for the confirmation @juletx! Also, it looks like the `\nAnswer:` string in `doc_to_text` should be in the native language for the `direct` variation, which doesn't seem to be true...

> It looks like `answer[6+1]` is intended to skip the predefined `ANSWER` string, "Step-by-Step Answer:", which has a len of 6, from the `answer` string. However, for such purpose, we need...
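If the goal is just to drop a fixed answer prefix, slicing by the prefix's actual length (or using `str.removeprefix`) avoids hard-coded offsets entirely; a small sketch with an illustrative prefix and answer string:

```python
# Illustrative prefix and answer; the real mgsm COT strings may differ.
PREFIX = "Step-by-Step Answer:"

answer = "Step-by-Step Answer: There are 5 + 6 = 11 apples. The answer is 11."

# Hard-coded offsets like answer[7:] silently break if the prefix changes;
# slicing by len(PREFIX), or removeprefix (Python 3.9+), keeps them in sync.
stripped = answer[len(PREFIX):].strip()
also_stripped = answer.removeprefix(PREFIX).strip()

print(stripped)
print(stripped == also_stripped)  # True
```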

Like @haileyschoelkopf said, I think for a fair comparison you should use an auto batch size to take advantage of vLLM's continuous batching. I don't know if it slows down when logprobs...