Kai comments

Results: 12 comments of Kai

Can we use Pyspark dataframe as input

Hi there, I have the same problem here. My data has about 10 billion rows and about 100 features, which is obviously not suitable for a single machine. Hope there...

ds_z3_config.json stage3_prefetch_bucket_size should be an integer

I ran into the same problem. My config file uses `"stage3_prefetch_bucket_size": "auto"`, and the error says the value is not an integer:
[rank5]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/config.py", line 817, in _initialize_params
[rank5]: self.zero_config = get_zero_config(param_dict)
[rank5]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/config.py", line 71, in get_zero_config
[rank5]: return DeepSpeedZeroConfig(**zero_config_dict)
[rank5]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/config_utils.py", line 57,...

ds_z3_config.json stage3_prefetch_bucket_size should be an integer

> Having the same issue as well. You'll have to downgrade your deepspeed
>
> #5252

Thanks a lot, downgrading to `deepspeed==0.14.4` solved the problem.
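For setups where downgrading is not an option, a different workaround can be sketched: replace the `"auto"` value with an explicit integer before launching. This is an assumption, not something confirmed in the thread. The helper names `pin_prefetch_bucket_size` and `patch_file` are hypothetical, the key is assumed to sit under `zero_optimization` as in a typical ZeRO-3 config, and the integer you pass is your own choice.

```python
import json
from pathlib import Path


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def pin_prefetch_bucket_size(config: dict, value: int) -> dict:
    """Return a copy of the config with an integer stage3_prefetch_bucket_size.

    The key is only rewritten when it is present and not already an integer
    (for example "auto" or a float). The input dict is left unchanged.
    """
    if not _is_int(value):
        raise TypeError("value must be an int")
    patched = json.loads(json.dumps(config))  # deep copy for JSON-style data
    zero = patched.get("zero_optimization")
    key = "stage3_prefetch_bucket_size"
    if isinstance(zero, dict) and key in zero and not _is_int(zero[key]):
        zero[key] = value
    return patched


def patch_file(src: str, dst: str, value: int) -> None:
    """Read the JSON config at src and write the patched copy to dst."""
    config = json.loads(Path(src).read_text(encoding="utf-8"))
    patched = pin_prefetch_bucket_size(config, value)
    Path(dst).write_text(json.dumps(patched, indent=2), encoding="utf-8")
```

Usage would look like `patch_file("ds_z3_config.json", "ds_z3_config.int.json", 50_000_000)`, where the output file name and the number are placeholders rather than recommended values.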

How do I use swift infer?

I switched to `--infer_backend vllm`, and inference does look considerably faster, but at the code below:
![Image](https://github.com/user-attachments/assets/a993278b-be90-456c-8b1e-9511b7866aed)
[rank1]: File "/usr/local/lib64/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 2723, in all_gather_object
[rank1]: all_gather(object_size_list, local_size, group=group)
[rank1]: File "/usr/local/lib64/python3.11/site-packages/torch/distributed/c10d_logger.py", line 83, in wrapper
[rank1]: return func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]:...

[BUG] Unable to Download Dataset in Restricted Network Environment (No Access to Hugging Face or Mirrors)

Hi, is there any progress on this? We very often need to use this tool in no-internet environments. Please help. Often we can only give a model path and a dataset path, then eval the...

[BUG] Unable to Download Dataset in Restricted Network Environment (No Access to Hugging Face or Mirrors)

> hey [@Oukaishen](https://github.com/Oukaishen) ! is your dataset downloaded already ? If yes, you can point to it in your task config instead of having for example: `openai/gsm8k` you would have...

[BUG] Unable to Download Dataset in Restricted Network Environment (No Access to Hugging Face or Mirrors)

Either way, the errors are all related to a timeout hang (I guess it is waiting to download the dataset file, which I have already downloaded into a specific directory).
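One thing that may be worth trying for this kind of hang, sketched under assumptions: the Hugging Face libraries read the environment variables `HF_HUB_OFFLINE`, `HF_DATASETS_OFFLINE` and `TRANSFORMERS_OFFLINE`, and setting them to `1` tells them not to attempt network access. Whether this resolves the timeout in lighteval is not confirmed here. The helpers `offline_env` and `resolve_dataset` are hypothetical names, and the `<local_root>/<hub_id>` directory layout is an assumption about where the dataset was saved.

```python
import os
from pathlib import Path

# Environment variables read by huggingface_hub, datasets and transformers.
OFFLINE_FLAGS = {
    "HF_HUB_OFFLINE": "1",
    "HF_DATASETS_OFFLINE": "1",
    "TRANSFORMERS_OFFLINE": "1",
}


def offline_env(base=None) -> dict:
    """Return a copy of the environment with the offline flags switched on."""
    env = dict(os.environ if base is None else base)
    env.update(OFFLINE_FLAGS)
    return env


def resolve_dataset(hub_id: str, local_root: str) -> str:
    """Hypothetical helper: prefer <local_root>/<hub_id> when that directory exists.

    Falls back to the hub id unchanged when no local copy is found.
    """
    candidate = Path(local_root) / hub_id
    return str(candidate) if candidate.is_dir() else hub_id
```

The dictionary returned by `offline_env()` can be passed as `env=` to `subprocess.run` when launching the evaluation, so the flags are in place before any Hugging Face library is imported.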

[BUG] Unable to Download Dataset in Restricted Network Environment (No Access to Hugging Face or Mirrors)

Also, I installed lighteval with `pip install`, so how do I change the files? My guess is something like
> lighteval/src/lighteval/tasks/tasks/gsm8k.py

[BUG] Unable to Download Dataset in Restricted Network Environment (No Access to Hugging Face or Mirrors)

Here is the error stack:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/lighteval/tasks/tasks/ifeval/instructions_utils.py", line 27, in download_nltk_resources
    nltk.data.find("tokenizers/punkt")
  File "/usr/local/lib/python3.11/site-packages/nltk/data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt...
```
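The `LookupError` above comes from NLTK not finding `tokenizers/punkt` in its search path. A minimal sketch of one possible offline approach, assuming the punkt resource has already been copied to the machine: NLTK adds the directories listed in the `NLTK_DATA` environment variable to its search path when it is imported. The helper names `has_local_punkt` and `use_local_nltk_data` are hypothetical, as is the directory you pass in.

```python
import os
from pathlib import Path


def has_local_punkt(nltk_data_dir: str) -> bool:
    """Check for tokenizers/punkt (unpacked directory or zip) under nltk_data_dir."""
    tokenizers = Path(nltk_data_dir) / "tokenizers"
    return (tokenizers / "punkt").is_dir() or (tokenizers / "punkt.zip").is_file()


def use_local_nltk_data(nltk_data_dir: str, env=None) -> dict:
    """Return a copy of the environment with NLTK_DATA pointing at nltk_data_dir.

    Raises FileNotFoundError early if punkt is not there, instead of letting
    the evaluation fail later with a LookupError.
    """
    if not has_local_punkt(nltk_data_dir):
        raise FileNotFoundError(f"tokenizers/punkt not found under {nltk_data_dir}")
    env = dict(os.environ if env is None else env)
    env["NLTK_DATA"] = str(nltk_data_dir)
    return env
```

The variable has to be set before `nltk` is imported, so the returned environment is meant for launching the evaluation as a subprocess rather than for patching a running interpreter.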

[BUG] Unable to Download Dataset in Restricted Network Environment (No Access to Hugging Face or Mirrors)

Hi @NathanHB, can you give more detailed help? Thanks a lot. Key point: local model path (no internet download) + local dataset path (GSM8K) => benchmark (evaluation) results.