LMFlow
`preprocessing_num_workers` cannot be used in `scripts/run_finetune.sh`
Describe the bug
When the tokenizer `map` in `hf_decoder_model` runs with multiple `preprocessing_num_workers`, it fails with TypeError: cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object.
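For context, here is a minimal, self-contained sketch of why this class of error shows up; this is not LMFlow's actual code, and `FakeProcessGroup`/`FakeModel` are made-up stand-ins. With `num_proc > 1`, `datasets` ships the mapped function to worker processes via multiprocess/dill, so everything the function's closure captures, including the model object and any distributed handles it holds, has to be picklable.

```python
# Minimal sketch of the failure mode, NOT LMFlow's code:
# FakeProcessGroup and FakeModel are made-up stand-ins for illustration only.
from datasets import Dataset


class FakeProcessGroup:
    """Stand-in for torch._C._distributed_c10d.ProcessGroup, which cannot be pickled."""

    def __reduce__(self):
        raise TypeError("cannot pickle 'ProcessGroup' object (simulated)")


class FakeModel:
    """Stand-in for a model wrapper whose state holds a distributed handle."""

    def __init__(self):
        self.process_group = FakeProcessGroup()  # non-picklable state

    def tokenize(self, dataset, num_proc):
        def tokenize_function(examples):
            # The closure references `self`, so pickling this function drags in
            # the whole model object, including the non-picklable process group.
            _ = self.process_group
            return {"n_chars": [len(t) for t in examples["text"]]}

        return dataset.map(tokenize_function, batched=True, num_proc=num_proc)


ds = Dataset.from_dict({"text": ["hello", "world", "foo", "bar"]})
model = FakeModel()
model.tokenize(ds, num_proc=1)    # works (caching may warn): nothing crosses a process boundary
# model.tokenize(ds, num_proc=2)  # raises TypeError: cannot pickle ... (simulated)
```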
To Reproduce
Steps to reproduce the behavior: add `--preprocessing_num_workers 20 \` to `scripts/run_finetune.sh`:
#!/bin/bash
# Please run this script under ${project_id} in project directory of
# https://github.com/shizhediao/llm-ft
# COMMIT: d5fecf30ba8011067b10cf51fede53a5ab6574e4
deepspeed_args="--master_port=11000" # Default argument
if [ $# -ge 1 ]; then
deepspeed_args="$1"
fi
exp_id=finetune
project_dir=$(cd "$(dirname $0)"/..; pwd)
output_dir=${project_dir}/output_models/${exp_id}
log_dir=${project_dir}/log/${exp_id}
dataset_path=${project_dir}/data/alpaca/train
mkdir -p ${output_dir} ${log_dir}
deepspeed ${deepspeed_args} \
examples/finetune.py \
--model_name_or_path gpt2 \
--dataset_path ${dataset_path} \
--preprocessing_num_workers 20 \
--output_dir ${output_dir} --overwrite_output_dir \
--num_train_epochs 0.01 \
--learning_rate 2e-5 \
--block_size 512 \
--per_device_train_batch_size 1 \
--deepspeed configs/ds_config_zero3.json \
--bf16 \
--run_name finetune \
--validation_split_percentage 0 \
--logging_steps 20 \
--do_train \
--ddp_timeout 72000 \
--save_steps 5000 \
--dataloader_num_workers 1 \
| tee ${log_dir}/train.log \
2> ${log_dir}/train.err
Then just start it:
./scripts/run_finetune.sh
Screenshots
(lmflow) root@dev:/data/dev/gpt/LMFlow# ./scripts/run_finetune.sh
[2023-06-09 15:13:18,610] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-06-09 15:13:19,605] [INFO] [runner.py:550:main] cmd = /root/miniconda3/envs/lmflow/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=11000 --enable_each_rank_log=None examples/finetune.py --model_name_or_path gpt2 --dataset_path /data/dev/gpt/LMFlow/data/alpaca/train --preprocessing_num_workers 20 --output_dir /data/dev/gpt/LMFlow/output_models/finetune --overwrite_output_dir --num_train_epochs 0.01 --learning_rate 2e-5 --block_size 512 --per_device_train_batch_size 1 --deepspeed configs/ds_config_zero3.json --bf16 --run_name finetune --validation_split_percentage 0 --logging_steps 20 --do_train --ddp_timeout 72000 --save_steps 5000 --dataloader_num_workers 1
[2023-06-09 15:13:21,237] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.13.4-1+cuda11.7
[2023-06-09 15:13:21,237] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.13.4-1
[2023-06-09 15:13:21,237] [INFO] [launch.py:135:main] 0 NCCL_VERSION=2.13.4-1
[2023-06-09 15:13:21,237] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2023-06-09 15:13:21,237] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.13.4-1+cuda11.7
[2023-06-09 15:13:21,237] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2023-06-09 15:13:21,237] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.13.4-1
[2023-06-09 15:13:21,237] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-06-09 15:13:21,237] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-06-09 15:13:21,237] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-06-09 15:13:21,237] [INFO] [launch.py:162:main] dist_world_size=8
[2023-06-09 15:13:21,237] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2023-06-09 15:13:28,841] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
06/09/2023 15:13:29 - WARNING - lmflow.pipeline.finetuner - Process rank: 2, device: cuda:2, n_gpu: 1,distributed training: True, 16-bits training: False
06/09/2023 15:13:29 - WARNING - lmflow.pipeline.finetuner - Process rank: 0, device: cuda:0, n_gpu: 1,distributed training: True, 16-bits training: False
06/09/2023 15:13:30 - WARNING - lmflow.pipeline.finetuner - Process rank: 4, device: cuda:4, n_gpu: 1,distributed training: True, 16-bits training: False
06/09/2023 15:13:30 - WARNING - lmflow.pipeline.finetuner - Process rank: 5, device: cuda:5, n_gpu: 1,distributed training: True, 16-bits training: False
06/09/2023 15:13:30 - WARNING - lmflow.pipeline.finetuner - Process rank: 7, device: cuda:7, n_gpu: 1,distributed training: True, 16-bits training: False
06/09/2023 15:13:30 - WARNING - lmflow.pipeline.finetuner - Process rank: 1, device: cuda:1, n_gpu: 1,distributed training: True, 16-bits training: False
06/09/2023 15:13:30 - WARNING - lmflow.pipeline.finetuner - Process rank: 3, device: cuda:3, n_gpu: 1,distributed training: True, 16-bits training: False
06/09/2023 15:13:30 - WARNING - lmflow.pipeline.finetuner - Process rank: 6, device: cuda:6, n_gpu: 1,distributed training: True, 16-bits training: False
06/09/2023 15:13:31 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-0dfe5723824151c8/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
06/09/2023 15:13:31 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-0dfe5723824151c8/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
06/09/2023 15:13:31 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-0dfe5723824151c8/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
06/09/2023 15:13:31 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-0dfe5723824151c8/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
06/09/2023 15:13:31 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-0dfe5723824151c8/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
06/09/2023 15:13:31 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-0dfe5723824151c8/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
06/09/2023 15:13:31 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-0dfe5723824151c8/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
06/09/2023 15:13:31 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-0dfe5723824151c8/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
[2023-06-09 15:13:49,650] [INFO] [partition_parameters.py:415:__exit__] finished initializing model with 0.16B parameters
/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
warnings.warn(
/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
warnings.warn(
/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
warnings.warn(
/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
warnings.warn(
/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
warnings.warn(
/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
warnings.warn(
/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
warnings.warn(
/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
warnings.warn(
Traceback (most recent call last):
File "/data/dev/gpt/LMFlow/examples/finetune.py", line 61, in <module>
main()
File "/data/dev/gpt/LMFlow/examples/finetune.py", line 57, in main
tuned_model = finetuner.tune(model=model, dataset=dataset)
File "/data/dev/gpt/LMFlow/src/lmflow/pipeline/finetuner.py", line 210, in tune
tokenized_dataset = model.tokenize(dataset)
File "/data/dev/gpt/LMFlow/src/lmflow/models/hf_decoder_model.py", line 432, in tokenize
tokenized_datasets = raw_datasets.map(
File "/data/dev/gpt/LMFlow/src/lmflow/datasets/dataset.py", line 323, in map
mapped_backend_dataset = self.backend_dataset.map(*args, **kwargs)
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 563, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 528, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3046, in map
for rank, done, content in iflatmap_unordered(
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 1373, in iflatmap_unordered
[async_result.get() for async_result in async_results]
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 1373, in <listcomp>
[async_result.get() for async_result in async_results]
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/multiprocess/pool.py", line 771, in get
raise self._value
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/multiprocess/pool.py", line 537, in _handle_tasks
put(task)
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/multiprocess/connection.py", line 214, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/multiprocess/reduction.py", line 54, in dumps
cls(buf, protocol, *args, **kwds).dump(obj)
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/dill/_dill.py", line 498, in dump
StockPickler.dump(self, obj)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 487, in dump
self.save(obj)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 901, in save_tuple
save(element)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 886, in save_tuple
save(element)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/dill/_dill.py", line 990, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 971, in save_dict
self._batch_setitems(obj.items())
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/dill/_dill.py", line 1493, in save_function
pickler.save_reduce(_create_function, (obj.__code__,
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 692, in save_reduce
save(args)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 901, in save_tuple
save(element)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 901, in save_tuple
save(element)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/dill/_dill.py", line 1227, in save_cell
pickler.save_reduce(_create_cell, (f,), obj=obj)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 692, in save_reduce
save(args)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 886, in save_tuple
save(element)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 717, in save_reduce
save(state)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/dill/_dill.py", line 990, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 971, in save_dict
self._batch_setitems(obj.items())
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 717, in save_reduce
save(state)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/dill/_dill.py", line 990, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 971, in save_dict
self._batch_setitems(obj.items())
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 713, in save_reduce
self._batch_setitems(dictitems)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 717, in save_reduce
save(state)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/dill/_dill.py", line 990, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 971, in save_dict
self._batch_setitems(obj.items())
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 713, in save_reduce
self._batch_setitems(dictitems)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 717, in save_reduce
save(state)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/dill/_dill.py", line 990, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 971, in save_dict
self._batch_setitems(obj.items())
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 713, in save_reduce
self._batch_setitems(dictitems)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 1002, in _batch_setitems
save(v)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 692, in save_reduce
save(args)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 901, in save_tuple
save(element)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/dill/_dill.py", line 990, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 971, in save_dict
self._batch_setitems(obj.items())
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/root/miniconda3/envs/lmflow/lib/python3.9/pickle.py", line 578, in save
rv = reduce(self.proto)
TypeError: cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object
Running tokenizer on dataset (num_proc=20): 0%| | 0/52002 [00:00<?, ? examples/s][2023-06-09 15:14:13,505] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 142597
[2023-06-09 15:14:13,505] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 142598
[2023-06-09 15:14:14,680] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 142599
[2023-06-09 15:14:15,076] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 142600
[2023-06-09 15:14:15,394] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 142727
[2023-06-09 15:14:15,821] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 142728
[2023-06-09 15:14:16,254] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 142795
[2023-06-09 15:14:17,570] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 142856
[2023-06-09 15:14:18,084] [ERROR] [launch.py:324:sigkill_handler] ['/root/miniconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=7', '--model_name_or_path', 'gpt2', '--dataset_path', '/data/dev/gpt/LMFlow/data/alpaca/train', '--preprocessing_num_workers', '20', '--output_dir', '/data/dev/gpt/LMFlow/output_models/finetune', '--overwrite_output_dir', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--block_size', '512', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1
Package versions (`pip list`):
Package Version Editable project location
------------------------ ----------- -----------------------------
absl-py 1.4.0
accelerate 0.19.0
aiohttp 3.8.4
aiosignal 1.3.1
antlr4-python3-runtime 4.9.3
appdirs 1.4.4
async-timeout 4.0.2
attrs 23.1.0
blinker 1.6.2
certifi 2023.5.7
chardet 5.1.0
charset-normalizer 3.1.0
click 8.1.3
cmake 3.26.3
colorama 0.4.6
cpm-kernels 1.0.11
DataProperty 0.55.1
datasets 2.10.1
deepspeed 0.8.3
dill 0.3.4
docker-pycreds 0.4.0
einops 0.6.1
evaluate 0.4.0
filelock 3.12.0
flash-attn 1.0.4
Flask 2.3.2
Flask-Cors 3.0.10
frozenlist 1.3.3
fsspec 2023.5.0
gitdb 4.0.10
GitPython 3.1.31
hjson 3.1.0
huggingface-hub 0.14.1
icetk 0.0.7
idna 3.4
importlib-metadata 6.6.0
itsdangerous 2.1.2
Jinja2 3.1.2
joblib 1.2.0
jsonlines 3.1.0
lit 16.0.3
lm-eval 0.3.0
lmflow 0.0.1 /data/dev/gpt/LMFlow/src
MarkupSafe 2.1.2
mbstrdecoder 1.1.2
mpi4py 3.1.4
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.12.2
networkx 3.1
ninja 1.11.1
nltk 3.8.1
numexpr 2.8.4
numpy 1.24.2
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
omegaconf 2.3.0
openai 0.27.6
packaging 23.1
pandas 2.0.1
pathtools 0.1.2
pathvalidate 2.5.2
peft 0.3.0.dev0
Pillow 9.5.0
pip 23.0.1
portalocker 2.7.0
protobuf 3.18.3
psutil 5.9.5
py-cpuinfo 9.0.0
pyarrow 12.0.0
pybind11 2.10.4
pycountry 22.3.5
pydantic 1.10.7
pytablewriter 0.64.2
python-dateutil 2.8.2
pytz 2023.3
PyYAML 6.0
regex 2023.5.5
requests 2.30.0
responses 0.18.0
rouge-score 0.1.2
sacrebleu 1.5.0
scikit-learn 1.2.2
scipy 1.10.1
sentencepiece 0.1.99
sentry-sdk 1.22.2
setproctitle 1.3.2
setuptools 66.0.0
six 1.16.0
smmap 5.0.0
sqlitedict 2.1.0
sympy 1.12
tabledata 1.3.1
tcolorpy 0.1.3
threadpoolctl 3.1.0
tokenizers 0.13.3
torch 2.0.0
torchvision 0.15.1
tqdm 4.65.0
tqdm-multiprocess 0.0.11
transformers 4.28.0.dev0
triton 2.0.0
trl 0.4.2.dev0
typepy 1.3.0
typing_extensions 4.5.0
tzdata 2023.3
urllib3 1.26.15
wandb 0.14.0
Werkzeug 2.3.4
wheel 0.38.4
xxhash 3.2.0
yarl 1.9.2
zipp 3.15.0
zstandard 0.21.0
Hi, is it working well without setting this?
Thank you for your reply. It works well if I do not set `preprocessing_num_workers`. However, I am just curious why it does not work when this parameter is added.
Can you reproduce this problem, or is it just an issue with my environment?
Running into the same problem: when loading the data, it cannot be preprocessed in parallel.
Same question.
Running into the same problem: when loading the data, it cannot be preprocessed in parallel.
FYI: We've located the bug, and the dev team needs to perform a small-scale refactoring to fix it. We will do so ASAP; sorry for the inconvenience 🙏
Running into the same problem: when loading the data, it cannot be preprocessed in parallel.
FYI: Bug fixed, please see https://github.com/OptimalScale/LMFlow/pull/845 🤗
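For anyone stuck on a commit before that PR, the general way around this class of error is to make sure the function handed to `datasets.map` captures only picklable state (e.g. the tokenizer), never the model object holding the DeepSpeed/torch.distributed handles. Below is an illustrative sketch only, not the actual change in PR #845; the function and parameter names (`tokenize_text`, `tokenize_dataset`, `block_size`, `num_workers`) are made up for this example.

```python
# Illustrative workaround sketch only; this is NOT the actual change from PR #845.
# `tokenize_text`, `tokenize_dataset`, `block_size`, `num_workers` are made-up names.
from functools import partial

from datasets import Dataset
from transformers import AutoTokenizer


def tokenize_text(examples, tokenizer, block_size):
    # Only `tokenizer` and `block_size` are serialized and shipped to the
    # preprocessing workers; the model and its ProcessGroup stay in the main process.
    return tokenizer(examples["text"], truncation=True, max_length=block_size)


def tokenize_dataset(raw_dataset, tokenizer, block_size, num_workers):
    return raw_dataset.map(
        partial(tokenize_text, tokenizer=tokenizer, block_size=block_size),
        batched=True,
        num_proc=num_workers,
        remove_columns=["text"],
        desc="Running tokenizer on dataset",
    )


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("gpt2")
    ds = Dataset.from_dict({"text": ["hello world"] * 64})
    print(tokenize_dataset(ds, tok, block_size=512, num_workers=4))
```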