ColossalAI-Examples

BERT Data Preprocessing

Open JizeZhangCS opened this issue 2 years ago • 3 comments

🐛 Describe the bug

NVIDIA DeepLearningExamples removed LDDL from its Tools directory on Aug 16, 2022. As a result, the guide at https://github.com/hpcaitech/ColossalAI-Examples/tree/main/language/bert/preprocessing no longer works, in the following respects:

  1. pip install git+https://github.com/NVIDIA/DeepLearningExamples.git#subdirectory=Tools/lddl no longer works. A fix is either to use the new URL, i.e. pip install git+https://github.com/NVIDIA/lddl.git, or to take lddl from the historical version at https://github.com/NVIDIA/DeepLearningExamples/tree/29f5b7ab059025e4ead512e54037eddbdf740f19.
  2. After installing lddl, running pip install boto3 causes a version conflict whose effect on the rest of the process is unknown.
  3. In the preprocessing part, neither phase 1 nor phase 2 works. Details will follow.
  4. Switching the lddl source from the new URL to the historical version does not solve problem 3, and neither does skipping the boto3 installation.
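
For reference, the two install sources from item 1 can be written out as pip commands. This is only a sketch: the helper name lddl_install_command is made up for illustration, and the pinned URL assumes that lddl still lives under Tools/lddl at that commit, which the issue does not confirm.

```python
import sys

# New standalone repository named in item 1.
NEW_LDDL = "git+https://github.com/NVIDIA/lddl.git"

# Historical commit named in item 1. Assumption: lddl is still under
# Tools/lddl at this commit, as in the original install command.
PINNED_COMMIT = "29f5b7ab059025e4ead512e54037eddbdf740f19"
OLD_LDDL = (
    "git+https://github.com/NVIDIA/DeepLearningExamples.git"
    f"@{PINNED_COMMIT}#subdirectory=Tools/lddl"
)


def lddl_install_command(use_history=False):
    """Return the pip command (as an argument list) for the chosen lddl source."""
    source = OLD_LDDL if use_history else NEW_LDDL
    return [sys.executable, "-m", "pip", "install", source]


# Print both commands instead of running them, so nothing is installed by accident.
print(" ".join(lddl_install_command()))
print(" ".join(lddl_install_command(use_history=True)))
```

To actually install, pass the returned list to subprocess.run(..., check=True). As item 4 notes, neither source fixes problem 3.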

Environment

python=3.8 pytorch=1.12.1 cudatoolkit=10.2.89 cuda=10.2

JizeZhangCS, Aug 27 '22 18:08

More details about problem 2. Installing lddl first, then boto3:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.22.55 requires botocore==1.24.0, but you have botocore 1.27.61 which is incompatible.
awscli 1.22.55 requires s3transfer<0.6.0,>=0.5.0, but you have s3transfer 0.6.0 which is incompatible.

After installing boto3, reinstalling lddl:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
boto3 1.24.61 requires botocore<1.28.0,>=1.27.61, but you have botocore 1.24.0 which is incompatible.
boto3 1.24.61 requires s3transfer<0.7.0,>=0.6.0, but you have s3transfer 0.5.2 which is incompatible.
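
To see which versions actually ended up installed after either order, something like the following could be run in the same environment. It is a minimal sketch using only the standard library (importlib.metadata, available since Python 3.8); the package list is taken from the error messages above, and the helper name installed_versions is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# The four packages named in the pip conflict messages.
PACKAGES = ("awscli", "boto3", "botocore", "s3transfer")


def installed_versions(names=PACKAGES):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Running pip check afterwards lists any requirements that are still unsatisfied.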

JizeZhangCS, Aug 27 '22 18:08

More details about problem 3: output.txt

JizeZhangCS, Aug 27 '22 18:08

You can try pip install with the --no-dependencies flag to ignore the dependency conflict.
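
As a sketch of what that could look like for boto3: pip's help lists this option as --no-deps, and the helper name no_deps_install_command is made up for illustration. With this flag pip installs only the named package and leaves the already installed botocore and s3transfer untouched, so the conflict message goes away but nothing guarantees the versions work together at runtime.

```python
import sys


def no_deps_install_command(package):
    """Build a pip command that installs only `package`, skipping its dependencies."""
    return [sys.executable, "-m", "pip", "install", "--no-deps", package]


# Printed rather than executed; pass the list to subprocess.run(..., check=True) to run it.
print(" ".join(no_deps_install_command("boto3")))
```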

FrankLeeeee, Aug 29 '22 02:08