av_hubert

Cython error during pre-training

Open • Aaryan369 opened this issue 2 years ago • 1 comment

Hi, I have been trying to pre-train the model using the following command:

fairseq-hydra-train --config-dir /home/jupyter/aaryan/av_hubert/avhubert/conf/pretrain --config-name base_lrs3_iter1.yaml \
task.data=/home/jupyter/aaryan/av_hubert/avhubert/lrs3/30h_data \
task.label_dir=/home/jupyter/aaryan/av_hubert/avhubert/features model.label_rate=100 \
hydra.run.dir=/home/jupyter/aaryan/av_hubert/avhubert/test_run common.user_dir=`pwd`

Running this command fails with the following error:

Traceback (most recent call last):
  File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/data/data_utils.py", line 312, in batch_by_size
    from fairseq.data.data_utils_fast import (
ImportError: /home/jupyter/aaryan/av_hubert/fairseq/fairseq/data/data_utils_fast.cpython-38-x86_64-linux-gnu.so: failed to map segment from shared object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq_cli/hydra_train.py", line 45, in hydra_main
    distributed_utils.call_main(cfg, pre_main)
  File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq_cli/train.py", line 155, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(
  File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/checkpoint_utils.py", line 261, in load_checkpoint
    epoch_itr = trainer.get_train_iterator(
  File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/trainer.py", line 596, in get_train_iterator
    batch_iterator = self.task.get_batch_iterator(
  File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/tasks/fairseq_task.py", line 286, in get_batch_iterator
    batch_sampler = dataset.batch_by_size(
  File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/data/fairseq_dataset.py", line 145, in batch_by_size
    return data_utils.batch_by_size(
  File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/data/data_utils.py", line 318, in batch_by_size
    raise ImportError(
ImportError: Please build Cython components with: `python setup.py build_ext --inplace`
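Note that fairseq catches the original `ImportError` raised while importing `data_utils_fast` and re-raises it with a generic build hint, so the second traceback hides the real cause shown in the first one. One way to reproduce the load failure in isolation, without launching training, is to import the extension directly. This is a minimal sketch, assuming it runs in the same `avh` conda environment used for training; `probe_extension` and `locate_module` are hypothetical helper names, not part of fairseq.

```python
import importlib
import importlib.util
from typing import Optional


def probe_extension(module_name: str) -> Optional[str]:
    """Import `module_name`; return None on success, else the underlying ImportError text."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # For a compiled extension this carries the dynamic loader's message,
        # e.g. "failed to map segment from shared object".
        return f"{type(exc).__name__}: {exc}"
    return None


def locate_module(module_name: str) -> Optional[str]:
    """Return the file a module would be loaded from (parent packages are imported)."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # A parent package (e.g. fairseq itself) could not be imported.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    name = "fairseq.data.data_utils_fast"
    print("would load from:", locate_module(name))
    error = probe_extension(name)
    print("import OK" if error is None else error)
```

If `locate_module` reports a path outside `/home/jupyter/aaryan/av_hubert/fairseq/fairseq/data/`, a different fairseq copy is being imported than the one that was rebuilt.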

The error suggests rebuilding the Cython components. When I go to the fairseq directory and rebuild them with the suggested command, `python setup.py build_ext --inplace`, I get this:

running build_ext
/opt/conda/envs/avh/lib/python3.8/site-packages/torch/utils/cpp_extension.py:370: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
skipping 'fairseq/data/data_utils_fast.cpp' Cython extension (up-to-date)
skipping 'fairseq/data/token_block_utils_fast.cpp' Cython extension (up-to-date)
copying build/lib.linux-x86_64-cpython-38/fairseq/libbleu.cpython-38-x86_64-linux-gnu.so -> fairseq
copying build/lib.linux-x86_64-cpython-38/fairseq/data/data_utils_fast.cpython-38-x86_64-linux-gnu.so -> fairseq/data
copying build/lib.linux-x86_64-cpython-38/fairseq/data/token_block_utils_fast.cpython-38-x86_64-linux-gnu.so -> fairseq/data
copying build/lib.linux-x86_64-cpython-38/fairseq/libbase.cpython-38-x86_64-linux-gnu.so -> fairseq
copying build/lib.linux-x86_64-cpython-38/fairseq/libnat.cpython-38-x86_64-linux-gnu.so -> fairseq
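In this output both Cython extensions are reported as `up-to-date`, and the remaining lines only copy the previously built `.so` files from `build/` back into the source tree, so the rebuild reused the same binaries. If those cached binaries are suspected to be stale or damaged, a clean rebuild can be forced. A minimal sketch, assuming it is pointed at the fairseq checkout from the traceback; `find_built_extensions` and `clean_rebuild` are hypothetical helpers, the glob patterns are chosen to match the files listed in the log above, and `--force` is the standard `build_ext` option that ignores file timestamps.

```python
import glob
import os
import shutil
import subprocess
import sys
from typing import List

# Patterns matching the extensions copied in the build log (libbleu/libbase/libnat, *_fast).
PATTERNS = ["fairseq/lib*.cpython-*.so", "fairseq/data/*_fast.cpython-*.so"]


def find_built_extensions(fairseq_dir: str) -> List[str]:
    """Return the build/ cache directory (if present) and the in-place compiled extensions."""
    found = []
    build_dir = os.path.join(fairseq_dir, "build")
    if os.path.isdir(build_dir):
        found.append(build_dir)
    for pattern in PATTERNS:
        found.extend(sorted(glob.glob(os.path.join(fairseq_dir, pattern))))
    return found


def clean_rebuild(fairseq_dir: str, dry_run: bool = True) -> List[str]:
    """Delete cached binaries and recompile; with dry_run=True only report what would happen."""
    # sys.executable keeps the rebuild in the same Python environment as this script.
    cmd = [sys.executable, "setup.py", "build_ext", "--inplace", "--force"]
    for path in find_built_extensions(fairseq_dir):
        print(("would remove " if dry_run else "removing ") + path)
        if dry_run:
            continue
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    if not dry_run:
        subprocess.run(cmd, cwd=fairseq_dir, check=True)
    return cmd


if __name__ == "__main__":
    clean_rebuild("/home/jupyter/aaryan/av_hubert/fairseq", dry_run=True)
```

Running with `dry_run=True` first shows exactly which files would be deleted before anything is touched.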

Even after rebuilding the Cython components, the training command still fails with the same error. Any idea what is going wrong and how to resolve it?
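The first traceback contains the actual failure: the dynamic loader could not map `data_utils_fast.cpython-38-x86_64-linux-gnu.so` into memory, which points to a loading problem rather than a compilation problem. One commonly reported cause of `failed to map segment from shared object` on Linux is that the file lives on a filesystem mounted with `noexec`; another possibility is a restrictive virtual-memory limit (`ulimit -v`). A minimal sketch that checks both, assuming a Linux host where `/proc/mounts` is readable; `mount_for`, `is_noexec`, and `address_space_limit` are hypothetical helper names.

```python
import os
import resource
from typing import Optional, Tuple


def mount_for(path: str, mounts_text: str) -> Optional[Tuple[str, str]]:
    """Return (mount_point, options) for the most specific mount containing `path`.

    `path` should already be absolute and resolved (e.g. via os.path.realpath).
    """
    best: Optional[Tuple[str, str]] = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point = fields[1].replace("\\040", " ")  # /proc/mounts escapes spaces as \040
        options = fields[3]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" so that a later (stacked) mount on the same point wins.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, options)
    return best


def is_noexec(options: str) -> bool:
    """True if the comma-separated mount options include `noexec`."""
    return "noexec" in options.split(",")


def address_space_limit() -> Optional[int]:
    """Soft RLIMIT_AS in bytes, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    return None if soft == resource.RLIM_INFINITY else soft


if __name__ == "__main__":
    so_path = os.path.realpath(
        "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/data/"
        "data_utils_fast.cpython-38-x86_64-linux-gnu.so"
    )
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as fh:
            found = mount_for(so_path, fh.read())
        if found is not None:
            mount_point, options = found
            state = "noexec" if is_noexec(options) else "exec allowed"
            print(f"{so_path} is on {mount_point} ({state})")
    limit = address_space_limit()
    print("address-space limit:", "unlimited" if limit is None else f"{limit} bytes")
```

If the mount turns out to be `noexec`, the usual workaround is to move the checkout (or the conda environment) to an exec-enabled filesystem, since remounting requires administrator rights.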

Aaryan369, May 17 '22 11:05

Maybe this thread will help.

chevalierNoir, May 17 '22 17:05