av_hubert
Cython error during pre-training
Hi, I have been trying to train the model using the following command,
fairseq-hydra-train --config-dir /home/jupyter/aaryan/av_hubert/avhubert/conf/pretrain --config-name base_lrs3_iter1.yaml \
task.data=/home/jupyter/aaryan/av_hubert/avhubert/lrs3/30h_data \
task.label_dir=/home/jupyter/aaryan/av_hubert/avhubert/features model.label_rate=100 \
hydra.run.dir=/home/jupyter/aaryan/av_hubert/avhubert/test_run common.user_dir=`pwd`
Running this command, I get the following error:
Traceback (most recent call last):
File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/data/data_utils.py", line 312, in batch_by_size
from fairseq.data.data_utils_fast import (
ImportError: /home/jupyter/aaryan/av_hubert/fairseq/fairseq/data/data_utils_fast.cpython-38-x86_64-linux-gnu.so: failed to map segment from shared object
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq_cli/hydra_train.py", line 45, in hydra_main
distributed_utils.call_main(cfg, pre_main)
File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/distributed/utils.py", line 369, in call_main
main(cfg, **kwargs)
File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq_cli/train.py", line 155, in main
extra_state, epoch_itr = checkpoint_utils.load_checkpoint(
File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/checkpoint_utils.py", line 261, in load_checkpoint
epoch_itr = trainer.get_train_iterator(
File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/trainer.py", line 596, in get_train_iterator
batch_iterator = self.task.get_batch_iterator(
File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/tasks/fairseq_task.py", line 286, in get_batch_iterator
batch_sampler = dataset.batch_by_size(
File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/data/fairseq_dataset.py", line 145, in batch_by_size
return data_utils.batch_by_size(
File "/home/jupyter/aaryan/av_hubert/fairseq/fairseq/data/data_utils.py", line 318, in batch_by_size
raise ImportError(
ImportError: Please build Cython components with: `python setup.py build_ext --inplace`
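As a quick way to narrow this down (a minimal check on my side, assuming the same `avh` conda environment and the fairseq checkout on `sys.path`), the compiled extension can be imported directly, outside of `fairseq-hydra-train`. If this bare import reproduces the same "failed to map segment from shared object" message, the `.so` itself cannot be loaded by the OS and rebuilding would not change anything:

```python
def check_cython_extension() -> str:
    """Try to load fairseq's compiled batching extension directly."""
    try:
        # Same module the traceback names in data_utils.py
        from fairseq.data import data_utils_fast  # noqa: F401
        return "extension loaded OK"
    except ImportError as exc:
        # "failed to map segment from shared object" here would mean the OS
        # refused to mmap the .so (e.g. noexec mount or a memory limit),
        # not that the Cython build is stale.
        return f"import failed: {exc}"

print(check_cython_extension())
```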
The error suggests rebuilding the Cython components. When I go to the fairseq directory and rebuild them with `python setup.py build_ext --inplace` as instructed, I get this:
running build_ext
/opt/conda/envs/avh/lib/python3.8/site-packages/torch/utils/cpp_extension.py:370: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
skipping 'fairseq/data/data_utils_fast.cpp' Cython extension (up-to-date)
skipping 'fairseq/data/token_block_utils_fast.cpp' Cython extension (up-to-date)
copying build/lib.linux-x86_64-cpython-38/fairseq/libbleu.cpython-38-x86_64-linux-gnu.so -> fairseq
copying build/lib.linux-x86_64-cpython-38/fairseq/data/data_utils_fast.cpython-38-x86_64-linux-gnu.so -> fairseq/data
copying build/lib.linux-x86_64-cpython-38/fairseq/data/token_block_utils_fast.cpython-38-x86_64-linux-gnu.so -> fairseq/data
copying build/lib.linux-x86_64-cpython-38/fairseq/libbase.cpython-38-x86_64-linux-gnu.so -> fairseq
copying build/lib.linux-x86_64-cpython-38/fairseq/libnat.cpython-38-x86_64-linux-gnu.so -> fairseq
Even after rebuilding the Cython components, the training command still fails with the same error. Any idea what is going wrong and how to resolve it?
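For what it's worth, "failed to map segment from shared object" is usually the OS refusing to mmap the `.so` at load time, most commonly because the filesystem holding it is mounted `noexec` or because a virtual-memory `ulimit` is set too low; in either case rebuilding has no effect. A hedged sketch of how to check both from Python (the `diagnose_so` helper is mine, and the path you would pass it is the `.so` named in the traceback):

```python
import ctypes
import ctypes.util
import resource


def diagnose_so(so_path: str) -> bool:
    """Return True if the shared object loads; print likely causes if not."""
    # A finite RLIMIT_AS (what `ulimit -v` sets) can make mmap of a .so fail.
    soft, _ = resource.getrlimit(resource.RLIMIT_AS)
    if soft != resource.RLIM_INFINITY:
        print(f"note: virtual-memory limit is {soft} bytes; "
              f"a low `ulimit -v` can break mapping of shared objects")
    try:
        # dlopen the file directly -- the same mechanism Python uses
        # when importing a compiled extension module.
        ctypes.CDLL(so_path)
        return True
    except OSError as exc:
        # "failed to map segment" here points at a noexec mount or a
        # memory limit, not a bad build.
        print(f"dlopen failed: {exc}")
        return False


if __name__ == "__main__":
    # Sanity check against libc, which should always load; then try the
    # path from the traceback, e.g.
    # diagnose_so(".../fairseq/data/data_utils_fast.cpython-38-x86_64-linux-gnu.so")
    print(diagnose_so(ctypes.util.find_library("c") or "libc.so.6"))
```

If `dlopen` fails for the fairseq `.so` but succeeds for libc, checking the mount options of the filesystem holding the repo (e.g. with `findmnt --target <repo-path>`) for a `noexec` flag would be the next step.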
Maybe this thread will help.