
fairseq-train: error: unrecognized arguments: --mask-multiple-length 10 --mask-stdev 10

Status: Open · cpark-dev opened this issue 4 years ago · 6 comments

🐛 Bug

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run cmd
fairseq-train --fp16 $RESULT/quantized/fairseq-bin-data \
    --task masked_lm --criterion masked_lm \
    --save-dir $CHECKPOINT/BERT_CPC_big_kmeans50 \
    --keep-last-epochs 1 \
    --train-subset train \
    --num-workers 4 \
    --arch roberta_base \
    --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 --clip-norm 0.0 \
    --lr-scheduler polynomial_decay --lr 0.0005 --total-num-update 250000 --warmup-updates 10000 \
    --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
    --mask-multiple-length 10 --mask-prob 0.5 --mask-stdev 10 \
    --sample-break-mode eos --tokens-per-sample 3072 --max-positions 6144 \
    --max-tokens 4096 --update-freq 4 --max-update 250000 \
    --seed 5 --log-format simple --log-interval 10 --skip-invalid-size-inputs-valid-test
  2. See error
usage: fairseq-train [-h] [--no-progress-bar] [--log-interval LOG_INTERVAL]
                     [--log-format {json,none,simple,tqdm}]
                     [--tensorboard-logdir TENSORBOARD_LOGDIR] [--seed SEED]
                     [--cpu] [--tpu] [--bf16] [--memory-efficient-bf16]
                     [--fp16] [--memory-efficient-fp16]
                     [--fp16-no-flatten-grads]
                     [--fp16-init-scale FP16_INIT_SCALE]
                     [--fp16-scale-window FP16_SCALE_WINDOW]
                     [--fp16-scale-tolerance FP16_SCALE_TOLERANCE]
                     [--min-loss-scale MIN_LOSS_SCALE]
                     [--threshold-loss-scale THRESHOLD_LOSS_SCALE]
                     [--user-dir USER_DIR]
                     [--empty-cache-freq EMPTY_CACHE_FREQ]
                     [--all-gather-list-size ALL_GATHER_LIST_SIZE]
                     [--model-parallel-size MODEL_PARALLEL_SIZE]
                     [--checkpoint-suffix CHECKPOINT_SUFFIX]
                     [--checkpoint-shard-count CHECKPOINT_SHARD_COUNT]
                     [--quantization-config-path QUANTIZATION_CONFIG_PATH]
                     [--profile]
                     [--criterion {legacy_masked_lm_loss,label_smoothed_cross_entropy,label_smoothed_cross_entropy_with_alignment,wav2vec,ctc,cross_entropy,sentence_ranking,composite_loss,adaptive_loss,sentence_prediction,masked_lm,nat_loss,vocab_parallel_cross_entropy}]
                     [--tokenizer {space,nltk,moses}]
                     [--bpe {byte_bpe,subword_nmt,hf_byte_bpe,sentencepiece,characters,bert,gpt2,fastbpe,bytes}]
                     [--optimizer {adadelta,sgd,lamb,nag,adafactor,adagrad,adam,adamax}]
                     [--lr-scheduler {tri_stage,polynomial_decay,triangular,reduce_lr_on_plateau,cosine,fixed,inverse_sqrt}]
                     [--scoring {sacrebleu,bleu,wer,chrf}] [--task TASK]
                     [--num-workers NUM_WORKERS]
                     [--skip-invalid-size-inputs-valid-test]
                     [--max-tokens MAX_TOKENS] [--batch-size BATCH_SIZE]
                     [--required-batch-size-multiple REQUIRED_BATCH_SIZE_MULTIPLE]
                     [--required-seq-len-multiple REQUIRED_SEQ_LEN_MULTIPLE]
                     [--dataset-impl {raw,lazy,cached,mmap,fasta}]
                     [--data-buffer-size DATA_BUFFER_SIZE]
                     [--train-subset TRAIN_SUBSET]
                     [--valid-subset VALID_SUBSET]
                     [--validate-interval VALIDATE_INTERVAL]
                     [--validate-interval-updates VALIDATE_INTERVAL_UPDATES]
                     [--validate-after-updates VALIDATE_AFTER_UPDATES]
                     [--fixed-validation-seed FIXED_VALIDATION_SEED]
                     [--disable-validation]
                     [--max-tokens-valid MAX_TOKENS_VALID]
                     [--batch-size-valid BATCH_SIZE_VALID]
                     [--curriculum CURRICULUM] [--gen-subset GEN_SUBSET]
                     [--num-shards NUM_SHARDS] [--shard-id SHARD_ID]
                     [--distributed-world-size DISTRIBUTED_WORLD_SIZE]
                     [--distributed-rank DISTRIBUTED_RANK]
                     [--distributed-backend DISTRIBUTED_BACKEND]
                     [--distributed-init-method DISTRIBUTED_INIT_METHOD]
                     [--distributed-port DISTRIBUTED_PORT]
                     [--device-id DEVICE_ID] [--distributed-no-spawn]
                     [--ddp-backend {c10d,no_c10d}]
                     [--bucket-cap-mb BUCKET_CAP_MB] [--fix-batches-to-gpus]
                     [--find-unused-parameters] [--fast-stat-sync]
                     [--broadcast-buffers]
                     [--distributed-wrapper {DDP,SlowMo}]
                     [--slowmo-momentum SLOWMO_MOMENTUM]
                     [--slowmo-algorithm SLOWMO_ALGORITHM]
                     [--localsgd-frequency LOCALSGD_FREQUENCY]
                     [--nprocs-per-node NPROCS_PER_NODE]
                     [--pipeline-model-parallel]
                     [--pipeline-balance PIPELINE_BALANCE]
                     [--pipeline-devices PIPELINE_DEVICES]
                     [--pipeline-chunks PIPELINE_CHUNKS]
                     [--pipeline-encoder-balance PIPELINE_ENCODER_BALANCE]
                     [--pipeline-encoder-devices PIPELINE_ENCODER_DEVICES]
                     [--pipeline-decoder-balance PIPELINE_DECODER_BALANCE]
                     [--pipeline-decoder-devices PIPELINE_DECODER_DEVICES]
                     [--pipeline-checkpoint {always,never,except_last}]
                     [--zero-sharding {none,os}] [--arch ARCH]
                     [--max-epoch MAX_EPOCH] [--max-update MAX_UPDATE]
                     [--stop-time-hours STOP_TIME_HOURS]
                     [--clip-norm CLIP_NORM] [--sentence-avg]
                     [--update-freq UPDATE_FREQ] [--lr LR] [--min-lr MIN_LR]
                     [--use-bmuf] [--save-dir SAVE_DIR]
                     [--restore-file RESTORE_FILE]
                     [--finetune-from-model FINETUNE_FROM_MODEL]
                     [--reset-dataloader] [--reset-lr-scheduler]
                     [--reset-meters] [--reset-optimizer]
                     [--optimizer-overrides OPTIMIZER_OVERRIDES]
                     [--save-interval SAVE_INTERVAL]
                     [--save-interval-updates SAVE_INTERVAL_UPDATES]
                     [--keep-interval-updates KEEP_INTERVAL_UPDATES]
                     [--keep-last-epochs KEEP_LAST_EPOCHS]
                     [--keep-best-checkpoints KEEP_BEST_CHECKPOINTS]
                     [--no-save] [--no-epoch-checkpoints]
                     [--no-last-checkpoints] [--no-save-optimizer-state]
                     [--best-checkpoint-metric BEST_CHECKPOINT_METRIC]
                     [--maximize-best-checkpoint-metric] [--patience PATIENCE]
                     [--encoder-layers L] [--encoder-embed-dim H]
                     [--encoder-ffn-embed-dim F] [--encoder-attention-heads A]
                     [--activation-fn {relu,gelu,gelu_fast,gelu_accurate,tanh,linear}]
                     [--pooler-activation-fn {relu,gelu,gelu_fast,gelu_accurate,tanh,linear}]
                     [--encoder-normalize-before] [--dropout D]
                     [--attention-dropout D] [--activation-dropout D]
                     [--pooler-dropout D] [--max-positions MAX_POSITIONS]
                     [--load-checkpoint-heads] [--encoder-layerdrop D]
                     [--encoder-layers-to-keep ENCODER_LAYERS_TO_KEEP]
                     [--quant-noise-pq D] [--quant-noise-pq-block-size D]
                     [--quant-noise-scalar D] [--untie-weights-roberta]
                     [--spectral-norm-classification-head]
                     [--adam-betas ADAM_BETAS] [--adam-eps ADAM_EPS]
                     [--weight-decay WEIGHT_DECAY] [--use-old-adam]
                     [--force-anneal N] [--warmup-updates N]
                     [--end-learning-rate END_LEARNING_RATE] [--power POWER]
                     [--total-num-update TOTAL_NUM_UPDATE]
                     [--sample-break-mode {none,complete,complete_doc,eos}]
                     [--tokens-per-sample TOKENS_PER_SAMPLE]
                     [--mask-prob MASK_PROB]
                     [--leave-unmasked-prob LEAVE_UNMASKED_PROB]
                     [--random-token-prob RANDOM_TOKEN_PROB]
                     [--freq-weighted-replacement] [--mask-whole-words]
                     [--shorten-method {none,truncate,random_crop}]
                     [--shorten-data-split-list SHORTEN_DATA_SPLIT_LIST]
                     data
fairseq-train: error: unrecognized arguments: --mask-multiple-length 10 --mask-stdev 10

Code sample

Expected behavior

Environment

  • fairseq Version (e.g., 1.0 or master): 1.0.0a0+9316f13
  • PyTorch Version (e.g., 1.0): 1.7.1
  • OS (e.g., Linux): Linux
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source):
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
  • Python version: 3.8.5
  • CUDA/cuDNN version: V9.2.148
  • GPU models and configuration:
  • Any other relevant information:

Additional context

  1. I am following the tutorial for the ZeroSpeech 2021 baseline system. link
  2. The two arguments, --mask-multiple-length and --mask-stdev, are defined in masked_lm.py, so they should be recognized; see the check sketched below.
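
For completeness, here is a quick check (a hedged sketch, not part of fairseq itself) of whether the fairseq copy that Python actually imports defines these options; inspect.getsource and __file__ are standard Python, and the option names come straight from the error above:

python - <<'PY'
import inspect
import fairseq
from fairseq.tasks import masked_lm

# The printed path shows whether a pip-installed copy or a local checkout wins.
print("fairseq imported from:", fairseq.__file__)

# If these names are absent from the task source, argparse rejects the flags.
src = inspect.getsource(masked_lm)
for name in ("mask_multiple_length", "mask_stdev"):
    print(name, "present" if name in src else "MISSING")
PY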

cpark-dev · Feb 10 '21 14:02

It works for me. Are you sure you have the latest fairseq version installed? Try updating to master and then running something like this from the checked-out dir:

PYTHONPATH=. python fairseq_cli/train.py --fp16 $RESULT/quantized/fairseq-bin-data \
...
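
To double-check which installation actually gets imported (a minimal sketch; fairseq.__file__ and fairseq.__version__ are standard module attributes), run this from the same directory:

PYTHONPATH=. python -c "import fairseq; print(fairseq.__file__, fairseq.__version__)"

If the printed path points into site-packages rather than the checkout, a stale pip install is still shadowing the updated code.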

alexeib · Feb 12 '21 19:02

Now I get a different error.

$ PYTHONPATH=. \
> python fairseq_cli/train.py --fp16 /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/result/quantized/fairseq-bin-data \
>     --task masked_lm --criterion masked_lm \
>     --save-dir /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/zerospeech2021_baseline/checkpoints/BERT_CPC_big_kmeans50 \
>     --keep-last-epochs 1 \
>     --train-subset train \
>     --num-workers 1 \
>     --arch roberta_base \
>     --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 --clip-norm 0.0 \
>     --lr-scheduler polynomial_decay --lr 0.0005 --total-num-update 250000 --warmup-updates 10000 \
>     --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
>     --mask-multiple-length 10 --mask-prob 0.5 --mask-stdev 10 \
>     --sample-break-mode eos --tokens-per-sample 3072 --max-positions 6144 \
>     --max-tokens 4096 --update-freq 128 --max-update 250000 \
>     --seed 5 --log-format simple --log-interval 10 --skip-invalid-size-inputs-valid-test
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370131125/work/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
2021-02-12 21:49:30 | INFO | fairseq_cli.train | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 10, 'log_format': 'simple', 'tensorboard_logdir': None, 'wandb_project': None, 'azureml_logging': False, 'seed': 5, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': True, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False}, 'common_eval': {'_name': None, 'path': None, 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'pytorch_ddp', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_algorithm': 'LocalSGD', 'localsgd_frequency': 3, 'nprocs_per_node': 1, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'tpu': False, 'distributed_num_procs': 0}, 'dataset': {'_name': None, 'num_workers': 1, 'skip_invalid_size_inputs_valid_test': True, 'max_tokens': 4096, 'batch_size': None, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': 4096, 'batch_size_valid': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 250000, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [128], 'lr': [0.0005], 'stop_min_lr': -1.0, 'use_bmuf': False}, 'checkpoint': {'_name': None, 'save_dir': '/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/zerospeech2021_baseline/checkpoints/BERT_CPC_big_kmeans50', 'restore_file': 'checkpoint_last.pt', 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_last_epochs': 1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'model_parallel_size': 1, 'distributed_rank': 0}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 1}, 'generation': 
{'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': Namespace(_name='roberta_base', activation_dropout=0.0, activation_fn='gelu', adam_betas='(0.9, 0.98)', adam_eps=1e-06, all_gather_list_size=16384, arch='roberta_base', attention_dropout=0.1, azureml_logging=False, batch_size=None, batch_size_valid=None, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=0.0, cpu=False, criterion='masked_lm', curriculum=0, data='/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/result/quantized/fairseq-bin-data', data_buffer_size=10, dataset_impl=None, ddp_backend='pytorch_ddp', device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, empty_cache_freq=0, encoder_attention_heads=12, encoder_embed_dim=768, encoder_ffn_embed_dim=3072, encoder_layerdrop=0, encoder_layers=12, encoder_layers_to_keep=None, end_learning_rate=0.0, eos=2, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=True, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, freq_weighted_replacement=False, gen_subset='test', heartbeat_timeout=-1, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=1, leave_unmasked_prob=0.1, load_checkpoint_on_all_dp_ranks=False, localsgd_frequency=3, log_format='simple', log_interval=10, lr=[0.0005], lr_scheduler='polynomial_decay', mask_multiple_length=10, mask_prob=0.5, mask_stdev=10.0, mask_whole_words=False, max_epoch=0, max_positions=6144, max_tokens=4096, max_tokens_valid=4096, max_update=250000, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, nprocs_per_node=1, num_shards=1, num_workers=1, optimizer='adam', optimizer_overrides='{}', pad=1, patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, 
pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path=None, random_token_prob=0.1, required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_logging=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', sample_break_mode='eos', save_dir='/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/zerospeech2021_baseline/checkpoints/BERT_CPC_big_kmeans50', save_interval=1, save_interval_updates=0, scoring='bleu', seed=5, sentence_avg=False, shard_id=0, shorten_data_split_list='', shorten_method='none', skip_invalid_size_inputs_valid_test=True, slowmo_algorithm='LocalSGD', slowmo_momentum=None, spectral_norm_classification_head=False, stop_min_lr=-1.0, stop_time_hours=0, suppress_crashes=False, task='masked_lm', tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tokens_per_sample=3072, total_num_update='250000', tpu=False, train_subset='train', unk=3, untie_weights_roberta=False, update_freq=[128], use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, wandb_project=None, warmup_updates=10000, weight_decay=0.01, zero_sharding='none'), 'task': Namespace(_name='masked_lm', activation_dropout=0.0, activation_fn='gelu', adam_betas='(0.9, 0.98)', adam_eps=1e-06, all_gather_list_size=16384, arch='roberta_base', attention_dropout=0.1, azureml_logging=False, batch_size=None, batch_size_valid=None, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=0.0, cpu=False, criterion='masked_lm', curriculum=0, data='/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/result/quantized/fairseq-bin-data', data_buffer_size=10, dataset_impl=None, ddp_backend='pytorch_ddp', device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, empty_cache_freq=0, encoder_attention_heads=12, encoder_embed_dim=768, encoder_ffn_embed_dim=3072, encoder_layerdrop=0, encoder_layers=12, encoder_layers_to_keep=None, end_learning_rate=0.0, eos=2, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=True, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, freq_weighted_replacement=False, gen_subset='test', heartbeat_timeout=-1, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=1, leave_unmasked_prob=0.1, load_checkpoint_on_all_dp_ranks=False, localsgd_frequency=3, log_format='simple', log_interval=10, lr=[0.0005], lr_scheduler='polynomial_decay', mask_multiple_length=10, mask_prob=0.5, mask_stdev=10.0, mask_whole_words=False, max_epoch=0, max_positions=6144, max_tokens=4096, max_tokens_valid=4096, max_update=250000, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, nprocs_per_node=1, num_shards=1, num_workers=1, optimizer='adam', optimizer_overrides='{}', pad=1, patience=-1, 
pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path=None, random_token_prob=0.1, required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_logging=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', sample_break_mode='eos', save_dir='/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/zerospeech2021_baseline/checkpoints/BERT_CPC_big_kmeans50', save_interval=1, save_interval_updates=0, scoring='bleu', seed=5, sentence_avg=False, shard_id=0, shorten_data_split_list='', shorten_method='none', skip_invalid_size_inputs_valid_test=True, slowmo_algorithm='LocalSGD', slowmo_momentum=None, spectral_norm_classification_head=False, stop_min_lr=-1.0, stop_time_hours=0, suppress_crashes=False, task='masked_lm', tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tokens_per_sample=3072, total_num_update='250000', tpu=False, train_subset='train', unk=3, untie_weights_roberta=False, update_freq=[128], use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, wandb_project=None, warmup_updates=10000, weight_decay=0.01, zero_sharding='none'), 'criterion': Namespace(_name='masked_lm', activation_dropout=0.0, activation_fn='gelu', adam_betas='(0.9, 0.98)', adam_eps=1e-06, all_gather_list_size=16384, arch='roberta_base', attention_dropout=0.1, azureml_logging=False, batch_size=None, batch_size_valid=None, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=0.0, cpu=False, criterion='masked_lm', curriculum=0, data='/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/result/quantized/fairseq-bin-data', data_buffer_size=10, dataset_impl=None, ddp_backend='pytorch_ddp', device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, empty_cache_freq=0, encoder_attention_heads=12, encoder_embed_dim=768, encoder_ffn_embed_dim=3072, encoder_layerdrop=0, encoder_layers=12, encoder_layers_to_keep=None, end_learning_rate=0.0, eos=2, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=True, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, freq_weighted_replacement=False, gen_subset='test', heartbeat_timeout=-1, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=1, leave_unmasked_prob=0.1, load_checkpoint_on_all_dp_ranks=False, localsgd_frequency=3, log_format='simple', log_interval=10, lr=[0.0005], lr_scheduler='polynomial_decay', mask_multiple_length=10, mask_prob=0.5, mask_stdev=10.0, mask_whole_words=False, max_epoch=0, max_positions=6144, max_tokens=4096, max_tokens_valid=4096, max_update=250000, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, 
no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, nprocs_per_node=1, num_shards=1, num_workers=1, optimizer='adam', optimizer_overrides='{}', pad=1, patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path=None, random_token_prob=0.1, required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_logging=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', sample_break_mode='eos', save_dir='/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/zerospeech2021_baseline/checkpoints/BERT_CPC_big_kmeans50', save_interval=1, save_interval_updates=0, scoring='bleu', seed=5, sentence_avg=False, shard_id=0, shorten_data_split_list='', shorten_method='none', skip_invalid_size_inputs_valid_test=True, slowmo_algorithm='LocalSGD', slowmo_momentum=None, spectral_norm_classification_head=False, stop_min_lr=-1.0, stop_time_hours=0, suppress_crashes=False, task='masked_lm', tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tokens_per_sample=3072, total_num_update='250000', tpu=False, train_subset='train', unk=3, untie_weights_roberta=False, update_freq=[128], use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, wandb_project=None, warmup_updates=10000, weight_decay=0.01, zero_sharding='none'), 'optimizer': {'_name': 'adam', 'adam_betas': '(0.9, 0.98)', 'adam_eps': 1e-06, 'weight_decay': 0.01, 'use_old_adam': False, 'tpu': False, 'lr': [0.0005]}, 'lr_scheduler': {'_name': 'polynomial_decay', 'warmup_updates': 10000, 'force_anneal': None, 'end_learning_rate': 0.0, 'power': 1.0, 'total_num_update': 250000.0, 'lr': [0.0005]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None}
2021-02-12 21:49:30 | INFO | fairseq.tasks.masked_lm | dictionary: 56 types
2021-02-12 21:49:30 | INFO | fairseq.data.data_utils | loaded 2,534 examples from: /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/result/quantized/fairseq-bin-data/valid
Traceback (most recent call last):
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/token_block_dataset.py", line 46, in __init__
    from fairseq.data.token_block_utils_fast import (
ModuleNotFoundError: No module named 'fairseq.data.token_block_utils_fast'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "fairseq_cli/train.py", line 453, in <module>
    cli_main()
  File "fairseq_cli/train.py", line 449, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/distributed/utils.py", line 360, in call_main
    main(cfg, **kwargs)
  File "fairseq_cli/train.py", line 74, in main
    task.load_dataset(valid_sub_split, combine=False, epoch=1)
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/tasks/masked_lm.py", line 161, in load_dataset
    dataset = TokenBlockDataset(
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/token_block_dataset.py", line 51, in __init__
    raise ImportError(
ImportError: Please build Cython components with: `pip install --editable .` or `python setup.py build_ext --inplace`

cpark-dev · Feb 12 '21 21:02

Did you try following the instruction on the last line?

alexeib · Feb 13 '21 00:02

I tried the second command, python setup.py build_ext --inplace, since I had installed fairseq via pip. However, it failed.

$ python setup.py build_ext --inplace
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370131125/work/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
No CUDA runtime is found, using CUDA_HOME='/share/mini1/sw/std/cuda/cuda9.2/x86_64'
running build_ext
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py:294: UserWarning:

                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (g++ 4.8.5) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 5.0 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 5 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                              !! WARNING !!

  warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
cythoning fairseq/data/data_utils_fast.pyx to fairseq/data/data_utils_fast.cpp
cythoning fairseq/data/token_block_utils_fast.pyx to fairseq/data/token_block_utils_fast.cpp
building 'fairseq.libbleu' extension
creating /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build
creating /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8
creating /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq
creating /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib
creating /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libbleu
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py:266: UserWarning:

                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                              !! WARNING !!

  warnings.warn(WRONG_COMPILER_WARNING.format(
Emitting ninja build file /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/libbleu.o.d -pthread -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/include/python3.8 -c -c /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/clib/libbleu/libbleu.cpp -o /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/libbleu.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=libbleu -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
[2/2] c++ -MMD -MF /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/module.o.d -pthread -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/include/python3.8 -c -c /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/clib/libbleu/module.cpp -o /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/module.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=libbleu -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/fairseq
g++ -pthread -shared -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -L/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib -Wl,-rpath=/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib -Wl,--no-as-needed -Wl,--sysroot=/ /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/libbleu.o /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/module.o -o build/lib.linux-x86_64-3.8/fairseq/libbleu.cpython-38-x86_64-linux-gnu.so
building 'fairseq.data.data_utils_fast' extension
creating /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/data
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py:266: UserWarning:

                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                              !! WARNING !!

  warnings.warn(WRONG_COMPILER_WARNING.format(
Emitting ninja build file /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/data/data_utils_fast.o.d -pthread -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/include/python3.8 -c -c /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/data_utils_fast.cpp -o /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/data/data_utils_fast.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=data_utils_fast -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
In file included from /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822:0,
                 from /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                 from /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/data_utils_fast.cpp:624:
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it with " \
  ^
creating build/lib.linux-x86_64-3.8/fairseq/data
g++ -pthread -shared -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -L/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib -Wl,-rpath=/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib -Wl,--no-as-needed -Wl,--sysroot=/ /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/data/data_utils_fast.o -o build/lib.linux-x86_64-3.8/fairseq/data/data_utils_fast.cpython-38-x86_64-linux-gnu.so
building 'fairseq.data.token_block_utils_fast' extension
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py:266: UserWarning:

                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                              !! WARNING !!

  warnings.warn(WRONG_COMPILER_WARNING.format(
Emitting ninja build file /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.o.d -pthread -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/include/python3.8 -c -c /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/token_block_utils_fast.cpp -o /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=token_block_utils_fast -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
In file included from /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822:0,
                 from /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                 from /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/token_block_utils_fast.cpp:625:
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it with " \
  ^
/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/token_block_utils_fast.cpp: In function ‘PyArrayObject* __pyx_f_7fairseq_4data_22token_block_utils_fast__get_slice_indices_fast(PyArrayObject*, PyObject*, int, int, int)’:
/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/token_block_utils_fast.cpp:3319:38: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       __pyx_t_4 = ((__pyx_v_sz_idx < __pyx_t_10) != 0);
                                      ^
/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/token_block_utils_fast.cpp:3514:38: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       __pyx_t_3 = ((__pyx_v_sz_idx < __pyx_t_10) != 0);
                                      ^
g++ -pthread -shared -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -L/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib -Wl,-rpath=/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib -Wl,--no-as-needed -Wl,--sysroot=/ /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.o -o build/lib.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.cpython-38-x86_64-linux-gnu.so
building 'fairseq.libnat' extension
creating /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libnat
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py:266: UserWarning:

                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                              !! WARNING !!

  warnings.warn(WRONG_COMPILER_WARNING.format(
Emitting ninja build file /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o.d -pthread -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include/TH -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include/THC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/include/python3.8 -c -c /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/clib/libnat/edit_dist.cpp -o /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=libnat -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
FAILED: /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o
c++ -MMD -MF /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o.d -pthread -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include/TH -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include/THC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/include/python3.8 -c -c /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/clib/libnat/edit_dist.cpp -o /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=libnat -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
c++: error: unrecognized command line option ‘-std=c++14’
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1533, in _run_ninja_build
    subprocess.run(
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "setup.py", line 257, in <module>
    do_setup(package_data)
  File "setup.py", line 168, in do_setup
    setup(
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 670, in build_extensions
    build_ext.build_extensions(self)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
    _build_ext.build_ext.build_extensions(self)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
    _build_ext.build_extension(self, ext)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
    objects = self.compiler.compile(sources,
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 491, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1250, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1555, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
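
For reference, the actual failure above is g++ 4.8.5 rejecting -std=c++14, which the libnat extension needs. A hedged workaround sketch, assuming a newer GCC exists somewhere on the system (the paths below are hypothetical), is to select it through the standard CC/CXX environment variables and rebuild:

# Hypothetical toolchain paths; any GCC >= 5 that supports -std=c++14 will do.
export CC=/opt/gcc-7/bin/gcc
export CXX=/opt/gcc-7/bin/g++
python setup.py build_ext --inplace

The warnings above show the build defaulting to plain c++, consistent with torch.utils.cpp_extension reading the compiler from the CXX environment variable, so exporting it should change which compiler ninja invokes.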

cpark-dev · Feb 13 '21 01:02

I resolved the compilation error. link

Then I got the error below.

$ PYTHONPATH=. \
> python train.py --fp16 /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/result/quantized/fairseq-bin-data \
>     --task masked_lm --criterion masked_lm \
>     --save-dir /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/zerospeech2021_baseline/checkpoints/BERT_CPC_big_kmeans50 \
>     --keep-last-epochs 1 \
>     --train-subset train \
>     --arch roberta_base \
>     --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 --clip-norm 0.0 \
>     --lr-scheduler polynomial_decay --lr 0.0005 --total-num-update 250000 --warmup-updates 10000 \
>     --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
>     --mask-multiple-length 10 --mask-prob 0.5 --mask-stdev 10 \
>     --sample-break-mode eos --tokens-per-sample 3072 --max-positions 6144 \
>     --max-tokens 4096 --update-freq 32 --max-update 250000 \
>     --seed 5 --log-format simple --log-interval 10 --skip-invalid-size-inputs-valid-test
2021-02-15 23:01:17 | INFO | fairseq.distributed.utils | distributed init (rank 3): tcp://localhost:18899
2021-02-15 23:01:17 | INFO | fairseq.distributed.utils | distributed init (rank 2): tcp://localhost:18899
2021-02-15 23:01:17 | INFO | fairseq.distributed.utils | distributed init (rank 0): tcp://localhost:18899
2021-02-15 23:01:17 | INFO | fairseq.distributed.utils | distributed init (rank 1): tcp://localhost:18899
Traceback (most recent call last):
  File "train.py", line 14, in <module>
    cli_main()
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq_cli/train.py", line 449, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/distributed/utils.py", line 338, in call_main
    torch.multiprocessing.spawn(
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/distributed/utils.py", line 319, in distributed_main
    cfg.distributed_training.distributed_rank = distributed_init(cfg)
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/distributed/utils.py", line 258, in distributed_init
    dist.init_process_group(
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
    barrier()
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370131125/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, internal error, NCCL version 2.7.8
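
As a debugging sketch (standard knobs only: CUDA_VISIBLE_DEVICES and NCCL_DEBUG are regular environment variables, and --distributed-world-size appears in the usage text above), it may help to first isolate whether single-process training works before chasing the NCCL init failure:

# Force a single process on one GPU to take NCCL out of the picture.
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python train.py ... --distributed-world-size 1
# For the multi-GPU run, ask NCCL to print init diagnostics.
NCCL_DEBUG=INFO PYTHONPATH=. python train.py ...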

cpark-dev · Feb 15 '21 23:02

> Did you try following the instruction on the last line?

I tried; following the instruction on the last line worked for me.

bhaveshachhada · May 28 '23 18:05