DeepSpeed
DeepSpeed initialization with GNN-like model
My code is structured like a GNN: NN_output = graph.forward(NN_input, types="f")
So the usual outputs = model_engine(inputs) pattern does not really seem to fit my case, and args does not follow this code style either.
Any idea?
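For reference, a minimal sketch (graph, parameters, trainset, NN_input, criterion, and NN_train_labels are assumed to be defined as in the thread): the DeepSpeed engine forwards positional and keyword arguments through to the wrapped module's forward(), so a custom call signature like graph.forward(NN_input, types="f") can still be used after wrapping the model.

import argparse
import deepspeed

parser = argparse.ArgumentParser()
parser = deepspeed.add_config_arguments(parser)   # adds --deepspeed, --deepspeed_config, etc.
args = parser.parse_args()                        # e.g. run with --deepspeed_config ds_config.json

# graph, parameters, and trainset are assumed to exist as in the snippets in this thread.
model_engine, optimizer, trainloader, _ = deepspeed.initialize(
    args=args,
    model=graph,
    model_parameters=parameters,
    training_data=trainset,
)

# The engine forwards *args/**kwargs to graph.forward(), so keyword arguments
# such as types="f" are passed through unchanged.
NN_output = model_engine(NN_input, types="f")
Ltrain = criterion(NN_output, NN_train_labels)
model_engine.backward(Ltrain)   # the engine handles the backward pass
model_engine.step()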
I made some code modifications, but I still could not initialize DeepSpeed properly.
/home/phung/PycharmProjects/venv/py39/bin/python /home/phung/PycharmProjects/beginner_tutorial/gdas.py
Files already downloaded and verified
Files already downloaded and verified
[2022-07-13 17:00:25,770] [INFO] [logging.py:69:log_dist] [Rank -1] DeepSpeed info: version=0.6.5, git-hash=unknown, git-branch=unknown
[2022-07-13 17:00:25,782] [INFO] [distributed.py:36:init_distributed] Not using the DeepSpeed or torch.distributed launchers, attempting to detect MPI environment...
[2022-07-13 17:00:27,782] [INFO] [distributed.py:85:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=archlinux, master_port=29500
[2022-07-13 17:00:27,782] [INFO] [distributed.py:48:init_distributed] Initializing torch distributed with backend: nccl
Traceback (most recent call last):
File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 936, in <module>
model_engine_, optimizer, trainloader, __ = deepspeed.initialize(args=args_, model=graph_,
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/__init__.py", line 120, in initialize
engine = DeepSpeedEngine(args=args,
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 238, in __init__
self._do_args_sanity_check(args)
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 900, in _do_args_sanity_check
assert (
AssertionError: DeepSpeed requires --deepspeed_config to specify configuration file
Process finished with exit code 1
@buttercutter, you are missing a DeepSpeed config file; it should be passed on the command line via --deepspeed_config.
Alternatively, you can pass a dict as config_params to deepspeed.initialize().
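For example, a minimal sketch of the dict route (the variable names mirror the call used elsewhere in this thread; the config values are only illustrative):

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "steps_per_print": 1,
    "optimizer": {"type": "AdamW", "params": {"lr": 0.05}},
}

model_engine_, optimizer, trainloader, __ = deepspeed.initialize(
    args=args_,
    model=graph_,
    model_parameters=parameters,
    training_data=trainset,
    config_params=ds_config,   # dict instead of a --deepspeed_config file path
)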
Do you have a recommended DeepSpeed configuration file?
Note: the DeepSpeed configuration for training a transformer-like network structure might differ from that for a GNN-like network structure.
If I use the above configuration file from HuggingFace, I get the following error:
model_engine_, optimizer, trainloader, __ = deepspeed.initialize(args=args_, model=graph_, model_parameters=parameters, training_data=trainset, config_params='./ds_config.json')
/home/phung/PycharmProjects/venv/py39/bin/python /home/phung/PycharmProjects/beginner_tutorial/gdas.py
Files already downloaded and verified
Files already downloaded and verified
[2022-07-13 19:10:10,635] [INFO] [logging.py:69:log_dist] [Rank -1] DeepSpeed info: version=0.6.5, git-hash=unknown, git-branch=unknown
[2022-07-13 19:10:10,648] [INFO] [distributed.py:36:init_distributed] Not using the DeepSpeed or torch.distributed launchers, attempting to detect MPI environment...
[2022-07-13 19:10:12,517] [INFO] [distributed.py:85:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=archlinux, master_port=29500
[2022-07-13 19:10:12,517] [INFO] [distributed.py:48:init_distributed] Initializing torch distributed with backend: nccl
Traceback (most recent call last):
File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 936, in <module>
model_engine_, optimizer, trainloader, __ = deepspeed.initialize(args=args_, model=graph_,
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/__init__.py", line 120, in initialize
engine = DeepSpeedEngine(args=args,
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 239, in __init__
self._configure_with_arguments(args, mpu)
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 872, in _configure_with_arguments
self._config = DeepSpeedConfig(self.config, mpu)
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/runtime/config.py", line 874, in __init__
self._initialize_params(self._param_dict)
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/runtime/config.py", line 903, in _initialize_params
assert not (self.fp16_enabled and self.bfloat16_enabled), 'bfloat16 and fp16 modes cannot be simultaneously enabled'
AssertionError: bfloat16 and fp16 modes cannot be simultaneously enabled
Process finished with exit code 1
Besides, the IDE also complains about the following two issues:
Cannot find reference 'parse_args' in 'parser.pyi' (at line 917)
Expected type 'Optional[Module]', got 'filter[Parameter]' instead (at line 939)
DeepSpeed configuration is meant to be network-agnostic, so that configuration file would in fact work except for the auto fields, which are defined for the HF frontend. The configuration file is used to enable/disable different features of the DeepSpeed framework, rather than to specify or control network properties. You can start with a minimal configuration file that defines just the micro batch size, optimizer, and logging, like the one below:
{
  "train_micro_batch_size_per_gpu": 1,
  "steps_per_print": 1,
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": <add your learning rate>
    }
  }
}
You can progressively add more configuration knobs as you get more familiar with DeepSpeed.
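For instance (an illustrative sketch, not an official recommendation), gradient clipping can be layered on top of the minimal config, and keeping at most one of fp16/bf16 enabled avoids the "bfloat16 and fp16 modes cannot be simultaneously enabled" assertion seen earlier; the learning rate 0.05 is the value that appears later in this thread:

{
  "train_micro_batch_size_per_gpu": 1,
  "steps_per_print": 1,
  "gradient_clipping": 1.0,
  "fp16": {
    "enabled": false
  },
  "bf16": {
    "enabled": false
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 0.05
    }
  }
}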
I get the following runtime error about conflicting batch_size values:
ValueError: Expected input batch_size (8) to match target batch_size (1).
Files already downloaded and verified
Files already downloaded and verified
[2022-07-13 13:15:18,174] [INFO] [logging.py:69:log_dist] [Rank -1] DeepSpeed info: version=0.6.5, git-hash=unknown, git-branch=unknown
[2022-07-13 13:15:18,188] [INFO] [distributed.py:37:init_distributed] Not using the DeepSpeed or torch.distributed launchers, attempting to detect MPI environment...
[2022-07-13 13:15:18,635] [INFO] [distributed.py:91:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=172.28.0.2, master_port=29500
[2022-07-13 13:15:18,635] [INFO] [distributed.py:49:init_distributed] Initializing torch distributed with backend: nccl
[2022-07-13 13:15:18,765] [INFO] [engine.py:279:__init__] DeepSpeed Flops Profiler Enabled: False
Installed CUDA version 11.1 does not match the version torch was compiled with 11.3 but since the APIs are compatible, accepting this combination
Using /root/.cache/torch_extensions/py37_cu113 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py37_cu113/fused_adam...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py37_cu113/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/usr/local/lib/python3.7/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/lib/python3.7/dist-packages/deepspeed/ops/csrc/adam -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -std=c++14 -c /usr/local/lib/python3.7/dist-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/usr/local/lib/python3.7/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/lib/python3.7/dist-packages/deepspeed/ops/csrc/adam -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -c /usr/local/lib/python3.7/dist-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
[3/3] c++ fused_adam_frontend.o multi_tensor_adam.cuda.o -shared -L/usr/local/lib/python3.7/dist-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o fused_adam.so
Loading extension module fused_adam...
Time to load fused_adam op: 31.784398078918457 seconds
[2022-07-13 13:15:51,799] [INFO] [engine.py:1102:_configure_optimizer] Using DeepSpeed Optimizer param name adamw as basic optimizer
[2022-07-13 13:15:52,015] [INFO] [engine.py:1109:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2022-07-13 13:15:52,015] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw
[2022-07-13 13:15:52,016] [INFO] [engine.py:795:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2022-07-13 13:15:52,016] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2022-07-13 13:15:52,016] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.05], mom=[(0.9, 0.999)]
[2022-07-13 13:15:52,020] [INFO] [config.py:1059:print] DeepSpeedEngine configuration:
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] amp_enabled .................. False
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] amp_params ................... False
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": null,
"exps_dir": null,
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] bfloat16_enabled ............. False
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] checkpoint_tag_validation_enabled True
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] checkpoint_tag_validation_fail False
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] communication_data_type ...... None
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] curriculum_enabled ........... False
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] curriculum_params ............ False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] dataloader_drop_last ......... False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] disable_allgather ............ False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] dump_state ................... False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] dynamic_loss_scale_args ...... None
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_enabled ........... False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_gas_boundary_resolution 1
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_layer_name ........ bert.encoder.layer
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_layer_num ......... 0
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_max_iter .......... 100
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_stability ......... 1e-06
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_tol ............... 0.01
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_verbose ........... False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] elasticity_enabled ........... False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] flops_profiler_config ........ {
"enabled": false,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] fp16_enabled ................. False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] fp16_master_weights_and_gradients False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] fp16_mixed_quantize .......... False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] global_rank .................. 0
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] gradient_accumulation_steps .. 1
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] gradient_clipping ............ 0.0
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] gradient_predivide_factor .... 1.0
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] initial_dynamic_scale ........ 4294967296
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] loss_scale ................... 0
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] memory_breakdown ............. False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] optimizer_legacy_fusion ...... False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] optimizer_name ............... adamw
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] optimizer_params ............. {'lr': 0.05}
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] pld_enabled .................. False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] pld_params ................... False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] prescale_gradients ........... False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_change_rate ......... 0.001
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_groups .............. 1
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_offset .............. 1000
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_period .............. 1000
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_rounding ............ 0
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_start_bits .......... 16
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_target_bits ......... 8
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_training_enabled .... False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_type ................ 0
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_verbose ............. False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] scheduler_name ............... None
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] scheduler_params ............. None
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] sparse_attention ............. None
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] sparse_gradients_enabled ..... False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] steps_per_print .............. 1
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] tensorboard_enabled .......... False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] tensorboard_job_name ......... DeepSpeedJobName
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] tensorboard_output_path ......
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] train_batch_size ............. 1
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] train_micro_batch_size_per_gpu 1
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] use_quantizer_kernel ......... False
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] wall_clock_breakdown ......... False
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] world_size ................... 1
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] zero_allow_untested_optimizer False
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] zero_config .................. {
"stage": 0,
"contiguous_gradients": true,
"reduce_scatter": true,
"reduce_bucket_size": 5.000000e+08,
"allgather_partitions": true,
"allgather_bucket_size": 5.000000e+08,
"overlap_comm": false,
"load_from_fp32_weights": true,
"elastic_checkpoint": false,
"offload_param": null,
"offload_optimizer": null,
"sub_group_size": 1.000000e+09,
"prefetch_bucket_size": 5.000000e+07,
"param_persistence_threshold": 1.000000e+05,
"max_live_parameters": 1.000000e+09,
"max_reuse_distance": 1.000000e+09,
"gather_16bit_weights_on_model_save": false,
"ignore_unused_parameters": true,
"round_robin_gradients": false,
"legacy_stage1": false
}
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] zero_enabled ................. False
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] zero_optimization_stage ...... 0
[2022-07-13 13:15:52,024] [INFO] [config.py:1071:print] json = {
"train_micro_batch_size_per_gpu": 1,
"steps_per_print": 1,
"optimizer": {
"type": "AdamW",
"params": {
"lr": 0.05
}
}
}
Using /root/.cache/torch_extensions/py37_cu113 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py37_cu113/utils...
Emitting ninja build file /root/.cache/torch_extensions/py37_cu113/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /usr/local/lib/python3.7/dist-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
[2/2] c++ flatten_unflatten.o -shared -L/usr/local/lib/python3.7/dist-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o utils.so
Loading extension module utils...
Time to load utils op: 16.06555199623108 seconds
run_num = 0
Traceback (most recent call last):
File "gdas.py", line 947, in <module>
ltrain = train_NN(graph=graph_, model_engine=model_engine_, forward_pass_only=0)
File "gdas.py", line 690, in train_NN
Ltrain = criterion(NN_output, NN_train_labels)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py", line 1166, in forward
label_smoothing=self.label_smoothing)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 3014, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
ValueError: Expected input batch_size (8) to match target batch_size (1).
[85b173f58da1:00656] *** Process received signal ***
[85b173f58da1:00656] Signal: Segmentation fault (11)
[85b173f58da1:00656] Signal code: Address not mapped (1)
[85b173f58da1:00656] Failing at address: 0x7f751665320d
[85b173f58da1:00656] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f75192fd980]
[85b173f58da1:00656] [ 1] /lib/x86_64-linux-gnu/libc.so.6(getenv+0xa5)[0x7f7518f3c775]
[85b173f58da1:00656] [ 2] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(_ZN13TCMallocGuardD1Ev+0x34)[0x7f75197a7e44]
[85b173f58da1:00656] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0xf5)[0x7f7518f3d605]
[85b173f58da1:00656] [ 4] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(+0x13cb3)[0x7f75197a5cb3]
[85b173f58da1:00656] *** End of error message ***
Set "train_micro_batch_size_per_gpu" to 8 in the configuration file.
May I ask if retain_graph=True is fully supported now?
It should be, but please report any issues.
model_engine.backward(Ltrain, retain_graph=True)
gave the following error:
Traceback (most recent call last):
File "gdas.py", line 947, in <module>
ltrain = train_NN(graph=graph_, model_engine=model_engine_, forward_pass_only=0)
File "gdas.py", line 700, in train_NN
model_engine.backward(Ltrain, retain_graph=True)
File "/usr/local/lib/python3.7/dist-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
return func(*args, **kwargs)
TypeError: backward() got an unexpected keyword argument 'retain_graph'
@tjruwase May I know why retain_graph still does not work for me?
Sorry, it appears #1149 was never merged. Unfortunately, it has a conflict with master. Can you please try picking that up?
@buttercutter, #1149 is now merged. Please try master.
@tjruwase Why do I get the Expected type 'Module | None', got 'filter[Parameter]' instead error for model_parameters?
This is a type error. Please see the doc for deepspeed.initialize().
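One way to silence this particular IDE warning (a sketch; it assumes parameters was built as a filter over graph_.parameters()) is to materialize the filter into a plain list before passing it, since the complaint comes from the static type hint on model_parameters rather than from runtime behavior:

# Hypothetical workaround: pass a concrete list instead of a lazy filter object.
parameters = [p for p in graph_.parameters() if p.requires_grad]

model_engine_, optimizer, trainloader, __ = deepspeed.initialize(
    args=args_,
    model=graph_,
    model_parameters=parameters,
    training_data=trainset,
    config_params='./ds_config.json',
)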
The same code works perfectly fine within the Google Colab GPU cloud environment, so I guess the above type error is due to a local installation issue.
However, DeepSpeed still gives RuntimeError: CUDA out of memory. Could you advise what could have gone wrong?
The same code works perfectly fine within the Google Colab GPU cloud environment, so I guess the above type error is due to a local installation issue.
This is quite strange. It would be good to figure out what is different about the local and Colab installations. Do you mind printing out the types of every parameter passed to deepspeed.initialize()?
Exception: Installed CUDA version 11.7 does not match the version torch was compiled with 10.2, unable to compile cuda/cpp extensions without a matching cuda version.
The local installation seems to have failed due to a CUDA/torch version incompatibility.
The following is the output from the online Google Colab GPU cloud environment:
print("type(args) = ", type(args_))
print("type(graph_) = ", type(graph_))
print("type(parameters) = ", type(parameters))
print("type(trainset) = ", type(trainset))
type(args) = <class 'argparse.Namespace'>
type(graph_) = <class '__main__.Graph'>
type(parameters) = <class 'filter'>
type(trainset) = <class 'torchvision.datasets.cifar.CIFAR10'>
@tjruwase I see no issue with the initialization code, at least within the working Google Colab GPU cloud environment.
Shall I open a different GitHub issue, since this is an entirely different problem?
@buttercutter, yes, please open a new issue. Thanks!