DeepSpeed
DeepSpeed initialization with GNN-like model
My code is structured like a GNN: NN_output = graph.forward(NN_input, types="f")
So the usual outputs = model_engine(inputs) pattern does not really seem to fit my case, and args does not follow this code style either.
Any idea?
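For reference, a minimal sketch (graph, parameters, trainset, NN_input, criterion, and NN_train_labels are assumed to be defined as in the thread): the DeepSpeed engine forwards positional and keyword arguments through to the wrapped module's forward(), so a custom call signature like graph.forward(NN_input, types="f") can still be used after wrapping the model.

import argparse
import deepspeed

parser = argparse.ArgumentParser()
parser = deepspeed.add_config_arguments(parser)   # adds --deepspeed, --deepspeed_config, etc.
args = parser.parse_args()                        # e.g. run with --deepspeed_config ds_config.json

# graph, parameters, and trainset are assumed to exist as in the snippets in this thread.
model_engine, optimizer, trainloader, _ = deepspeed.initialize(
    args=args,
    model=graph,
    model_parameters=parameters,
    training_data=trainset,
)

# The engine forwards *args/**kwargs to graph.forward(), so keyword arguments
# such as types="f" are passed through unchanged.
NN_output = model_engine(NN_input, types="f")
Ltrain = criterion(NN_output, NN_train_labels)
model_engine.backward(Ltrain)   # the engine handles the backward pass
model_engine.step()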
I made some code modifications, but I still could not initialize DeepSpeed properly.
/home/phung/PycharmProjects/venv/py39/bin/python /home/phung/PycharmProjects/beginner_tutorial/gdas.py
Files already downloaded and verified
Files already downloaded and verified
[2022-07-13 17:00:25,770] [INFO] [logging.py:69:log_dist] [Rank -1] DeepSpeed info: version=0.6.5, git-hash=unknown, git-branch=unknown
[2022-07-13 17:00:25,782] [INFO] [distributed.py:36:init_distributed] Not using the DeepSpeed or torch.distributed launchers, attempting to detect MPI environment...
[2022-07-13 17:00:27,782] [INFO] [distributed.py:85:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=archlinux, master_port=29500
[2022-07-13 17:00:27,782] [INFO] [distributed.py:48:init_distributed] Initializing torch distributed with backend: nccl
Traceback (most recent call last):
File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 936, in <module>
model_engine_, optimizer, trainloader, __ = deepspeed.initialize(args=args_, model=graph_,
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/__init__.py", line 120, in initialize
engine = DeepSpeedEngine(args=args,
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 238, in __init__
self._do_args_sanity_check(args)
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 900, in _do_args_sanity_check
assert (
AssertionError: DeepSpeed requires --deepspeed_config to specify configuration file
Process finished with exit code 1
@buttercutter, you are missing a DeepSpeed config file; it should be passed on the command line via --deepspeed_config.
Alternatively, you can pass a dict as config_params to deepspeed.initialize().
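For example, a minimal sketch of the dict route (the variable names mirror the call used elsewhere in this thread; the config values are only illustrative):

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "steps_per_print": 1,
    "optimizer": {"type": "AdamW", "params": {"lr": 0.05}},
}

model_engine_, optimizer, trainloader, __ = deepspeed.initialize(
    args=args_,
    model=graph_,
    model_parameters=parameters,
    training_data=trainset,
    config_params=ds_config,   # dict instead of a --deepspeed_config file path
)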
Do you have a recommended DeepSpeed configuration file?
Note: the DeepSpeed configuration for training a transformer-like network structure might differ from that for a GNN-like network structure.
If I use the above configuration file from HuggingFace, I get the following error:
model_engine_, optimizer, trainloader, __ = deepspeed.initialize(args=args_, model=graph_, model_parameters=parameters, training_data=trainset, config_params='./ds_config.json')
/home/phung/PycharmProjects/venv/py39/bin/python /home/phung/PycharmProjects/beginner_tutorial/gdas.py
Files already downloaded and verified
Files already downloaded and verified
[2022-07-13 19:10:10,635] [INFO] [logging.py:69:log_dist] [Rank -1] DeepSpeed info: version=0.6.5, git-hash=unknown, git-branch=unknown
[2022-07-13 19:10:10,648] [INFO] [distributed.py:36:init_distributed] Not using the DeepSpeed or torch.distributed launchers, attempting to detect MPI environment...
[2022-07-13 19:10:12,517] [INFO] [distributed.py:85:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=archlinux, master_port=29500
[2022-07-13 19:10:12,517] [INFO] [distributed.py:48:init_distributed] Initializing torch distributed with backend: nccl
Traceback (most recent call last):
File "/home/phung/PycharmProjects/beginner_tutorial/gdas.py", line 936, in <module>
model_engine_, optimizer, trainloader, __ = deepspeed.initialize(args=args_, model=graph_,
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/__init__.py", line 120, in initialize
engine = DeepSpeedEngine(args=args,
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 239, in __init__
self._configure_with_arguments(args, mpu)
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 872, in _configure_with_arguments
self._config = DeepSpeedConfig(self.config, mpu)
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/runtime/config.py", line 874, in __init__
self._initialize_params(self._param_dict)
File "/home/phung/PycharmProjects/venv/py39/lib/python3.9/site-packages/deepspeed/runtime/config.py", line 903, in _initialize_params
assert not (self.fp16_enabled and self.bfloat16_enabled), 'bfloat16 and fp16 modes cannot be simultaneously enabled'
AssertionError: bfloat16 and fp16 modes cannot be simultaneously enabled
Process finished with exit code 1
Besides, the IDE also complains about the following two issues:
Cannot find reference 'parse_args' in 'parser.pyi' (at line 917)
Expected type 'Optional[Module]', got 'filter[Parameter]' instead (at line 939)
DeepSpeed configuration is meant to be network-agnostic, so that configuration file would in fact work except for the auto fields, which are defined for the HF frontend. The configuration file is used to enable/disable different features of the DeepSpeed framework, rather than to specify or control network properties. You can start with a minimal configuration file that defines just the micro batch size, optimizer, and logging, like the one below:
{
  "train_micro_batch_size_per_gpu": 1,
  "steps_per_print": 1,
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": <add your learning rate>
    }
  }
}
You can progressively add more configuration knobs as you get more familiar with DeepSpeed.
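For instance (an illustrative sketch, not an official recommendation), gradient clipping can be layered on top of the minimal config, and keeping at most one of fp16/bf16 enabled avoids the "bfloat16 and fp16 modes cannot be simultaneously enabled" assertion seen earlier; the learning rate 0.05 is the value that appears later in this thread:

{
  "train_micro_batch_size_per_gpu": 1,
  "steps_per_print": 1,
  "gradient_clipping": 1.0,
  "fp16": {
    "enabled": false
  },
  "bf16": {
    "enabled": false
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 0.05
    }
  }
}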
I get the following runtime error about conflicting batch_size values:
ValueError: Expected input batch_size (8) to match target batch_size (1).
Files already downloaded and verified
Files already downloaded and verified
[2022-07-13 13:15:18,174] [INFO] [logging.py:69:log_dist] [Rank -1] DeepSpeed info: version=0.6.5, git-hash=unknown, git-branch=unknown
[2022-07-13 13:15:18,188] [INFO] [distributed.py:37:init_distributed] Not using the DeepSpeed or torch.distributed launchers, attempting to detect MPI environment...
[2022-07-13 13:15:18,635] [INFO] [distributed.py:91:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=172.28.0.2, master_port=29500
[2022-07-13 13:15:18,635] [INFO] [distributed.py:49:init_distributed] Initializing torch distributed with backend: nccl
[2022-07-13 13:15:18,765] [INFO] [engine.py:279:__init__] DeepSpeed Flops Profiler Enabled: False
Installed CUDA version 11.1 does not match the version torch was compiled with 11.3 but since the APIs are compatible, accepting this combination
Using /root/.cache/torch_extensions/py37_cu113 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py37_cu113/fused_adam...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py37_cu113/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/usr/local/lib/python3.7/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/lib/python3.7/dist-packages/deepspeed/ops/csrc/adam -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -std=c++14 -c /usr/local/lib/python3.7/dist-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/usr/local/lib/python3.7/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/lib/python3.7/dist-packages/deepspeed/ops/csrc/adam -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -c /usr/local/lib/python3.7/dist-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
[3/3] c++ fused_adam_frontend.o multi_tensor_adam.cuda.o -shared -L/usr/local/lib/python3.7/dist-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o fused_adam.so
Loading extension module fused_adam...
Time to load fused_adam op: 31.784398078918457 seconds
[2022-07-13 13:15:51,799] [INFO] [engine.py:1102:_configure_optimizer] Using DeepSpeed Optimizer param name adamw as basic optimizer
[2022-07-13 13:15:52,015] [INFO] [engine.py:1109:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2022-07-13 13:15:52,015] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw
[2022-07-13 13:15:52,016] [INFO] [engine.py:795:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2022-07-13 13:15:52,016] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2022-07-13 13:15:52,016] [INFO] [logging.py:69:log_dist] [Rank 0] step=0, skipped=0, lr=[0.05], mom=[(0.9, 0.999)]
[2022-07-13 13:15:52,020] [INFO] [config.py:1059:print] DeepSpeedEngine configuration:
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] amp_enabled .................. False
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] amp_params ................... False
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": null,
"exps_dir": null,
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] bfloat16_enabled ............. False
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] checkpoint_tag_validation_enabled True
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] checkpoint_tag_validation_fail False
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] communication_data_type ...... None
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] curriculum_enabled ........... False
[2022-07-13 13:15:52,021] [INFO] [config.py:1063:print] curriculum_params ............ False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] dataloader_drop_last ......... False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] disable_allgather ............ False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] dump_state ................... False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] dynamic_loss_scale_args ...... None
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_enabled ........... False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_gas_boundary_resolution 1
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_layer_name ........ bert.encoder.layer
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_layer_num ......... 0
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_max_iter .......... 100
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_stability ......... 1e-06
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_tol ............... 0.01
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] eigenvalue_verbose ........... False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] elasticity_enabled ........... False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] flops_profiler_config ........ {
"enabled": false,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] fp16_enabled ................. False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] fp16_master_weights_and_gradients False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] fp16_mixed_quantize .......... False
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] global_rank .................. 0
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] gradient_accumulation_steps .. 1
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] gradient_clipping ............ 0.0
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] gradient_predivide_factor .... 1.0
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] initial_dynamic_scale ........ 4294967296
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] loss_scale ................... 0
[2022-07-13 13:15:52,022] [INFO] [config.py:1063:print] memory_breakdown ............. False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] optimizer_legacy_fusion ...... False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] optimizer_name ............... adamw
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] optimizer_params ............. {'lr': 0.05}
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] pld_enabled .................. False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] pld_params ................... False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] prescale_gradients ........... False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_change_rate ......... 0.001
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_groups .............. 1
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_offset .............. 1000
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_period .............. 1000
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_rounding ............ 0
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_start_bits .......... 16
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_target_bits ......... 8
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_training_enabled .... False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_type ................ 0
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] quantize_verbose ............. False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] scheduler_name ............... None
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] scheduler_params ............. None
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] sparse_attention ............. None
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] sparse_gradients_enabled ..... False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] steps_per_print .............. 1
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] tensorboard_enabled .......... False
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] tensorboard_job_name ......... DeepSpeedJobName
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] tensorboard_output_path ......
[2022-07-13 13:15:52,023] [INFO] [config.py:1063:print] train_batch_size ............. 1
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] train_micro_batch_size_per_gpu 1
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] use_quantizer_kernel ......... False
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] wall_clock_breakdown ......... False
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] world_size ................... 1
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] zero_allow_untested_optimizer False
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] zero_config .................. {
"stage": 0,
"contiguous_gradients": true,
"reduce_scatter": true,
"reduce_bucket_size": 5.000000e+08,
"allgather_partitions": true,
"allgather_bucket_size": 5.000000e+08,
"overlap_comm": false,
"load_from_fp32_weights": true,
"elastic_checkpoint": false,
"offload_param": null,
"offload_optimizer": null,
"sub_group_size": 1.000000e+09,
"prefetch_bucket_size": 5.000000e+07,
"param_persistence_threshold": 1.000000e+05,
"max_live_parameters": 1.000000e+09,
"max_reuse_distance": 1.000000e+09,
"gather_16bit_weights_on_model_save": false,
"ignore_unused_parameters": true,
"round_robin_gradients": false,
"legacy_stage1": false
}
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] zero_enabled ................. False
[2022-07-13 13:15:52,024] [INFO] [config.py:1063:print] zero_optimization_stage ...... 0
[2022-07-13 13:15:52,024] [INFO] [config.py:1071:print] json = {
"train_micro_batch_size_per_gpu": 1,
"steps_per_print": 1,
"optimizer": {
"type": "AdamW",
"params": {
"lr": 0.05
}
}
}
Using /root/.cache/torch_extensions/py37_cu113 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py37_cu113/utils...
Emitting ninja build file /root/.cache/torch_extensions/py37_cu113/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /usr/local/lib/python3.7/dist-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
[2/2] c++ flatten_unflatten.o -shared -L/usr/local/lib/python3.7/dist-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o utils.so
Loading extension module utils...
Time to load utils op: 16.06555199623108 seconds
run_num = 0
Traceback (most recent call last):
File "gdas.py", line 947, in <module>
ltrain = train_NN(graph=graph_, model_engine=model_engine_, forward_pass_only=0)
File "gdas.py", line 690, in train_NN
Ltrain = criterion(NN_output, NN_train_labels)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py", line 1166, in forward
label_smoothing=self.label_smoothing)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 3014, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
ValueError: Expected input batch_size (8) to match target batch_size (1).
[85b173f58da1:00656] *** Process received signal ***
[85b173f58da1:00656] Signal: Segmentation fault (11)
[85b173f58da1:00656] Signal code: Address not mapped (1)
[85b173f58da1:00656] Failing at address: 0x7f751665320d
[85b173f58da1:00656] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f75192fd980]
[85b173f58da1:00656] [ 1] /lib/x86_64-linux-gnu/libc.so.6(getenv+0xa5)[0x7f7518f3c775]
[85b173f58da1:00656] [ 2] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(_ZN13TCMallocGuardD1Ev+0x34)[0x7f75197a7e44]
[85b173f58da1:00656] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0xf5)[0x7f7518f3d605]
[85b173f58da1:00656] [ 4] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(+0x13cb3)[0x7f75197a5cb3]
[85b173f58da1:00656] *** End of error message ***
Set "train_micro_batch_size_per_gpu" to 8 in the configuration file.
May I ask if retain_graph=True is fully supported now?
It should be, but please report any issues.
model_engine.backward(Ltrain, retain_graph=True)
gave the following error:
Traceback (most recent call last):
File "gdas.py", line 947, in <module>
ltrain = train_NN(graph=graph_, model_engine=model_engine_, forward_pass_only=0)
File "gdas.py", line 700, in train_NN
model_engine.backward(Ltrain, retain_graph=True)
File "/usr/local/lib/python3.7/dist-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
return func(*args, **kwargs)
TypeError: backward() got an unexpected keyword argument 'retain_graph'
@tjruwase May I know why retain_graph still does not work for me?
Sorry, it appears #1149 was never merged. Unfortunately, it has a conflict with master. Can you please try picking that up?
@buttercutter, #1149 is now merged. Please try master.
@tjruwase Why do I get the Expected type 'Module | None', got 'filter[Parameter]' instead error for model_parameters?
This is a type error. Please see the doc for deepspeed.initialize().
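One way to silence this particular IDE warning (a sketch; it assumes parameters was built as a filter over graph_.parameters()) is to materialize the filter into a plain list before passing it, since the complaint comes from the static type hint on model_parameters rather than from runtime behavior:

# Hypothetical workaround: pass a concrete list instead of a lazy filter object.
parameters = [p for p in graph_.parameters() if p.requires_grad]

model_engine_, optimizer, trainloader, __ = deepspeed.initialize(
    args=args_,
    model=graph_,
    model_parameters=parameters,
    training_data=trainset,
    config_params='./ds_config.json',
)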
The same code works perfectly fine within the Google Colab GPU cloud environment, so I guess the above type error is due to a local installation issue.
However, DeepSpeed still gives RuntimeError: CUDA out of memory. Could you advise what could have gone wrong?
The same code works perfectly fine within the Google Colab GPU cloud environment, so I guess the above type error is due to a local installation issue.
This is quite strange. It would be good to figure out what is different about the local and Colab installations. Do you mind printing out the types of every parameter passed to deepspeed.initialize()?
Exception: Installed CUDA version 11.7 does not match the version torch was compiled with 10.2, unable to compile cuda/cpp extensions without a matching cuda version.
The local installation seems to have failed due to a CUDA/torch version incompatibility.
The following is the output from the online Google Colab GPU cloud environment:
print("type(args) = ", type(args_))
print("type(graph_) = ", type(graph_))
print("type(parameters) = ", type(parameters))
print("type(trainset) = ", type(trainset))
type(args) = <class 'argparse.Namespace'>
type(graph_) = <class '__main__.Graph'>
type(parameters) = <class 'filter'>
type(trainset) = <class 'torchvision.datasets.cifar.CIFAR10'>
@tjruwase I see no issue with the initialization code, at least within the working Google Colab GPU cloud environment.
Shall I open a different GitHub issue, since this is an entirely different problem?
@buttercutter, yes, please open a new issue. Thanks!