Sam Foreman
Updates `deepspeed/monitor/monitor.py` to instantiate objects with the correct configs.

Specifically, fixes an issue when trying to use W&B:

```Shell
File "/soft/datascience/conda/2023-01-10/mconda3/lib/python3.10/site-packages/deepspeed/monitor/wandb.py", line 14, in __init__
    self.group = wandb_config.group
AttributeError: 'CSVConfig' object has...
```
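The failure in the traceback above can be reproduced in isolation. The following is a minimal sketch, not DeepSpeed's actual code: the classes are simplified stand-ins named after those in the traceback, and `build_wandb_monitor_buggy` / `build_wandb_monitor_fixed` are hypothetical helpers that only illustrate what happens when a monitor receives another monitor's config object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CSVConfig:
    """Stand-in for a CSV monitor config (fields are assumptions)."""
    enabled: bool = False
    output_path: str = ""


@dataclass
class WandbConfig:
    """Stand-in for a W&B monitor config (fields are assumptions)."""
    enabled: bool = False
    group: Optional[str] = None
    project: str = "deepspeed"


class WandbMonitor:
    """Stand-in monitor that reads W&B-specific attributes from its config."""

    def __init__(self, wandb_config):
        self.group = wandb_config.group
        self.project = wandb_config.project


def build_wandb_monitor_buggy(csv_config, wandb_config):
    # Wrong config passed: CSVConfig has no `group` attribute,
    # so WandbMonitor.__init__ raises AttributeError.
    return WandbMonitor(csv_config)


def build_wandb_monitor_fixed(csv_config, wandb_config):
    # Each monitor is constructed from its own config section.
    return WandbMonitor(wandb_config)


csv_cfg = CSVConfig(enabled=True, output_path="logs")
wandb_cfg = WandbConfig(enabled=True, group="my-group", project="my-project")

try:
    build_wandb_monitor_buggy(csv_cfg, wandb_cfg)
except AttributeError as err:
    print(f"buggy: {err}")

monitor = build_wandb_monitor_fixed(csv_cfg, wandb_cfg)
print(f"fixed: group={monitor.group}, project={monitor.project}")
```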
Hello,

When trying to enable W&B monitoring (as shown below in the snippet from my `ds_config.json`):

```json
"wandb": {
  "enabled": True,
  "project": projectName,
  "group": groupName
},
```

I get the...
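As a side note on the snippet above: in a literal JSON file, booleans are lowercase (`true`) and strings must be quoted. If the config is assembled as a Python dict instead, the standard `json` module handles that conversion. A minimal sketch, where `project_name` and `group_name` hold hypothetical placeholder values:

```python
import json

# Hypothetical placeholder values, for illustration only.
project_name = "my-project"
group_name = "my-group"

# Python-side config: note the Python boolean `True`.
ds_config = {
    "wandb": {
        "enabled": True,
        "project": project_name,
        "group": group_name,
    },
}

# json.dumps writes Python `True` as JSON `true` and quotes the strings.
ds_config_json = json.dumps(ds_config, indent=2)
print(ds_config_json)

# Loading the text back gives an equal Python dict.
roundtrip = json.loads(ds_config_json)
```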
I'm not sure of the cause, but when trying to run multi-node training (launching with [mpich](https://www.mpich.org/)), I get the following error:

```bash
File "/lus/grand/projects/datascience/foremans/locations/polaris/projects/saforem2/Megatron-DeepSpeed/dist.py", line 106, in init_deepspeed
    deepspeed.init_distributed()
File "/lus/grand/projects/datascience/foremans/locations/polaris/miniconda3/envs/2022-09-08-hvd-nccl/lib/python3.8/site-packages/deepspeed/comm/comm.py", line 646,...
```
Explicitly:

- Use [`Hydra`](https://hydra.cc) for all aspects of configuration
- Modularize and move source code into `src/ngpt`
- Add `pyproject.toml`
- Add (Google Colab compatible) self-contained notebooks for training various...
Issue coming from: https://github.com/intel/intel-extension-for-pytorch/blob/a7f9edebd5fc102a7f290613987c380668d2a297/intel_extension_for_pytorch/__init__.py#L36

Trying:

```python
>>> from intel_extension_for_transformers.transformers import ViTImageProcessor
>>> processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
/lus/gila/projects/Aurora_deployment/foremans/locations/sunspot/projects/saforem2/stormer-dev/venvs/sunspot/q4-drop/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image...
```