apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Results 296 apex issues

After creating a conda environment by following the instructions from BigScience [here](https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/start_fast.md), i.e.: 1) `conda create -n bloom python=3.9` 2) `conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch` (my...

bug

My configuration is as follows: Windows 10, PyTorch 1.2.0, CUDA 10.0, Visual Studio 2017. The installation command is: `python setup.py install --cuda_ext --cpp_ext`. Please tell me how to solve this problem, thank you!...

See https://github.com/facebookresearch/pytorch3d/issues/1127#issuecomment-1071206320. This causes nvcc to try to process pybind, which fails with `error: too few arguments for template template parameter "Tuple"`.

When using apex for mixed precision training, I found a case where, at the check `if x.requires_grad and cached_x.requires_grad:`, the tuple `cached_x.grad_fn.next_functions` contains only one element. In this case, we see the error: ```python...

Hello everyone. When I train a Siamese network (such as a bi-encoder) with apex acceleration, the code throws an exception. A possible reason is that both inputs are encoded with the...

Added bfloat16 dispatch to kernels used in the FusedMixedPrecisionLamb optimizer.

Does Apex support Transformer or Vision Transformer models, given that they contain Layer Norm layers and that statistics are synchronized across GPUs?

bug

https://github.com/NVIDIA/apex/blob/a0f5f3ac0f6bf39feee6e60eee66ec873dc299ab/apex/transformer/pipeline_parallel/p2p_communication.py#L271 can probably be removed after confirming https://github.com/pytorch/pytorch/pull/82450

Does apex support training with Sparse Tensor Cores in PyTorch? If so, which release first included this feature? Thank you!