Thomas Wang
https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/a72225908e9bbda4d989bcdecd71c3c4a05a7f71/tools/convert_checkpoint/deepspeed_checkpoint.py#L5 seems wrong, since the files generated when training with bf16 use `bf16_zero_pp_rank` as their prefix.
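A minimal sketch of the kind of fix this implies (the helper name and file lists are hypothetical, not from the repo): instead of hard-coding one prefix, pick it based on which per-rank files are actually present in the checkpoint directory.

```python
def detect_zero_prefix(filenames):
    """Return the ZeRO per-rank file prefix used by a checkpoint.

    bf16 runs write files starting with `bf16_zero_pp_rank`, while fp16
    runs write files starting with `zero_pp_rank` (assumption based on
    the issue above; `filenames` would come from listing the directory).
    """
    if any(name.startswith("bf16_zero_pp_rank") for name in filenames):
        return "bf16_zero_pp_rank"
    return "zero_pp_rank"
```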
Force sync layer norms
Script to reproduce diverging layer_norm weights
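A minimal sketch of what "force sync" means here (pure Python, not the actual repro script): when replicated copies of a layer-norm weight drift apart, overwrite every copy with their elementwise mean so all ranks agree again. In Megatron-DeepSpeed this would be an `all_reduce(SUM)` over ranks followed by division by the world size; the list-of-replicas version below only illustrates the arithmetic.

```python
def force_sync(replicas):
    """Average diverged copies of a weight vector and write the mean back.

    `replicas` is a list of equal-length lists, one per (hypothetical) rank.
    Writing in place mimics overwriting the parameter tensor on each rank.
    """
    world_size = len(replicas)
    mean = [sum(vals) / world_size for vals in zip(*replicas)]
    for rep in replicas:
        rep[:] = mean
    return replicas
```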
@DanielHesslow has opened PR https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/212, which lets us evaluate Megatron-DeepSpeed models with the EAI harness directly in this repo, without first converting them to HF format. The...
- Related to: #209 Essentially re-opening the PR, since it passes locally but not in CI.
Until recently we've been using PyTorch code to apply "scale -> mask -> softmax" in the attention mechanism for prefix LM. I've recently discovered that there exist two...
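For reference, the semantics of that "scale -> mask -> softmax" sequence over one row of attention scores can be sketched as below (this is the plain-PyTorch path's logic in pure Python; the fused alternatives the issue alludes to are not reproduced here).

```python
import math

def scale_mask_softmax(scores, mask, scale):
    """Apply scale -> mask -> softmax to one row of attention logits.

    `scores`: raw attention logits; `mask`: booleans (True = masked out);
    `scale`: typically 1/sqrt(head_dim). Masked positions are set to -inf
    before the softmax so they receive exactly zero attention weight.
    """
    scaled = [s * scale for s in scores]
    masked = [float("-inf") if m else s for s, m in zip(scaled, mask)]
    mx = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - mx) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

For a prefix LM, `mask` would be bidirectional over the prefix and causal over the rest; the function itself is agnostic to where the mask comes from.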
In this issue, we discuss how viable/interesting it might be to implement a DeBERTa-like attention mechanism: https://arxiv.org/abs/2006.03654 Things to take into account: - performance enhancements: Check with HF pretrained model...
Running `from promptsource.seqio_tasks import tasks` takes a very long time. One of the main reasons is that it queries all dataset infos at import time: https://github.com/bigscience-workshop/promptsource/blob/dba1d41e63a7af883fd7dc2727b4c7fd03e714c9/promptsource/seqio_tasks/tasks.py#L84 This is problematic for two...
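One common remedy (a sketch under assumptions, not promptsource's actual code) is to defer the expensive per-dataset lookup until a task first needs it, rather than fetching every dataset's info when the module is imported:

```python
from functools import lru_cache

CALLS = []  # records when the expensive lookup actually runs

@lru_cache(maxsize=None)
def dataset_info(name):
    """Stand-in for an expensive per-dataset metadata query.

    With the cache, the lookup runs at most once per dataset name, and
    only on first request -- never at module import time.
    """
    CALLS.append(name)
    return {"name": name}  # hypothetical payload
```

Importing a module that defines `dataset_info` this way costs nothing; the network/disk hit is paid lazily and only for datasets actually used.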
**Describe the bug** When running the Python script to install the lib, the download of `freeimage` fails because this URL cannot be reached: `kent.dl.sourceforge.net/project/freeimage/Source%20Distribution/3.18.0/FreeImage3180.zip` I'm guessing the url is not...