Thomas Wang
https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/a72225908e9bbda4d989bcdecd71c3c4a05a7f71/tools/convert_checkpoint/deepspeed_checkpoint.py#L5 seems wrong, since the files generated when training with bf16 use `bf16_zero_pp_rank` as their prefix.
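A minimal sketch of the kind of fix this implies (the helper name and file lists are hypothetical, not from the repo): instead of hard-coding one prefix, pick it based on which per-rank files are actually present in the checkpoint directory.

```python
def detect_zero_prefix(filenames):
    """Return the ZeRO per-rank file prefix used by a checkpoint.

    bf16 runs write files starting with `bf16_zero_pp_rank`, while fp16
    runs write files starting with `zero_pp_rank` (assumption based on
    the issue above; `filenames` would come from listing the directory).
    """
    if any(name.startswith("bf16_zero_pp_rank") for name in filenames):
        return "bf16_zero_pp_rank"
    return "zero_pp_rank"
```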
Force sync layer norms
Script to reproduce diverging layer_norm weights
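A minimal sketch of what "force sync" means here (pure Python, not the actual repro script): when replicated copies of a layer-norm weight drift apart, overwrite every copy with their elementwise mean so all ranks agree again. In Megatron-DeepSpeed this would be an `all_reduce(SUM)` over ranks followed by division by the world size; the list-of-replicas version below only illustrates the arithmetic.

```python
def force_sync(replicas):
    """Average diverged copies of a weight vector and write the mean back.

    `replicas` is a list of equal-length lists, one per (hypothetical) rank.
    Writing in place mimics overwriting the parameter tensor on each rank.
    """
    world_size = len(replicas)
    mean = [sum(vals) / world_size for vals in zip(*replicas)]
    for rep in replicas:
        rep[:] = mean
    return replicas
```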
@DanielHesslow has opened PR https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/212, which lets us evaluate Megatron-DeepSpeed models with the EAI harness directly in this repo, without first converting them to HF format. The...
- Related to: #209 Essentially re-opening the PR, since it passes locally but not in CI.
Until recently we've been using PyTorch code to apply "scale -> mask -> softmax" in the attention mechanism for prefix LM. I've recently discovered that there exist two...
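For reference, the semantics of that "scale -> mask -> softmax" sequence over one row of attention scores can be sketched as below (this is the plain-PyTorch path's logic in pure Python; the fused alternatives the issue alludes to are not reproduced here).

```python
import math

def scale_mask_softmax(scores, mask, scale):
    """Apply scale -> mask -> softmax to one row of attention logits.

    `scores`: raw attention logits; `mask`: booleans (True = masked out);
    `scale`: typically 1/sqrt(head_dim). Masked positions are set to -inf
    before the softmax so they receive exactly zero attention weight.
    """
    scaled = [s * scale for s in scores]
    masked = [float("-inf") if m else s for s, m in zip(scaled, mask)]
    mx = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - mx) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

For a prefix LM, `mask` would be bidirectional over the prefix and causal over the rest; the function itself is agnostic to where the mask comes from.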
In this issue, we discuss how viable/interesting it might be to implement a DeBERTa-like attention mechanism: https://arxiv.org/abs/2006.03654 Things to take into account: - performance enhancements: Check with HF pretrained model...
Running `from promptsource.seqio_tasks import tasks` takes a very long time. One of the main reasons is that it queries all dataset infos at import time: https://github.com/bigscience-workshop/promptsource/blob/dba1d41e63a7af883fd7dc2727b4c7fd03e714c9/promptsource/seqio_tasks/tasks.py#L84 This is problematic for two...
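One common remedy (a sketch under assumptions, not promptsource's actual code) is to defer the expensive per-dataset lookup until a task first needs it, rather than fetching every dataset's info when the module is imported:

```python
from functools import lru_cache

CALLS = []  # records when the expensive lookup actually runs

@lru_cache(maxsize=None)
def dataset_info(name):
    """Stand-in for an expensive per-dataset metadata query.

    With the cache, the lookup runs at most once per dataset name, and
    only on first request -- never at module import time.
    """
    CALLS.append(name)
    return {"name": name}  # hypothetical payload
```

Importing a module that defines `dataset_info` this way costs nothing; the network/disk hit is paid lazily and only for datasets actually used.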
**Describe the bug** When running the Python script to install the lib, the download of `freeimage` fails because this URL cannot be reached: `kent.dl.sourceforge.net/project/freeimage/Source%20Distribution/3.18.0/FreeImage3180.zip` I'm guessing the url is not...