Dashiell Stander

Results 8 issues of Dashiell Stander

In particular, these two requests work:
https://api.pushshift.io/reddit/search/submission/?ids=bfghow,bfghno
https://api.pushshift.io/reddit/search/submission/?ids=bfghow,bfghno&fields=title,id,score
but these two do not:
https://api.pushshift.io/reddit/search/submission/?ids=bfghow%2Cbfghno
https://api.pushshift.io/reddit/search/submission/?ids=bfghow,bfghno&fields=title%2Cid%2Cscore
Against the comments endpoint, the URL-encoded commas work for the ids parameter:
https://api.pushshift.io/reddit/search/comment/?ids=eldib61%2celdr6en&fields=body,id,score
but...
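As a possible client-side workaround for the behavior reported above, the sketch below builds the query string with literal commas instead of `%2C`. It uses only Python's standard `urllib.parse.urlencode`; the helper name `build_url` is hypothetical, and the sketch assumes nothing about the API beyond what the report describes.

```python
from urllib.parse import urlencode

# Endpoint taken from the report above.
BASE = "https://api.pushshift.io/reddit/search/submission/"


def build_url(ids, fields=None):
    """Hypothetical helper: join values with literal commas in the query string."""
    params = {"ids": ",".join(ids)}
    if fields:
        params["fields"] = ",".join(fields)
    # safe="," tells urlencode not to percent-encode commas as %2C.
    return BASE + "?" + urlencode(params, safe=",")


url = build_url(["bfghow", "bfghno"], ["title", "id", "score"])
print(url)

# For comparison, the default behavior percent-encodes the comma.
print(urlencode({"ids": "bfghow,bfghno"}))
```

The first URL printed has the same form as the second working request in the report; the second line shows the default encoding that produces the failing `%2C` form.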

Adds a `session_params` parameter to `BotoAwsRequestsAuth` that gets unpacked into the botocore session object. This allows things like passing in regions, profiles, etc. that are outside the default way botocore picks...

For people using multiple AWS profiles, being able to specify them programmatically makes life much easier.

SparseAttention relies on Triton for specific kernels. GPT-NeoX currently depends on `triton==0.4.2`, which is behind the DeepSpeed version, `1.0.0`. It is far behind the version of Triton...

bug

Modifies the [BigScience checkpoint conversion scripts](https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/main/tools/convert_checkpoint) to convert from NeoX DeeperSpeed to either the universal checkpoint format with `tools/ds_to_universal.py` or to a different DeepSpeed tensor/pipeline parallel format with `tools/deepspeed_to_deepspeed.py`. When...

Hi! I'm working on problems with data defined on the n-sphere (n >= 128 or so). From looking around the documentation it seems like this library doesn't support that--do I...

Hey @crowsonkb, this is a quick PR that adds two files so that people can use [cog](https://github.com/replicate/cog) to easily try out the CFG diffusion model. It also can then...

# Description
`convert_nanogpt_weights` had two issues:
1. It lacked the attention mask and the `IGNORE` tensor.
2. It did not correctly handle the case where the nanogpt model was...