Motoki Wu

9 comments by Motoki Wu

This is probably the error described in the issue below, so I think updating spaCy should fix it: https://github.com/spacy-io/spaCy/issues/375

Not sure, but the code probably won't work on newer TensorFlow. It should work on 0.5.
> On May 6, 2016, at 12:18 AM, aewhatley wrote:
> > I...

Cool, thanks. There have been a few changes since 0.5, but I don't have time to debug now. MT is tricky since it uses a softmax on a large vocab. The Shakespeare...

Relevant issue in TensorFlow: https://github.com/tensorflow/tensorflow/issues/550

Hi @satpalsr , I've updated to DeepSpeed 0.8.2, but I'm getting the same results: ``` Setting `pad_token_id` to `eos_token_id`:0 for open-end generation. [2023-03-17 06:03:09,070] [INFO] [logging.py:77:log_dist] [Rank -1] DeepSpeed info:...

Looks like version 0.9.4 works :) Closing. Guessing llama support fixed the gpt-neox type models: https://github.com/microsoft/DeepSpeed/pull/3425

Hi! It would be great if beam search worked with DeepSpeed. I'm guessing it's probably the most common decoding algorithm used in production. Are other generation strategies supported too? 1....

@mallorbc I've done some benchmarks using `gpt2` with `fp16` precision on my own data (of course, YMMV). System info:
* CUDA version 11.7
* A10G instance 24G
* DeepSpeed 0.7.7...

> @tokestermw Thanks so much for sharing your insights! I assume to get these results you did something like a string compare for results generated with and without DeepSpeed? @mallorbc...
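
For what it's worth, the kind of string compare asked about above could be sketched roughly like this. It is only an illustration: the helper name `exact_match_rate` and the sample strings are made up here and are not taken from the thread or from DeepSpeed, and it assumes the generated texts from both runs have already been collected into two lists in the same prompt order.

```python
def exact_match_rate(baseline_outputs, deepspeed_outputs):
    """Return the fraction of prompts whose generated text is identical in both runs."""
    if len(baseline_outputs) != len(deepspeed_outputs):
        raise ValueError("both runs must cover the same prompts")
    if not baseline_outputs:
        return 0.0
    # Compare pairwise, ignoring leading/trailing whitespace differences.
    matches = sum(
        a.strip() == b.strip()
        for a, b in zip(baseline_outputs, deepspeed_outputs)
    )
    return matches / len(baseline_outputs)


# Hypothetical generations, for illustration only.
baseline = ["hello world", "foo bar", "the cat sat"]
with_deepspeed = ["hello world", "foo baz", "the cat sat"]

rate = exact_match_rate(baseline, with_deepspeed)
print(f"exact match: {rate:.2%}")
```

An exact match is a strict check: with `fp16` and sampling-based decoding, small numerical differences can change the output, so greedy decoding with a fixed seed is probably the fairest setup for this kind of comparison.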