32 comments of Sam Havens

@casperbh96 @abhinavkulkarni We are working on a PR that adds support for `output_attentions` when using `torch` attention (#210). For supporting `device_map="auto"`, I believe the only change we need is to...
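
For context, here is a minimal sketch of the usage being discussed, assuming an HF-style MPT checkpoint (the `mosaicml/mpt-7b` name is illustrative). `device_map="auto"` requires `accelerate`, and `output_attentions=True` would only work once the PR above lands:

```python
# Sketch only: assumes transformers + accelerate are installed, and that the
# checkpoint's custom code supports output_attentions (per the PR above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-7b"  # illustrative checkpoint name
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,
    device_map="auto",       # shards layers across available GPUs via accelerate
    trust_remote_code=True,  # MPT uses custom modeling code
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model(**inputs, output_attentions=True)  # torch attention path, per the PR
print(len(out.attentions), out.attentions[0].shape)  # one tensor per layer
```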

Just chiming in that I am already using this branch for testing the chat model!

Tests are failing with

```
___________________ ERROR collecting tests/test_training.py ____________________
ImportError while importing test module '/llm-foundry/tests/test_training.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
llmfoundry/__init__.py:8: in <module>
from...
```

I believe that AWS instance has 4x T4s ~= 64GB of VRAM. You want at least twice that. Also, this stack is mostly tested on A100s and there have been reports...
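
As a rough back-of-the-envelope check (my numbers, not from the thread): a T4 has 16GB, so four of them give 64GB, while full finetuning a 7B model with Adam in mixed precision wants on the order of 16 bytes per parameter:

```python
# Rough rule of thumb, not an exact measurement: mixed-precision training with
# Adam keeps fp16 weights + grads (2 + 2 bytes/param) plus fp32 master weights
# and two optimizer moments (4 + 4 + 4 bytes/param), ignoring activations.
n_params = 7e9
bytes_per_param = 2 + 2 + 4 + 4 + 4  # = 16
print(f"{n_params * bytes_per_param / 1e9:.0f} GB")  # ~112 GB, well above 64 GB
```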

Do you get an OOM with the A10s as well, or a different error?

To echo what @pcavanaugh said: to get WOW working with Browserify, I had to download the AMD branch and change a few `this`s to `window`.

I think having this option is good; some users almost certainly want it. However, I think this should be optional, as I am not convinced it shouldn't learn to predict...

@vchiley for models which have both EOS and BOS, are you saying we shouldn't learn that BOS comes after EOS? It isn't worth learning, true, but also... we'll always stop generating...

As discussed on Slack, I think that:

* EOS is effectively a BOS token, and so we want P(t|EOS) to be different than P(t), so we don't want to mask...
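
To make the masking question concrete, here is a hypothetical sketch (the helper name and shapes are mine, not from the codebase) of what masking P(t|EOS) would look like, using the usual `-100` ignore index for the cross-entropy loss:

```python
import torch

IGNORE_INDEX = -100  # labels with this value are excluded from the loss

def mask_after_eos(input_ids: torch.Tensor, labels: torch.Tensor, eos_id: int) -> torch.Tensor:
    """Hypothetical helper: mask the label of each token that immediately
    follows EOS, so the model is NOT trained on P(t | EOS).
    Assumes HF-style labels aligned with input_ids (shift happens in the model).
    """
    labels = labels.clone()
    follows_eos = torch.zeros_like(input_ids, dtype=torch.bool)
    follows_eos[:, 1:] = input_ids[:, :-1] == eos_id  # previous token is EOS
    labels[follows_eos] = IGNORE_INDEX
    return labels
```

The position above argues against calling such a helper: leaving those labels unmasked keeps the P(t|EOS) term in the loss, so the model can learn it differs from P(t).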