Stas Bekman

Results: 664 comments by Stas Bekman

> @stas00 one limitation in my mind of logging for such library code is that there are too many people that might want to read it (and so it is...

A meta question: why are we discussing a design outside of GitHub? Won't you want the decision-making process to be visible to all who subscribed to this PR?...

That sounds good, @mlazos - thank you for explaining why you felt it'd be more productive to use a gdoc. Works for me.

I only have one last question in the doc. The rest looks good.

Also, please include a `warning_once` logger method so that it's available to developers. If you want to copy the one I created recently, you can copy it from here: https://github.com/huggingface/transformers/blob/101a6cd276d454c6ab07aff3c54e598ff83d537c/src/transformers/utils/logging.py#L286-L298 e.g. at...
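For reference, the linked implementation is essentially an argument-keyed cache around `logger.warning`, so a given message fires only once. A condensed sketch (the link above is the authoritative version; it assumes warning messages are unique across the codebase):

```python
import functools
import logging

@functools.lru_cache(None)
def warning_once(self, *args, **kwargs):
    # Same as logger.warning(), but the lru_cache keys on the arguments,
    # so repeated calls with an identical message are silently skipped.
    self.warning(*args, **kwargs)

# Attach as a method so any logger gains logger.warning_once(...)
logging.Logger.warning_once = warning_once

logger = logging.getLogger(__name__)
logger.warning_once("shown once")
logger.warning_once("shown once")  # suppressed by the cache
```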

@adammoody, first: the Megatron part of this repo is outdated. The new repo is at https://github.com/microsoft/Megatron-DeepSpeed/. Perhaps the DeepSpeed team could update this current repo to flag that this is...

Ideally, all new development should go into https://github.com/microsoft/Megatron-DeepSpeed/ and not DSE, as DSE is very outdated. But I'm not a maintainer of these, so it's up to the maintainers to...

Specifically, to merge @adammoody's work into https://github.com/microsoft/Megatron-DeepSpeed/ do:

```
git clone https://github.com/microsoft/Megatron-DeepSpeed/
cd Megatron-DeepSpeed
git remote add other https://github.com/bigscience-workshop/Megatron-DeepSpeed
git fetch other
git cherry-pick 5069622
git commit  # only needed if the cherry-pick stops on conflicts
git push
```
...

Oh, I missed the fact that you added CL to `microsoft/Megatron-DeepSpeed` - awesome! Additionally, you would want to sync https://github.com/microsoft/Megatron-DeepSpeed with the upstream https://github.com/NVIDIA/Megatron-LM, since it's quite out of sync...
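A minimal sketch of one way to do that sync, assuming a merge-based workflow and that the upstream default branch is `main` (the branch name is an assumption; adjust to the repos' actual defaults):

```
git clone https://github.com/microsoft/Megatron-DeepSpeed
cd Megatron-DeepSpeed
git remote add upstream https://github.com/NVIDIA/Megatron-LM
git fetch upstream
git merge upstream/main   # resolve any conflicts, then:
git push
```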

Yes, it's the same object; it's like a cache:
creation: https://github.com/huggingface/transformers/blob/b6865b9befad33f99adee0a6ef6361f72fcc8b42/src/transformers/activations.py#L206-L233
use: https://github.com/huggingface/transformers/blob/b6865b9befad33f99adee0a6ef6361f72fcc8b42/src/transformers/models/opt/modeling_opt.py#L288

The paradigm is shifting. Clearly there was no need to create a new object before because deepspeed...
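To illustrate the "same shared object" pattern with a simplified registry (a hypothetical sketch, not the actual transformers code): instances are created once, and every lookup by name returns that one cached instance.

```python
import torch.nn as nn

# Simplified activation registry: each instance is created once at import
# time, so every module that looks up "gelu" gets the very same object.
ACT2FN = {"gelu": nn.GELU(), "relu": nn.ReLU()}

act1 = ACT2FN["gelu"]
act2 = ACT2FN["gelu"]
assert act1 is act2  # same object - a cache, not a fresh construction
```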