Constantin Dumitrascu

Results: 50 comments by Constantin Dumitrascu

Closing this since a fix has been merged. Please reopen as necessary.

@joellliu - OLMo's AdamW optimizer is simply a code-organization decision: it groups PyTorch's AdamW optimizer, gradient clipping, and metrics collection into a single Python module. Note that the...
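The grouping described above can be sketched in plain Python. This is only an illustration of the idea (bundling an optimizer step with clip-by-global-norm and metrics collection); the names `ClippingOptimizer` and `clip_by_global_norm` are hypothetical, not OLMo's actual API, and a plain SGD update stands in for AdamW:

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale a flat list of gradient values so their global L2 norm
    does not exceed max_norm; return (clipped_grads, original_norm)."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        grads = [g * scale for g in grads]
    return grads, norm

class ClippingOptimizer:
    """Hypothetical wrapper bundling clipping + metrics with the step."""

    def __init__(self, params, lr=1e-3, max_grad_norm=1.0):
        self.params = params
        self.lr = lr
        self.max_grad_norm = max_grad_norm
        self.metrics = {}

    def step(self, grads):
        grads, norm = clip_by_global_norm(grads, self.max_grad_norm)
        self.metrics["grad_norm"] = norm  # metric collected during the step
        # Plain SGD update stands in for the real AdamW update here.
        self.params = [p - self.lr * g for p, g in zip(self.params, grads)]
        return self.params
```

The point is purely organizational: callers see one `step()` that clips, records metrics, and updates, rather than three separate call sites.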

@joellliu - we mainly rely on FSDP for this. We have done some work with the profiler to avoid host-device syncs and other stalls, making sure the GPUs stay busy...

@joellliu - the goal is to keep the GPU busy at all times (or as much as possible). I'm quoting @dirkgr 's description below: _Basically, the way training works...

@juripapay - is there a traceback logged after the last line you pasted? I would expect it to log the traceback info, based on [this](https://github.com/allenai/OLMo/blob/main/olmo/util.py#L158).
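For context, the generic pattern for logging a traceback on an uncaught exception looks like the sketch below. This is a minimal stand-in, not the linked OLMo util code, and the `excepthook` installation shown is an assumption about the general technique:

```python
import logging
import sys
import traceback

log = logging.getLogger(__name__)

def excepthook(exc_type, value, tb):
    """Log the full traceback of an uncaught exception before the
    interpreter exits, so crashes leave a usable record in the log."""
    log.critical(
        "Uncaught %s: %s\n%s",
        exc_type.__name__,
        value,
        "".join(traceback.format_exception(exc_type, value, tb)),
    )

# Replace the default handler so the traceback reaches the log file.
sys.excepthook = excepthook
```

With a hook like this installed, you would expect the traceback to appear in the log immediately after the error line, which is why its absence in the pasted output is surprising.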

@Jimmy-Yang1217 - could you please include the log before the error occurs? I'm curious when exactly the error is thrown. Thank you!

@bpwl0121 - thank you for the question. The two models (OLMo-7B and OLMo-7B-Twin-2T) are identical, except for differences in hardware and initialization. We showed that hardware isn't the cause of...

@bpwl0121 - you are correct that both models use the "mitchell" initialization method. The difference in initialization I was referring to is the difference in the values that the model...
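The distinction above (same method, different sampled values) can be illustrated with a toy example. The scaled-normal draw here is only a stand-in, not the actual "mitchell" formula, and `init_weights` is a hypothetical name:

```python
import random

def init_weights(d_model, n, seed):
    """Draw n weights from N(0, 1/sqrt(d_model)) -- a scaled-normal
    stand-in for a model init method, seeded for reproducibility."""
    rng = random.Random(seed)
    std = d_model ** -0.5
    return [rng.gauss(0.0, std) for _ in range(n)]

# Same method and hyperparameters, different seeds:
a = init_weights(d_model=4096, n=3, seed=0)
b = init_weights(d_model=4096, n=3, seed=1)
# a != b: the initialization *method* is identical; the *values* differ.
```

Two runs with the same formula but different random draws start from different points in weight space, which is enough to produce diverging (though comparable) training trajectories.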

@2015aroras - do you think that for 1.7 we should have two configs? My understanding is that training was stopped, the config was changed, and then training resumed.

Hi @codefly13 - all of it is already available in the dolma toolkit (i.e. this repo). Please let me know if you're looking for something different.