LMOps
General technology for enabling AI capabilities w/ LLMs and MLLMs
I got this error while fine-tuning: File "/mnt/oss-data/xxx/minillm/transformers/src/transformers/generation/utils.py", line 3000, in sample next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1) RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 next_tokens =...
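For context, the failing line is the standard sampling step in the bundled transformers code; this error usually means the logits contained `inf`/`nan` (often from fp16 overflow) before softmax. Below is a minimal sketch of a guard around that step. It is not the MiniLLM or transformers code, and `safe_sample` / `next_token_logits` are illustrative names.

```python
import torch

# Sketch only: guard the sampling step against NaN/Inf probabilities,
# which typically come from fp16 overflow or an unstable loss during fine-tuning.
def safe_sample(logits: torch.Tensor) -> torch.Tensor:
    # Work in float32 so normalization itself cannot overflow.
    logits = logits.float()
    # Replace non-finite logits so softmax cannot produce NaN/Inf probabilities.
    logits = torch.nan_to_num(logits, nan=-1e4, posinf=1e4, neginf=-1e4)
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(1)

# Usage (illustrative): next_tokens = safe_sample(next_token_logits)
```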
Hi, I'm using ZeRO with optimizer and parameter offload to run minillm on 2 H100 GPUs on a single node. After the generation evaluation, I get a timeout during...
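If the timeout is raised because other ranks sit at a collective while one rank is still finishing the generation evaluation, one common mitigation (a sketch, not necessarily the MiniLLM fix) is to raise the distributed timeout before engine initialization; the value below is illustrative.

```python
from datetime import timedelta
import deepspeed

# Sketch only: lengthen the NCCL collective timeout so ranks waiting on a
# long generation evaluation do not abort with a watchdog timeout.
deepspeed.init_distributed(dist_backend="nccl", timeout=timedelta(hours=2))
```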
The processed data size is 55G. Are you sure about that size? Could you provide the processed SFT data link and the pre-training data link separately? Thanks for open-sourcing this. 🙏🏻
Module: [EthosBinaryTask](https://github.com/microsoft/LMOps/blob/main/prompt_optimization/tasks.py). Two questions: 1. df = df[(df[1] = 0.7)]: why is this condition used to filter the data? It does not appear in the paper. 2. exs = [{'id': x['index'], 'text':...
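If the intent of that filter is to keep only unambiguous examples so the binary hate/non-hate labels are reliable, the pandas pattern would look roughly like the sketch below. The file path, column index, and thresholds are assumptions for illustration, not the values from tasks.py or the paper.

```python
import pandas as pd

# Sketch only: drop rows with ambiguous annotation scores and keep examples
# that are clearly non-hate or clearly hate, so binary labels are trustworthy.
df = pd.read_csv("ethos_binary.csv", sep=";", header=None)  # hypothetical path
df = df[(df[1] <= 0.3) | (df[1] >= 0.7)]  # thresholds are illustrative
```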
When the following code is executed, an error occurs (the attached screenshot failed to upload): model, optimizer, _, lr_scheduler = deepspeed.initialize( model=model, optimizer=optimizer, args=args, lr_scheduler=lr_scheduler, mpu=None, config_params=ds_config )
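Since the traceback is missing, one way to isolate whether the problem is in the config or in the arguments is to start from a minimal working call. Below is a sketch using the public DeepSpeed API; the placeholder model and config values are illustrative, not MiniLLM's, and it must be launched with the `deepspeed` launcher so the distributed environment is set up.

```python
import deepspeed
import torch

# Placeholder model/optimizer, illustrative config values.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": False},
}

model_engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    lr_scheduler=None,
    mpu=None,
    config=ds_config,  # `config_params` is an older alias for this argument
)
```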
Hello, I would like to ask: can I apply MiniLLM to the Phoenix model? What code do I need to modify, and how? Thanks for your help.
``` causal_mask = self.bias[:, :, key_length - query_length : key_length, :key_length] ``` but in Structured Prompting, key_length exceeds max_positions. How can I address this issue? Thank you.
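One common workaround (a sketch, not the Structured Prompting or transformers source) is to build the causal mask dynamically instead of slicing the fixed-size registered `bias` buffer, so it can cover any key_length; the function name below is illustrative.

```python
import torch

# Sketch only: construct the causal mask on the fly when key_length exceeds
# the pre-registered max_positions buffer.
def causal_mask(query_length: int, key_length: int, device=None) -> torch.Tensor:
    # Query row i may attend to key positions 0 .. (key_length - query_length + i).
    full = torch.tril(torch.ones(key_length, key_length, dtype=torch.bool, device=device))
    return full[key_length - query_length : key_length, :key_length].view(
        1, 1, query_length, key_length
    )

# e.g. causal_mask(4, 12) has shape (1, 1, 4, 12)
```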
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.0.1 to 3.0.3. Release notes Sourced from werkzeug's releases. 3.0.3 This is the Werkzeug 3.0.3 security release, which fixes security issues and bugs but does not otherwise...
Bumps [jinja2](https://github.com/pallets/jinja) from 2.11.3 to 3.1.4. Release notes Sourced from jinja2's releases. 3.1.4 This is the Jinja 3.1.4 security release, which fixes security issues and bugs but does not otherwise...
Is it LLaMA 1 or LLaMA 2? Thanks.