BugReporterZ

16 comments by BugReporterZ

I stumbled upon this problem recently on OpenSUSE Tumbleweed and it was very annoying. However, there are two simple workarounds: a temporary one and a permanent one (i.e. a possible...

I think I am observing the same issue. Compared to command-line fio, KDiskMark results barely change when varying the number of threads in the benchmark. Using OpenSUSE Tumbleweed (rolling...
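For reference, this is roughly how one could sweep the thread count with command-line fio to compare against KDiskMark; a minimal sketch, where the test file path, size, and workload are assumptions to adjust for your setup:

```python
# Hypothetical sketch: compare fio random-read throughput across thread counts.
import json
import subprocess

for jobs in (1, 4, 8):
    result = subprocess.run(
        ["fio", "--name=bench", "--filename=/tmp/fio.test", "--size=256M",
         "--rw=randread", "--bs=4k", "--ioengine=libaio", "--direct=1",
         f"--numjobs={jobs}", "--group_reporting", "--output-format=json"],
        capture_output=True, text=True, check=True,
    )
    data = json.loads(result.stdout)
    bw = data["jobs"][0]["read"]["bw"]  # aggregate read bandwidth in KiB/s
    print(f"{jobs} jobs: {bw} KiB/s")
```

If the drive and scheduler are behaving normally, the reported bandwidth should change noticeably as `--numjobs` increases, which is what makes the flat KDiskMark results suspicious.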

I can also see it being considerably slower in non-chat mode than in chat mode, and it is likely due to this initial delay strongly penalizing short replies, also mentioned by other users....

I also had the 0-tokens problem; after reconverting my weights (from the original torrent) with the latest `convert_llama_weights_to_hf.py`, it now seems to work correctly.
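For anyone hitting the same thing, here is a minimal sketch of the reconversion plus a quick generation check, assuming a local transformers install and hypothetical paths (verify the script's flags against `--help` for your version):

```python
# Reconvert first (hypothetical paths; check the script's --help for your version):
#   python convert_llama_weights_to_hf.py \
#       --input_dir ./LLaMA --model_size 7B --output_dir ./llama-7b-hf
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("./llama-7b-hf")
model = AutoModelForCausalLM.from_pretrained("./llama-7b-hf")

# If the conversion worked, generate() should return more than 0 new tokens.
inputs = tok("Hello, my name is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```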

This might be due to the "group by length" option; try disabling it.

```
--group_by_length [GROUP_BY_LENGTH]
    Group sequences into batches with same length. Saves memory and speeds up training considerably....
```
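If it helps, a minimal sketch of toggling that option through transformers' `TrainingArguments` (the output directory is a hypothetical placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",        # hypothetical placeholder
    group_by_length=False,   # disable length-grouped batching to test the effect
)
```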

It appears to group training examples into length-ordered chunks, and the longer training examples at the start of these chunks show a higher loss. I also recall reading elsewhere...
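As a rough illustration of why the loss curve would show that sawtooth pattern, here is a toy sketch; it is not the actual transformers sampler, and the example lengths and batch size are made up:

```python
# Toy sketch of length-grouped batching (not the real transformers sampler).
lengths = [5, 50, 12, 3, 40, 8, 30, 2]  # token counts per example
order = sorted(range(len(lengths)), key=lambda i: -lengths[i])  # longest first
batches = [order[i:i + 2] for i in range(0, len(order), 2)]
for batch in batches:
    print([lengths[i] for i in batch])
# The longest examples land in the earliest batches of each chunk,
# so the reported loss is higher at the start of every chunk.
```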

I haven't investigated that in detail. I have always left that enabled because the eval loss curve didn't seem to be affected. You could refer to the Transformers documentation for...

Worth pointing out that the cat is out of the bag already: https://github.com/facebookresearch/llama/pull/73

Having an official source for the weights would make it safer to download these files; a proper...

If you configure the learning rate to be on the order of 1000 times smaller than with the 32-bit version, in the long term the training loss appears to behave...
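A minimal sketch of what that scaling might look like with transformers' `TrainingArguments`; both the baseline learning rate and the exact factor here are assumptions for illustration, not values from the original report:

```python
from transformers import TrainingArguments

lr_32bit = 2e-4                     # hypothetical baseline used for the 32-bit run
args = TrainingArguments(
    output_dir="out",               # hypothetical placeholder
    learning_rate=lr_32bit / 1000,  # ~1000x smaller, per the observation above
)
```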