BugReporterZ

16 comments by BugReporterZ

I stumbled upon this problem recently on OpenSUSE Tumbleweed and it was very annoying. However, there are two simple workarounds: a temporary one and a permanent one (i.e. a possible...

I think I am observing the same issue. Compared to command-line fio, KDiskMark results barely change when varying the number of threads in the benchmark. Using OpenSUSE Tumbleweed (rolling...
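For reference, this is roughly how one could sweep the thread count with command-line fio to compare against KDiskMark; a minimal sketch, where the test file path, size, and workload are assumptions to adjust for your setup:

```python
# Hypothetical sketch: compare fio random-read throughput across thread counts.
import json
import subprocess

for jobs in (1, 4, 8):
    result = subprocess.run(
        ["fio", "--name=bench", "--filename=/tmp/fio.test", "--size=256M",
         "--rw=randread", "--bs=4k", "--ioengine=libaio", "--direct=1",
         f"--numjobs={jobs}", "--group_reporting", "--output-format=json"],
        capture_output=True, text=True, check=True,
    )
    data = json.loads(result.stdout)
    bw = data["jobs"][0]["read"]["bw"]  # aggregate read bandwidth in KiB/s
    print(f"{jobs} jobs: {bw} KiB/s")
```

If the drive and scheduler are behaving normally, the reported bandwidth should change noticeably as `--numjobs` increases, which is what makes the flat KDiskMark results suspicious.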

I can also see it being considerably slower in non-chat mode than in chat mode, and it is likely due to this initial delay strongly penalizing short replies, also mentioned by other users....

I also had the 0-tokens problem; after reconverting my weights (from the original torrent) with the latest `convert_llama_weights_to_hf.py`, it now seems to work correctly.
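For anyone hitting the same thing, here is a minimal sketch of the reconversion plus a quick generation check, assuming a local transformers install and hypothetical paths (verify the script's flags against `--help` for your version):

```python
# Reconvert first (hypothetical paths; check the script's --help for your version):
#   python convert_llama_weights_to_hf.py \
#       --input_dir ./LLaMA --model_size 7B --output_dir ./llama-7b-hf
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("./llama-7b-hf")
model = AutoModelForCausalLM.from_pretrained("./llama-7b-hf")

# If the conversion worked, generate() should return more than 0 new tokens.
inputs = tok("Hello, my name is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```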

This might be due to the "group by length" option; try disabling it.

```
--group_by_length [GROUP_BY_LENGTH]
    Group sequences into batches with same length. Saves memory and speeds up training considerably....
```
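If it helps, a minimal sketch of toggling that option through transformers' `TrainingArguments` (the output directory is a hypothetical placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",        # hypothetical placeholder
    group_by_length=False,   # disable length-grouped batching to test the effect
)
```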

It appears to group training examples into length-ordered chunks, and the longer training examples at the start of these chunks show a higher loss. I also recall reading elsewhere...
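As a rough illustration of why the loss curve would show that sawtooth pattern, here is a toy sketch; it is not the actual transformers sampler, and the example lengths and batch size are made up:

```python
# Toy sketch of length-grouped batching (not the real transformers sampler).
lengths = [5, 50, 12, 3, 40, 8, 30, 2]  # token counts per example
order = sorted(range(len(lengths)), key=lambda i: -lengths[i])  # longest first
batches = [order[i:i + 2] for i in range(0, len(order), 2)]
for batch in batches:
    print([lengths[i] for i in batch])
# The longest examples land in the earliest batches of each chunk,
# so the reported loss is higher at the start of every chunk.
```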

I haven't investigated that in detail. I have always left that enabled because the eval loss curve didn't seem to be affected. You could refer to the Transformers documentation for...

Worth pointing out that the cat is out of the bag already: https://github.com/facebookresearch/llama/pull/73

Having an official source for the weights would make it safer to download these files; a proper...

If you configure the learning rate to be on the order of 1000 times smaller than with the 32-bit version, in the long term the training loss appears to behave...
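A minimal sketch of what that scaling might look like with transformers' `TrainingArguments`; both the baseline learning rate and the exact factor here are assumptions for illustration, not values from the original report:

```python
from transformers import TrainingArguments

lr_32bit = 2e-4                     # hypothetical baseline used for the 32-bit run
args = TrainingArguments(
    output_dir="out",               # hypothetical placeholder
    learning_rate=lr_32bit / 1000,  # ~1000x smaller, per the observation above
)
```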