Seems like normal behavior to me. I have not tried the model you specified, but I have tried others of similar size, such as LLaMA-7B. If the temperature is not set very low, the...
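For anyone unsure what the temperature setting does, here is a minimal generic sketch of temperature-scaled sampling. It is not tied to any particular model or library; the function names `softmax_with_temperature` and `sample_token` are made up for illustration, and the logits are toy values.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Sample one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens
    for t in (0.1, 0.7, 1.5):
        print(t, [round(p, 3) for p in softmax_with_temperature(toy_logits, t)])
```

At a very low temperature almost all probability mass lands on the top-scoring token, so output is close to deterministic. At higher values the less likely tokens get sampled more often, which makes generations more varied.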
Update: New LLaMA-based Pygmalion models have been released. They should perform much better, since the LLaMA foundation models are trained on over 1T tokens and have state-of-the-art language understanding....
I am also interested. The extremely long context opens up new possibilities. I think this would be a really attractive feature to have.
Maybe it will be useful to combine the screen k-NN exploration reward with the coordinate-based exploration reward, so the latter doesn't trigger on menu navigation and therefore automatically disincentivizes abuse...
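One possible reading of this idea, sketched below under loud assumptions: the screen novelty bonus is only paid when the player's coordinates are also new, so opening menus (new screen, same coordinates) earns nothing. The class name `CombinedExplorationReward`, the AND-style gating, the plain Euclidean k-NN distance, and the threshold values are all hypothetical choices for illustration, not the project's actual implementation.

```python
import math


class CombinedExplorationReward:
    """Hypothetical sketch: reward only steps that are novel in BOTH screen and coordinates."""

    def __init__(self, knn_k=3, novelty_threshold=1.0):
        self.seen_coords = set()   # coordinates already visited
        self.frame_memory = []     # embeddings of screens judged novel so far
        self.knn_k = knn_k
        self.novelty_threshold = novelty_threshold

    def _knn_distance(self, embedding):
        """Mean distance to the k nearest stored embeddings (infinite if memory is empty)."""
        if not self.frame_memory:
            return math.inf
        dists = sorted(math.dist(embedding, m) for m in self.frame_memory)
        nearest = dists[: self.knn_k]
        return sum(nearest) / len(nearest)

    def step(self, coord, embedding):
        """Return 1.0 only if both the coordinate and the screen are new, else 0.0."""
        coord_is_new = coord not in self.seen_coords
        screen_is_new = self._knn_distance(embedding) > self.novelty_threshold
        self.seen_coords.add(coord)
        if screen_is_new:
            self.frame_memory.append(tuple(embedding))
        return 1.0 if (coord_is_new and screen_is_new) else 0.0


if __name__ == "__main__":
    reward = CombinedExplorationReward(knn_k=1, novelty_threshold=1.0)
    print(reward.step((0, 0), (0.0, 0.0)))  # first step in a new place
    print(reward.step((0, 0), (5.0, 5.0)))  # menu opened: new screen, same coordinates
```

Other combinations are equally plausible, for example a weighted sum of the two bonuses, or multiplying the screen novelty score by a coordinate-novelty indicator instead of using a hard threshold.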