fastGPT
An example where the current fast_tanh() gives different results
Using the 1558M model and the following input:
python encode_input.py \
"Alan Turing theorized that computers would one day become very powerful, but even he could not imagine" \
-n 100
I get the following output with tanh() (equal to PyTorch):
Output tokens:
703 484 561 307 973 13 198 198 1 40 836 470 892 314 1053 1683 1775 257 3644 326 714 466 1997 326 257 1692 852 714 466 553 339 531 13 198 198 1537 783 11 5176 284 262 670 286 257 1074 286 4837 379 262 2059 286 3442 11 14727 11 9061 389 852 973 284 466 1243 326 547 1752 1807 5340 13 198 198 464 1074 468 4166 257 3644 326 460 711 262 983 286 1514 11 257 3716 4811 983 326 9018 3867 5207 1088 257 3096 13 198 198 464 3644
Decoded output as text:
how they would be used.
"I don't think I've ever seen a computer that could do anything that a human being could do," he said.
But now, thanks to the work of a team of researchers at the University of California, Berkeley, computers are being used to do things that were once thought impossible.
The team has developed a computer that can play the game of Go, a complex strategy game that involves moving pieces around a board.
The computer
But with fast_tanh() I get the following output:
Output tokens:
703 484 561 307 973 13 198 198 1 40 836 470 892 314 1053 1683 1775 257 3644 326 714 466 1997 326 257 1692 852 714 466 553 339 531 13 198 198 1537 783 11 5176 284 262 670 286 257 1074 286 4837 379 262 2059 286 3442 11 14727 11 9061 389 852 973 284 466 1243 326 547 1752 1807 5340 13 198 198 464 1074 468 4166 257 3644 326 460 711 262 983 286 1514 11 257 3716 4811 983 326 9018 3867 257 3704 1088 257 3096 284 8006 517 7674
Decoded output as text:
how they would be used.
"I don't think I've ever seen a computer that could do anything that a human being could do," he said.
But now, thanks to the work of a team of researchers at the University of California, Berkeley, computers are being used to do things that were once thought impossible.
The team has developed a computer that can play the game of Go, a complex strategy game that involves moving a piece around a board to capture more territory
The last 9 tokens are different.
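The divergence point can be checked mechanically. Below is a minimal sketch (the helper name `first_divergence` is hypothetical) applied to the last 11 output tokens of each run, copied from the outputs above. Both full 100-token sequences are identical up to that point, and because greedy decoding feeds each token back in, everything after the first differing token is conditioned on a different prefix.

```python
def first_divergence(a, b):
    """Return the first index where sequences a and b differ,
    or None if they are identical (same tokens and same length)."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One sequence is a prefix of the other; they diverge where the shorter ends.
        return min(len(a), len(b))
    return None

# Last 11 output tokens of each run, copied from the outputs above.
tanh_tail = [9018, 3867, 5207, 1088, 257, 3096, 13, 198, 198, 464, 3644]
fast_tail = [9018, 3867, 257, 3704, 1088, 257, 3096, 284, 8006, 517, 7674]

i = first_divergence(tanh_tail, fast_tail)
print(i, len(tanh_tail) - i)  # 2 9
```

Positionally, every one of the remaining 9 tokens differs, even though some token ids (1088, 257, 3096) reappear shifted by one position.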
So the exact numerical shape of the tanh function makes a difference. At the very least, from a reproducibility perspective, we have to maintain both versions. I don't know how to judge the quality: it may be the same, with only slightly different probabilities that give different results in "greedy" mode while remaining statistically equivalent.
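To illustrate how a small change in tanh can change a greedy run, here is a toy sketch. It assumes tanh enters the model through GPT-2's tanh-based GELU approximation. The approximation `fast_tanh_pade` is a hypothetical stand-in (a common rational approximation clamped to ±1), not the fast_tanh() from fastGPT, and the logits are made-up numbers.

```python
import math

def fast_tanh_pade(x: float) -> float:
    """Hypothetical cheap tanh: x(27 + x^2) / (27 + 9x^2), clamped to +/-1
    for |x| >= 3, where the rational form reaches exactly 1.
    This is an illustrative stand-in, NOT fastGPT's fast_tanh()."""
    if x >= 3.0:
        return 1.0
    if x <= -3.0:
        return -1.0
    x2 = x * x
    return x * (27.0 + x2) / (27.0 + 9.0 * x2)

def gelu(x: float, tanh=math.tanh) -> float:
    """Tanh-based GELU approximation (the form used in GPT-2)."""
    return 0.5 * x * (1.0 + tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def greedy(logits):
    """One greedy decoding step: the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Worst-case pointwise error of the stand-in approximation on [-5, 5].
grid = [i / 100 for i in range(-500, 501)]
max_err = max(abs(math.tanh(x) - fast_tanh_pade(x)) for x in grid)
print(f"max |tanh - approx| on [-5, 5]: {max_err:.4f}")
print(f"gelu(1.0): exact tanh {gelu(1.0):.5f}, approx tanh {gelu(1.0, fast_tanh_pade):.5f}")

# Made-up logits where the top two candidates are nearly tied.
logits = [3.10, 3.08, 1.00]
# The same logits after a small numerical perturbation (+/- 0.02).
perturbed = [l + d for l, d in zip(logits, (-0.02, 0.02, 0.0))]
print(greedy(logits), greedy(perturbed))  # 0 1
```

When the top two logits are closer than the accumulated numerical error, the argmax can flip. From that point on, greedy decoding follows a different but not necessarily worse continuation. Judging quality would therefore need a statistical comparison (for example, perplexity on a test set) rather than a single greedy sample.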