fastGPT An example where the current fast_tanh() gives different results

Open certik opened this issue 1 year ago • 0 comments

Using the 1558M model and the following input:

python encode_input.py \
        "Alan Turing theorized that computers would one day become very powerful, but even he could not imagine" \
        -n 100

I get the following output with tanh() (equal to PyTorch):

Output tokens:
   703   484   561   307   973    13   198   198     1    40   836   470   892   314  1053  1683  1775   257  3644   326   714   466  1997   326   257  1692   852   714   466   553   339   531    13   198   198  1537   783    11  5176   284   262   670   286   257  1074   286  4837   379   262  2059   286  3442    11 14727    11  9061   389   852   973   284   466  1243   326   547  1752  1807  5340    13   198   198   464  1074   468  4166   257  3644   326   460   711   262   983   286  1514    11   257  3716  4811   983   326  9018  3867  5207  1088   257  3096    13   198   198   464  3644
Decoded output as text:
 how they would be used.

"I don't think I've ever seen a computer that could do anything that a human being could do," he said.

But now, thanks to the work of a team of researchers at the University of California, Berkeley, computers are being used to do things that were once thought impossible.

The team has developed a computer that can play the game of Go, a complex strategy game that involves moving pieces around a board.

The computer

But with fast_tanh() I get the following output:

Output tokens:
   703   484   561   307   973    13   198   198     1    40   836   470   892   314  1053  1683  1775   257  3644   326   714   466  1997   326   257  1692   852   714   466   553   339   531    13   198   198  1537   783    11  5176   284   262   670   286   257  1074   286  4837   379   262  2059   286  3442    11 14727    11  9061   389   852   973   284   466  1243   326   547  1752  1807  5340    13   198   198   464  1074   468  4166   257  3644   326   460   711   262   983   286  1514    11   257  3716  4811   983   326  9018  3867   257  3704  1088   257  3096   284  8006   517  7674
Decoded output as text:
 how they would be used.

"I don't think I've ever seen a computer that could do anything that a human being could do," he said.

But now, thanks to the work of a team of researchers at the University of California, Berkeley, computers are being used to do things that were once thought impossible.

The team has developed a computer that can play the game of Go, a complex strategy game that involves moving a piece around a board to capture more territory

The two outputs are identical up to the last 9 tokens, which are different.
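A quick way to check where the two runs diverge is to compare the token lists directly. The sketch below is only illustrative: `first_divergence` is a hypothetical helper (not part of fastGPT), and the two lists hold just the last 12 tokens of each output, copied from above.

```python
def first_divergence(a, b):
    """Return the first index where the two sequences differ, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No mismatch in the common part: differ only if the lengths differ
    return None if len(a) == len(b) else min(len(a), len(b))

# Last 12 tokens of each output, copied from the two listings above
tail_tanh = [326, 9018, 3867, 5207, 1088, 257, 3096, 13, 198, 198, 464, 3644]
tail_fast = [326, 9018, 3867, 257, 3704, 1088, 257, 3096, 284, 8006, 517, 7674]

idx = first_divergence(tail_tanh, tail_fast)
n_after = len(tail_tanh) - idx  # number of tokens from the divergence point on
print(f"first mismatch at tail index {idx}; {n_after} tokens from there to the end")
```

Once a single greedy choice differs, every later token is conditioned on a different prefix, so the rest of the sequence is expected to differ as well.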

So the exact numerical shape of the tanh function makes a difference. At the very least, from a reproducibility perspective, we have to maintain both versions. I don't know how to judge the quality. It may be that the quality is the same, and the slightly different probabilities simply give different results in "greedy" mode while remaining statistically equivalent.
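To illustrate how a small change in tanh can flip a greedy choice, here is a toy sketch. It makes several assumptions: `approx_tanh` is a hypothetical rational approximation chosen for illustration (it is not fastGPT's `fast_tanh()`), the GELU is the tanh-based form used by GPT-2, and the competing logit is deliberately placed between the two activation values to force a near-tie.

```python
import math

def approx_tanh(x):
    # Hypothetical rational approximation, for illustration only
    # (NOT the fast_tanh() from fastGPT). Clamped so it stays in [-1, 1].
    x = max(-3.0, min(3.0, x))
    return x * (27.0 + x * x) / (27.0 + 9.0 * x * x)

def gelu(x, tanh_fn):
    # tanh-based GELU approximation, as used by GPT-2
    return 0.5 * x * (1.0 + tanh_fn(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

a_exact = gelu(1.0, math.tanh)
a_fast = gelu(1.0, approx_tanh)

# Toy "logits": the competitor sits exactly between the two activation values,
# so the tiny tanh discrepancy alone decides which entry wins.
competitor = 0.5 * (a_exact + a_fast)
greedy_exact = argmax([a_exact, competitor])
greedy_fast = argmax([a_fast, competitor])

print(f"activation difference: {abs(a_exact - a_fast):.4f}")
print(f"greedy pick with tanh: {greedy_exact}, with approximation: {greedy_fast}")
```

In a real model the perturbation passes through many layers before reaching the logits, so this only shows the mechanism: when two candidate tokens are nearly tied, any small numerical difference can change the argmax. It does not say anything about which output is of higher quality.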

Mar 15 '23 18:03 certik
