
Tests not passed

Open · Jerry-Master opened this issue 1 year ago · 12 comments

I followed your instructions and one test did not pass. It says the following:

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/array50tb/projects/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 82.530s

FAILED (failures=1)

What could be causing it?

Jerry-Master avatar Jan 18 '24 16:01 Jerry-Master

I also encountered the same problem.

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bytedance/repo/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.162605, -1.1078743] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.162605
-1.1860729455947876

- [-1.162605, -1.1078743]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 37.484s

FAILED (failures=1)

jiakai0419 avatar Jan 18 '24 16:01 jiakai0419

Same error

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/export/data/username/imo/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 126.464s

FAILED (failures=1)

Faultiness avatar Jan 19 '24 09:01 Faultiness

It seems the meliad library is not numerically stable, giving different scores for different users. I will put a note in the README (https://github.com/google-deepmind/alphageometry/commit/a8a1dc70818c1253b6524d761510a6ec6df39c07). For now, the small difference in score does not seem to affect run.sh or the other tests in run_tests.sh, so I will let this test fail while we learn more about the meliad implementation and its outputs.

thtrieu avatar Jan 20 '24 01:01 thtrieu
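For reference, a quick sanity check (a hypothetical snippet, not part of the repository's test suite) shows that the scores reported in this thread all agree with the hard-coded reference values to within a few percent, which is consistent with benign floating-point variation across hardware and backends rather than a functional bug:

```python
import math

reference = [-1.1860729455947876, -1.1022869348526]
reported = [
    [-1.1633697, -1.122621],    # Jerry-Master, Faultiness, robotzheng
    [-1.162605, -1.1078743],    # jiakai0419
    [-1.1898218, -1.1082345],   # jackliugithub
    [-1.1563942, -1.1297226],   # yfcai1116
    [-1.1831452, -1.112445],    # aemartinez (Apple M1)
    [-1.1527003, -1.1230755],   # faraday (Colab)
]

for scores in reported:
    # Every score reported above is within 5% (relative) of the reference value.
    assert all(math.isclose(a, b, rel_tol=0.05) for a, b in zip(scores, reference))

print("all reported scores are within 5% of the reference values")
```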

same here:

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/Documents/alphageometry-main/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1898218, -1.1082345] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1898218
-1.1860729455947876

- [-1.1898218, -1.1082345]
+ [-1.1860729455947876, -1.1022869348526]

Ran 2 tests in 82.937s

jackliugithub avatar Jan 20 '24 17:01 jackliugithub

same here

Traceback (most recent call last):
  File "/home/user/python_code/alphageometry-main/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1563942, -1.1297226] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1563942
-1.1860729455947876

- [-1.1563942, -1.1297226]
+ [-1.1860729455947876, -1.1022869348526]

Ran 2 tests in 62.007s

FAILED (failures=1)

yfcai1116 avatar Jan 29 '24 09:01 yfcai1116

> It seems the meliad library is not numerically stable, giving different scores for different users. I will put a note in the README (a8a1dc7). For now, the small difference in score does not seem to affect run.sh or the other tests in run_tests.sh, so I will let this test fail while we learn more about the meliad implementation and its outputs.

@thtrieu I have encountered the same error here. Indeed, it does not affect the other tests in run_tests.sh or the orthocenter problem in run.sh.

However, I find it does not succeed on Olympiad geometry problems. For example, when solving 2019 P6, the program terminates early with "DD+AR failed to solve the problem." without generating any new LM output. (No reason or error to trace back... so weird.) (Full output log: https://drive.google.com/file/d/1btni6zroBbDLz6OBMpifTyj74bjL5fFy/view?usp=drive_link)

Could you please share the specific hardware you used to run all the Olympiad geometry problems successfully? (I use Ubuntu 20.04, Python 3.10.12, a 64-core vCPU, and 2× NVIDIA A10 (24 GB), but fail to reproduce the results.)

soxziw avatar Jan 29 '24 10:01 soxziw

======================================================================
FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/notebook/code/personal/80306170/AGI/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

Ran 2 tests in 82.584s

FAILED (failures=1)

My setup: Ubuntu 18.04, PyTorch 2.1 (CUDA 11.8), A100 80GB.

robotzheng avatar Feb 08 '24 06:02 robotzheng

Same here, the only test that does not pass when executing bash run_tests.sh is test_lm_score_may_fail_numerically_for_external_meliad.

My specific numbers:

AssertionError: Lists differ: [-1.1831452, -1.112445] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1831452
-1.1860729455947876

- [-1.1831452, -1.112445]
+ [-1.1860729455947876, -1.1022869348526]

My setup: Apple M1, macOS Ventura 13.6.1, Python 3.10.8, tensorflow 2.13.0

aemartinez avatar Feb 20 '24 19:02 aemartinez

> [...] Could you please share the specific hardware you used to run all the Olympiad geometry problems successfully?

Problems solved using Colab!

soxziw avatar Feb 20 '24 23:02 soxziw

@soxziw I'm running this in Google Colab and I got the exact same failure when running run_tests.sh

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/content/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1527003, -1.1230755] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1527003
-1.1860729455947876

- [-1.1527003, -1.1230755]
+ [-1.1860729455947876, -1.1022869348526]

What kind of instance or GPU did you get when your tests were passing?

faraday avatar Mar 06 '24 19:03 faraday

> @soxziw I'm running this in Google Colab and I got the exact same failure when running run_tests.sh. [...] What kind of instance or GPU did you get when your tests were passing?

The free TPU

soxziw avatar Mar 09 '24 02:03 soxziw

> > @soxziw I'm running this in Google Colab and I got the exact same failure when running run_tests.sh. [...] What kind of instance or GPU did you get when your tests were passing?
>
> The free TPU

Hello, have you changed to a different version of jax? I am unable to use the TPU with the dependency versions in requirements.txt, and I don't know whether meliad is the cause; it has also prevented me from reproducing the paper's results on GPU or CPU.

TriedTired99 avatar Apr 12 '24 17:04 TriedTired99
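In case it helps with the TPU question: a minimal way to check which accelerator JAX is actually picking up is shown below. This is only a diagnostic suggestion using standard public JAX APIs, not something from the alphageometry codebase.

```python
import jax

# Reports the backend JAX selected at start-up: "cpu", "gpu", or "tpu".
print("default backend:", jax.default_backend())

# Lists the devices JAX can see; on a correctly configured machine this
# should include TPU/GPU devices, not just the CPU.
print("devices:", jax.devices())
```

If this prints only CPU devices on a TPU or GPU machine, the installed jax/jaxlib build does not match the accelerator, which would be independent of meliad.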