feat: Physics verifier

Open Ebony59 opened this issue 9 months ago • 5 comments

Description

Implement a PhysicsVerifier that inherits from PythonVerifier and can handle expressions, unit comparison and unit conversion in both the solution and the reference answer.

Features:

  • Evaluate expressions if needed
  • Correctly parse and convert units
  • If the solution's unit doesn't match the ground-truth unit, convert one to the other before comparing (e.g. 1km vs 1000m); equivalent numeric forms should also compare equal (e.g. 200000 vs 2*10^5)
  • Adjust the relative tolerance when comparing numerical results if needed. The default tolerance is 0.01, but the ground truth is sometimes given at lower precision: when the ground truth is 1.3e+02, answers within 0.05e+02 of it (half a unit in the last significant digit) should be accepted
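
One way the features listed above could fit together is sketched below in plain Python. Everything in it is an assumption made for illustration: the `UNITS` table, the helper names `eval_number`, `half_last_digit`, `parse_quantity` and `answers_match`, and the rule of taking the larger of the relative tolerance and half a unit in the reference's last written digit. It is not the PhysicsVerifier implementation, and a real verifier would probably rely on a dedicated units library rather than a hand-written table.

```python
import ast
import operator
import re

# Hypothetical conversion table: unit -> (base unit, factor to base unit).
UNITS = {
    "": ("", 1.0),
    "m": ("m", 1.0), "km": ("m", 1e3), "cm": ("m", 1e-2),
    "s": ("s", 1.0), "ms": ("s", 1e-3),
    "kg": ("kg", 1.0), "g": ("kg", 1e-3),
}

_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

# Numeric part (ends in a digit, dot or closing paren) followed by a unit.
_SPLIT = re.compile(r"^\s*(.*?[0-9.)])\s*([A-Za-z]*)\s*$")
# Plain decimal or scientific literal; captures fraction digits and exponent.
_LITERAL = re.compile(r"^[-+]?\d+(?:\.(\d*))?(?:[eE]([-+]?\d+))?$")


def eval_number(expr: str) -> float:
    """Evaluate a purely arithmetic expression such as '2*10^5'."""
    tree = ast.parse(expr.strip().replace("^", "**"), mode="eval")

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(tree)


def half_last_digit(number_text: str) -> float:
    """Half a unit in the last written digit; 0.0 for non-literals."""
    match = _LITERAL.match(number_text.strip())
    if match is None:
        return 0.0
    decimals = len(match.group(1) or "")
    exponent = int(match.group(2) or 0)
    return 0.5 * 10.0 ** (exponent - decimals)


def parse_quantity(text: str) -> tuple[float, str, float]:
    """Return (value, base unit, precision slack), all in base units."""
    match = _SPLIT.match(text)
    if match is None:
        raise ValueError(f"cannot parse quantity: {text!r}")
    number_text, unit = match.groups()
    if unit not in UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    base, factor = UNITS[unit]
    value = eval_number(number_text) * factor
    return value, base, half_last_digit(number_text) * factor


def answers_match(solution: str, reference: str, rel_tol: float = 0.01) -> bool:
    """Compare two answers after converting both to base units."""
    sol_value, sol_base, _ = parse_quantity(solution)
    ref_value, ref_base, ref_slack = parse_quantity(reference)
    if sol_base != ref_base:
        return False  # different dimensions, e.g. seconds vs metres
    # Widen the tolerance when the reference is written with few digits.
    abs_tol = max(rel_tol * abs(ref_value), ref_slack)
    return abs(sol_value - ref_value) <= abs_tol
```

With these definitions, `answers_match("1000 m", "1km")` and `answers_match("200000", "2*10^5")` both return `True`. Against the reference `"1.3e+02"`, the answer `"134"` is accepted and `"136"` is rejected, because the slack is 0.05e+02. Note that the same rule makes a one-digit reference such as `"1km"` very permissive, so a real implementation may want to cap the slack.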

Ebony59 • Apr 07 '25 22:04

https://github.com/camel-ai/camel/pull/2133 There is float tolerance in the Python verifier. Can we reuse it or make it modular for different verifiers? @GitHoobar @Ebony59 @hallerite

lightaime • Apr 07 '25 23:04

Not sure whether that makes sense. PythonVerifier also does float matching for different Python objects, like sets, lists and dicts. I don't see a lot of code duplication if it is implemented separately for each verifier. It should be mostly 1-2 lines of code.

old-hallerite • Apr 08 '25 01:04
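
To make the point in the comment above concrete, here is a rough sketch of what tolerance-aware matching over nested Python objects can look like. It is not the actual PythonVerifier code; the function name `approx_equal` and the greedy pairing used for sets are assumptions chosen for illustration.

```python
import math
from typing import Any


def approx_equal(a: Any, b: Any, rel_tol: float = 0.01) -> bool:
    """Recursively compare two objects, matching numbers within rel_tol."""
    # bool is a subclass of int, so compare booleans exactly.
    if isinstance(a, bool) or isinstance(b, bool):
        return a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b, rel_tol=rel_tol)
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(
            approx_equal(a[key], b[key], rel_tol) for key in a
        )
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return len(a) == len(b) and all(
            approx_equal(x, y, rel_tol) for x, y in zip(a, b)
        )
    if isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
        if len(a) != len(b):
            return False
        # Greedy pairing: simple, but it can miss a valid pairing when
        # several elements lie within tolerance of each other.
        remaining = list(b)
        for item in a:
            for index, candidate in enumerate(remaining):
                if approx_equal(item, candidate, rel_tol):
                    del remaining[index]
                    break
            else:
                return False
        return True
    return a == b
```

The numeric comparison itself is the single `math.isclose` call; the rest is container traversal, which is the part that differs between a Python-object verifier and a verifier that only compares scalar answers.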

It is not just about code duplication, it is also about maintenance. It is better to have only one abstraction dealing with float tolerance, since it will be used in all the different verifiers: Python, math, physics, bio and so on. We don't want to implement it all over the place.

lightaime • Apr 08 '25 01:04

In that case I would still implement it normally in the Python verifier and build one abstraction for the rest, because the Python verifier is really different.

old-hallerite • Apr 08 '25 11:04

Maybe we can extract the comparison logic (float comparison and expression comparison) from PythonVerifier into its own class (a PythonComparitor or something like that), so that it can be shared across different domains.

Ebony59 • Apr 09 '25 16:04
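
The extraction suggested in the comment above could look roughly like the following. All names here (`FloatComparator`, `MathLikeVerifier`, `PhysicsLikeVerifier`, the `verify` signatures) are hypothetical and do not reflect CAMEL's verifier base class or its API; the sketch only shows one comparator object being injected into several domain verifiers so that the tolerance policy lives in one place.

```python
import math
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class FloatComparator:
    """Single home for the float-tolerance policy (hypothetical)."""

    rel_tol: float = 0.01
    abs_tol: float = 0.0

    def equal(self, a: float, b: float) -> bool:
        return math.isclose(a, b, rel_tol=self.rel_tol, abs_tol=self.abs_tol)


class MathLikeVerifier:
    """Compares plain numeric answers through the shared comparator."""

    def __init__(self, comparator: Optional[FloatComparator] = None) -> None:
        self.comparator = FloatComparator() if comparator is None else comparator

    def verify(self, solution: str, reference: str) -> bool:
        return self.comparator.equal(float(solution), float(reference))


class PhysicsLikeVerifier:
    """Converts (value, unit) pairs to a base unit, then delegates."""

    def __init__(
        self,
        comparator: FloatComparator,
        unit_factors: Mapping[str, float],
    ) -> None:
        self.comparator = comparator
        self.unit_factors = unit_factors

    def verify(
        self,
        solution: Tuple[float, str],
        reference: Tuple[float, str],
    ) -> bool:
        sol = solution[0] * self.unit_factors[solution[1]]
        ref = reference[0] * self.unit_factors[reference[1]]
        return self.comparator.equal(sol, ref)
```

Both verifiers can be built from the same `FloatComparator(rel_tol=0.01)` instance, so changing the tolerance policy means touching one class. A Python-object verifier could reuse the same comparator at the leaves of its own container traversal.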