OpenNARS-for-Applications

The high performance in the `bandrobot` test may be accidental

Open ARCJ137442 opened this issue 1 year ago • 2 comments

Background

The bandrobot test, one of the demos in ONA, aims to test the multistep event inference/subgoaling of the ONA reasoner (via NAL-7 and NAL-8 temporal/procedural inference).

The scene, rendered as ASCII art, looks like this:

+++++++++++++++++++++|
---------------------|
            A        |
 o                   |
'''U'''''''''''''''''|

This is a single-player game whose main goal is to control the robot A, pick up the ball o, and drop it into the bucket U. In this game, ONA is expected to learn procedural knowledge by comparing the frequencies of beliefs (corresponding to the relative position between the robot and the ball/bucket), which is logically represented by the following inference rules:

  • `{ <{S1} |-> [P]>, <{S2} |-> [P]> } |- <({S1} * {S2}) --> (+ P)>` (t_frequency_greater)
  • `{ <{S1} |-> [P]>, <{S2} |-> [P]> } |- <({S1} * {S2}) --> (= P)>` (t_frequency_equal)

Using this representation of relative position, ONA is able to learn "pick when the position of the robot equals that of the ball, drop when the position of the robot equals that of the bucket, move left/right to make the positions equal", thereby providing evidence that ONA has an efficient procedural learning mechanism (sensorimotor intelligence).
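To make the two comparison rules quoted above concrete, here is a minimal Python sketch of how I read them. It is not ONA's actual C implementation: the `Belief` class, the `compare` function, and the `tolerance` parameter are hypothetical names introduced only for illustration, and whether ONA compares frequencies exactly or with a tolerance is an assumption here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Belief:
    """Stands for <{instance} |-> [prop]> with a truth-value frequency (hypothetical type)."""
    instance: str     # e.g. "robot"
    prop: str         # e.g. "position"
    frequency: float  # truth-value frequency in [0, 1]


def compare(b1: Belief, b2: Belief, tolerance: float = 1e-9) -> Optional[str]:
    """Derive a relative-value statement from two beliefs about the same property.

    Returns a Narsese-like string, or None when neither rule fires for this
    premise order. The tolerance is an assumption, not taken from ONA.
    """
    if b1.prop != b2.prop:
        return None  # the rules require a shared property P
    if abs(b1.frequency - b2.frequency) <= tolerance:
        relation = "="  # t_frequency_equal
    elif b1.frequency > b2.frequency:
        relation = "+"  # t_frequency_greater
    else:
        # S1 < S2: in this sketch the "+" relation would be derived
        # by applying the rule with the premises swapped.
        return None
    return f"<({{{b1.instance}}} * {{{b2.instance}}}) --> ({relation} {b1.prop})>"


robot = Belief("robot", "position", 0.5)
ball = Belief("ball", "position", 0.5)
print(compare(robot, ball))
```

With equal frequencies, the sketch yields the "(= position)" relation that the "pick when the positions are equal" knowledge would condition on.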

Problem

As the title says, ONA currently achieves high performance on this game through self-learning. However, if we change the random seed of the whole reasoner, it becomes apparent that this high performance is accidental:

  • If the reasoner does not babble "precisely", the robot cannot learn any effective knowledge for achieving the goal.
  • Even if the robot achieves the goal by coincidence, if the second goal satisfaction arrives too late, the "right knowledge" represented by temporal implications fades out and the reasoner falls back into the "random babbling without decisions" state, as if the accidental success had never happened.

Pictures

The successful case on mysrand(666)

Screenshot_2024-10-10-16-24-08-371

Failing cases on mysrand(667) and mysrand(668)

Screenshot_2024-10-10-15-22-35-102_com termux-edit Screenshot_2024-10-10-14-36-40-144

ARCJ137442 avatar Oct 10 '24 08:10 ARCJ137442

I agree, robust learning is not achieved for this particular example. I also have a test script that runs it with different seeds to evaluate it; I can commit it soon.

Part of the problem is that, by design of this experiment, reward can only be obtained when the object is picked up at the right position and then dropped at the target location. This is a very rare occurrence under motor babbling, and when it does happen there are tons of other hypotheses to weed out.
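To illustrate how rare that event is under pure motor babbling, here is a toy Python simulation. It is a simplified stand-in, not the actual bandrobot demo or ONA: the world width, the four-action set, the uniform action choice, and the function name `steps_until_first_reward` are all assumptions made for this sketch.

```python
import random

WIDTH = 20  # assumed world width, not taken from the demo
ACTIONS = ("left", "right", "pick", "drop")


def steps_until_first_reward(seed, max_steps=10_000):
    """Babble uniformly at random; return the step of the first reward, or None."""
    rng = random.Random(seed)  # independent generator per seed
    robot = rng.randrange(WIDTH)
    ball = rng.randrange(WIDTH)
    bucket = rng.randrange(WIDTH)
    holding = False
    for step in range(1, max_steps + 1):
        action = rng.choice(ACTIONS)
        if action == "left":
            robot = max(0, robot - 1)
        elif action == "right":
            robot = min(WIDTH - 1, robot + 1)
        elif action == "pick":
            # picking only works when the robot is exactly at the ball
            if not holding and robot == ball:
                holding = True
        elif action == "drop" and holding:
            if robot == bucket:
                return step  # reward: ball dropped into the bucket
            holding = False
            ball = robot  # ball lands where it was dropped
    return None


# Sweep a few seeds to see how much the first success varies.
for seed in (666, 667, 668):
    print(seed, steps_until_first_reward(seed))
```

Even in this reduced setting the first reward requires a pick at the ball followed by a drop at the bucket with no stray drop in between, so the step count at which it first occurs can differ widely from seed to seed.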

The solution will be to take what we learned from NACE and add the corresponding curiosity model to ONA: https://github.com/patham9/NACE

Another imminent change: the current numeric representation is the initial, incomplete one that was added experimentally. In the meantime, there is a solid implementation of numeric spaces which allows the system both to condition on concrete values and to perform comparisons between numeric measurements. With this new numeric value handling, learning also seems far more robust: http://91.203.212.130/AniNAL/demo_complex_continuous_verbal.html

patham9 avatar Oct 10 '24 09:10 patham9

@patham9 Okay, I'll study these references later.

ARCJ137442 avatar Oct 10 '24 09:10 ARCJ137442

Due to the new extended numerical term handling, this demo is obsolete. A new demo with the new numeric terms, which allow reasoning on

  • absolute values
  • value similarity
  • relative value to other values

can be found in the v0.9.3 release message and at https://github.com/patham9/AniNAL

patham9 avatar Jun 05 '25 15:06 patham9