
Review control prompting strategy


Gato's prompting strategy for control tasks, as described in the paper: for 25% of the sequences in each training batch, a prompt drawn from an episode of the same task is prepended; half of those prompts are taken from the end of the episode (goal conditioning), and half are sampled uniformly from within the episode.

We follow this general strategy, but fill in the gaps where details are not specified.
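
To make the general scheme concrete, here is a minimal sketch of how a prompted training sequence could be assembled. The function name `build_sequence` and the choice of letting the prompt occupy up to half the context window are assumptions for illustration, not the actual control_task.py logic:

```python
import numpy as np

def build_sequence(episode_tokens: np.ndarray,
                   prompt_tokens: np.ndarray | None,
                   seq_len: int) -> np.ndarray:
    """Assemble a training sequence: a window of episode tokens,
    optionally preceded by a prompt subsequence (illustrative only)."""
    if prompt_tokens is None:
        return episode_tokens[:seq_len]
    # How much of the context window the prompt may occupy is one of the
    # unspecified details; capping it at half the window is an assumption.
    n_prompt = min(len(prompt_tokens), seq_len // 2)
    body = episode_tokens[: seq_len - n_prompt]
    return np.concatenate([prompt_tokens[-n_prompt:], body])
```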

Let's look at some prompting-related arguments in our codebase: https://github.com/ManifoldRG/gato-control/blob/2deb510246ebd6b13dd53199f8de7df4e0b96f34/train.py#L210-L212

args.prompt_ep_proportion aligns with this 25%. You can see this being used in sample_control_batch in trainer.py:

https://github.com/ManifoldRG/gato-control/blob/2deb510246ebd6b13dd53199f8de7df4e0b96f34/gato/training/trainer.py#L138-L144

We sample 25% of the batch to have prompting, then pick half of those indices for uniform prompting (uniform_indices) and half for end prompting (end_indices).
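
As a minimal sketch of that splitting logic (exact rounding and RNG handling in trainer.py may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 32
prompt_ep_proportion = 0.25  # corresponds to args.prompt_ep_proportion

# 25% of the batch gets a prompt; those indices are split evenly between
# uniform prompting and end prompting.
n_prompted = round(batch_size * prompt_ep_proportion)
prompted = rng.choice(batch_size, size=n_prompted, replace=False)
uniform_indices = prompted[: n_prompted // 2]  # prompt position sampled uniformly
end_indices = prompted[n_prompted // 2:]       # prompt taken from the episode end

print(sorted(uniform_indices), sorted(end_indices))
```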

What does args.unique_prompt_episodes do? It maps to share_prompt_episodes, which is used in sample_batch_configurable in control_task.py: https://github.com/ManifoldRG/gato-control/blob/2deb510246ebd6b13dd53199f8de7df4e0b96f34/gato/tasks/control_task.py#L213-L218

This controls whether a prompt comes from the same episode as the training subsequence (our default) or from another episode of the same task. Rather than a binary toggle, this could also be a proportion; a sketch of both readings follows.
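
Here is a sketch of the episode-selection behavior under that reading; pick_prompt_episode is a hypothetical helper, not the actual sample_batch_configurable code:

```python
import random

def pick_prompt_episode(episodes: list, target_idx: int,
                        share_prompt_episodes: bool = True):
    """Pick the episode the prompt is cut from (illustrative only).

    share_prompt_episodes=True: same episode as the training subsequence
    (our default). False: a different episode of the same task.
    """
    if share_prompt_episodes:
        return episodes[target_idx]
    others = [i for i in range(len(episodes)) if i != target_idx]
    return episodes[random.choice(others)]

# The toggle could instead become a proportion, e.g.:
#   share = random.random() < share_prompt_proportion
```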

This issue is not meant to fix a specific problem with the current implementation of prompting, but to raise some questions. It is currently unclear whether prompting for control tasks makes a substantial difference. I have not performed training runs without prompting, but I obtain similar results whether I initialize the context with a prompt during inference or start with an empty context (--promptless_eval).

In the ideal world, with models of very long context length, we could prompt models to learn new control tasks without fine-tuning. However, Gato reports that they did not find this to work, and RoboCat, which is based on Gato, studies transfer via fine-tuning, as prompt-based adaptation similarly does not yet work. If we likewise rule this out in our own experiments, and intend the model to be used for in-task evaluation or for adapting to new domains through fine-tuning, then training with prompting may not be necessary, which would allow some simplification.
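
For clarity, a sketch of the two context initializations being compared at inference time; init_context and its interface are hypothetical, standing in for what --promptless_eval toggles in our evaluation code:

```python
from typing import List

def init_context(expert_tokens: List[int], promptless_eval: bool,
                 prompt_len: int = 64) -> List[int]:
    """Initialize the rollout context (illustrative only)."""
    if promptless_eval:
        return []  # start the rollout with an empty context
    # Otherwise seed the context with the tail of a stored expert episode,
    # mirroring the end prompting used during training.
    return expert_tokens[-prompt_len:]
```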

One way to determine whether prompting is necessary is to run a comparison on our simple benchmarks (multi-task MuJoCo), evaluating the difference in performance between training with and without prompting.

daniellawson9999 · Aug 13, 2023