Add support for a simple preference UI
Description
See #711
Testing
TODO: add a notebook and experiment config that use this feature, and screenshots of the behavior. (I've tested it myself, but not in a clean way.)
FWIW I haven't cleaned this up at all, so it's not quite ready for review - I probably should have included "[WIP]" in the title.
I did a general code review of the library changes and left a few comments. Thanks!
A few more notes:
- Right after running the tutorial, this is what I see:
It's probably because we're still running on `gym==0.21.0`, which uses pyglet for rendering, and pyglet is a ton of pain. Gymnasium uses pygame, so it might be easier to make headless, but I'm not 100% sure right now. If we could remove the individual rendering for the learning envs, that would be great. One way I see to do this is an env without rendering for learning, and a separate one just for rendering -- would that be easy to do? If it works, I think it'd be a good improvement to the user experience of the tutorial.
Oh wow that is ugly! I don't think I saw this when I ran it, may be a version or OS difference.
I'm confused about why the learning envs are rendering -- with Pendulum I thought the policy and reward took raw observations as input, and rendering was only used for the human preference comparisons?
Making it all headless is obviously good if we're capturing the video anyway and showing it in the notebook. I'd be OK merging this PR without fixing this and just opening an issue to track it -- it still seems better to have the tutorial than not.
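A rough sketch of the separate-env idea, assuming we migrate to Gymnasium (the env id, helper name, and clip length here are just illustrative, not the PR's actual code):

```python
import gymnasium as gym

# Learning envs created without a render_mode: nothing is ever drawn,
# so no windows open during training.
train_envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("Pendulum-v1") for _ in range(8)]
)

# One separate env with rgb_array rendering, used only to capture
# the clips shown in the human preference comparisons.
render_env = gym.make("Pendulum-v1", render_mode="rgb_array")

def capture_clip(env, policy, n_steps=100):
    """Roll out `policy` (a hypothetical obs -> action callable) and collect frames."""
    frames = []
    obs, _ = env.reset()
    for _ in range(n_steps):
        obs, _, terminated, truncated, _ = env.step(policy(obs))
        frames.append(env.render())  # rgb_array frame for the video
        if terminated or truncated:
            obs, _ = env.reset()
    return frames

# e.g. with a random policy:
clip = capture_clip(render_env, lambda obs: render_env.action_space.sample())
```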
- Maybe an option to skip a given sample? (In the 1-2-r-q choice, add `s` for skip.) Sometimes it's so hard to tell that it feels better to not choose anything; I'm not sure how well this plays with the algorithm (it should be alright to just get another sample, or remove that one from the dataset).
- If possible, maybe put the videos side by side.
Yeah, I think the original DRLHP paper had an option for contractors to skip, which just ignored the sample. Not that principled, but it works OK in practice.
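For concreteness, a minimal sketch of what the choice loop could look like with a skip option (the function and prompt here are hypothetical, not the PR's actual code):

```python
def ask_preference(replay):
    """Prompt until the user picks a clip, skips, or quits."""
    while True:
        choice = input("Preference (1/2) [r = replay, s = skip, q = quit]: ").strip().lower()
        if choice in ("1", "2"):
            return int(choice)       # index of the preferred clip
        if choice == "s":
            return None              # skip: caller just drops this pair
        if choice == "r":
            replay()                 # show both clips again
        elif choice == "q":
            raise KeyboardInterrupt  # abort the labelling session
```

Returning `None` and simply not adding the pair to the preference dataset would match the "just ignored the sample" behaviour.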
These are in order of importance, and all of them are mainly QoL improvements; how much they matter depends on how much time we want to spend on this tutorial. On the other hand, if we (or someone else) intend to use the UI for larger real workloads, they become more significant.
I think this is too much of a toy to be used for real human experiments; #716 will be better for real workloads.
> Oh wow that is ugly! I don't think I saw this when I ran it, may be a version or OS difference.
The windows can be avoided by using a virtual frame buffer; Xvfb and its Python wrapper are useful for this.
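For example, with pyvirtualdisplay, the Python wrapper around Xvfb (assumes Xvfb itself is installed, e.g. via `apt-get install xvfb`):

```python
from pyvirtualdisplay import Display

# Start a virtual X display; pyglet renders into it,
# so no windows appear on screen.
display = Display(visible=0, size=(1400, 900))
display.start()

# ... run the tutorial / rendering code here ...

display.stop()
```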