overcooked_ai
Tutorial Google Colab Notebook
Creating a tutorial Google Colab notebook on how to use the environment, visualize rollouts, etc. Most useful after python visualizations #45 are completed.
There is an upcoming IPython notebook on the trajectory visualization branch that can be used as a good starting point for this task. What can/should be done with this notebook:
- Add native python state visualizations along with an explanation of how to use it.
- Check if code/explanation is up to date with the current state of the library and fix what needs to be changed.
- Split the notebook into a series of short notebooks covering specific topics in more depth (e.g. explaining most of the config options that can be used for state visualization/trajectory charts)? Is this a good idea?
- Is there anything else to add here?
This seems good to me! I think we could even just have one notebook (we should put it in Colab), and have a table of contents.
Some non-obvious things that come to mind that the tutorial should address:
- How to create environments with different types of soups/reward/order schemes
- How to create and roll out a 1-agent environment
We probably later also want to have a tutorial for human_aware_rl repo which shows how to train, load, and evaluate simple agents.
Thanks for the suggestions! As my next task I will create a notebook with a tutorial for the current master branch code, including the suggested things. I'm not familiar with human_aware_rl yet, so I will leave that for now.
I've done a simple introduction notebook that works with the current master branch. https://colab.research.google.com/github/bmielnicki/overcooked_ai/blob/introduction_notebook/introduction.ipynb
Some things worth mentioning:
- The evaluate_human_model_pair method consistently produces 0 rewards; it is even worse than random agents, so I did not show it at all. This means most of the rewards printed out are 0, since I use only random agents in this tutorial.
- Think of it not as a finished tutorial, but rather as a starting point for a quickstart guide. I'm not sure I can write a proper tutorial that fully explains the whole high-level API.
- Some things used in the tutorial are not explained, or are explained only very briefly.
- I intentionally did not add any code from open pull requests (mostly the trajectory visualization chart and native Python visualizations), so this notebook does not depend on any of that code being merged.
- If the notebook in its current state is good enough I can make a pull request.
Questions:
- Is it better to install the overcooked_ai package inside the notebook from the GitHub repo (newest master branch) or from PyPI (newest released stable version)?
- Should I add the relevant imports to every section, even if they were already used earlier? That way there would be no need to run every cell.
- Currently it is a mix of the two: I've added every needed import at the start of the file and then import again in each section (the repeated imports are effectively no-ops, but they show the user where the names come from as they are used).
- What do you think about it? Do you see ways it can be improved?
Thanks Bartek, this is looking good!
I do think that it's worth merging the native Python visualizations first so we can add a little bit of color here too! (I'll check on the native viz PR just after this.)
Regarding your questions:
- I think it's fine if we don't include human model pair evaluations.
- I think that unless we manage to somehow add to our tests that all cells can run without errors, we should probably just install with the pip release, and manually check that this guide still works at every new release (which fortunately is quite infrequent). However, given that the native Python viz will not immediately be in the pip release, we might at first just install from master, and make the switch to the pip install at our next release!
- Import-wise: up to you, I think adding imports in every section might be a nice touch, you're right 👍
Thanks for the reply!
- A month or so ago (probably before the big changes in the planners code) the human model gave better results than 0 rewards. I'm not sure what happened here, or whether it's a sign of some minor error. Showing an agent that plays well would enhance the experience, but it is not necessary.
- We can convert the notebook to a .py file using nbconvert and run it. That probably won't detect whether something is wrong with the code's output, just whether it raises an error (so it will catch most of the obvious errors, and the more subtle ones might not be detected by a human anyway). My concern is rather about merging code that would break the tutorial without raising an error. Pip releases can be tested more, so it's less of a concern there.
- I will leave it in the current state then.
Yeah, I think this might be due to the mdp dynamics having changed. I thought @mesutyang97 might have fixed the planner behind the greedy human model, so it should be a relatively quick fix if you want to investigate the cause yourself (Bartek). If this seems too complicated, no worries!
Hi Bartek,
Thanks for the work putting this IPython notebook together. Wow!
I suspect it is because the BC agent is not pressing "cook" in the updated dynamics (because in the old dynamics, a soup would automatically cook once it had 3 items).
I will look more into this. But thanks a lot for the work!
You are right - agents are not pressing cook. I'm already close to pushing the fix for this bug.
Could you let me know how you are planning to fix this? I had a discussion with Micah yesterday about this but couldn't come to a good conclusion. It would be nice to get your opinion.
Given that GreedyHumanModel only handles the situation where all_orders has length 1, we can just cook any soup of the matching length.
Inside ml_action I've changed this:
```python
if soup_nearly_ready and not other_has_dish:
    motion_goals = am.pickup_dish_actions(counter_objects)
else:
    assert len(state.all_orders) == 1, \
        "The current mid level action manager only support 3-onion-soup order, but got orders" \
        + str(state.all_orders)
    next_order = list(state.all_orders)[0]
    if 'onion' in next_order:
        motion_goals = am.pickup_onion_actions(counter_objects)
    elif 'tomato' in next_order:
        motion_goals = am.pickup_tomato_actions(counter_objects)
    else:
        motion_goals = am.pickup_onion_actions(counter_objects) + am.pickup_tomato_actions(counter_objects)
```
to this:
```python
if soup_nearly_ready and not other_has_dish:
    motion_goals = am.pickup_dish_actions(counter_objects)
else:
    assert len(state.all_orders) == 1, \
        "The current mid level action manager only support 3-onion-soup order, but got orders" \
        + str(state.all_orders)
    next_order = list(state.all_orders)[0]
    soups_ready_to_cook_key = '{}_items'.format(len(next_order.ingredients))
    soups_ready_to_cook = pot_states_dict[soups_ready_to_cook_key]
    if soups_ready_to_cook:
        only_pot_states_ready_to_cook = defaultdict(list)
        only_pot_states_ready_to_cook[soups_ready_to_cook_key] = soups_ready_to_cook
        # we only want to cook soups that have the same length as the order
        motion_goals = am.start_cooking_actions(only_pot_states_ready_to_cook)
    elif 'onion' in next_order:
        motion_goals = am.pickup_onion_actions(counter_objects)
    elif 'tomato' in next_order:
        motion_goals = am.pickup_tomato_actions(counter_objects)
    else:
        motion_goals = am.pickup_onion_actions(counter_objects) + am.pickup_tomato_actions(counter_objects)
```
An additional improvement would be to also cook any soup bigger than next_order, or soups whose ingredients do not match the next order (e.g. soups that contain a tomato when the only order is ["onion", "onion", "onion"]), to free up the pots.
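That extension could be sketched roughly like this (a hypothetical helper, not the code in the PR; it reuses the `'N_items'` key convention from the snippet above):

```python
from collections import defaultdict

def cookable_pot_states(pot_states_dict, order_len):
    """Hypothetical helper: keep only pots whose ingredient count is at
    least the length of the current order, so bigger or mismatched soups
    can also be started, freeing up the pot."""
    result = defaultdict(list)
    for key, pots in pot_states_dict.items():
        # keys look like '2_items', '3_items', as in the snippet above
        if key.endswith('_items') and int(key.split('_')[0]) >= order_len:
            result[key] = list(pots)
    return result

# Toy example with two partially filled pots and one full pot:
pots = {'2_items': ['pot_a', 'pot_b'], '3_items': ['pot_c']}
print(dict(cookable_pot_states(pots, 3)))  # -> {'3_items': ['pot_c']}
```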
BTW, the state featurization tests (test_state_featurization and test_lossless_state_featurization_shape) failed after this change.
That looks like a good fix, thanks Bartek! I think we can keep it for ["onion", "onion", "onion"] for now, unless Nathan would like to use it (which, I believe, he currently does not).
That is unfortunate. Could you share how it is failing?
The assertion `self.assertTrue(np.array_equal(expected_featurization, featurized_observations))` fails with "Features were not equal". I've overwritten the pickled features (by uncommenting the line that saves the pickle); the result is pushed to https://github.com/bmielnicki/overcooked_ai/tree/greedy_human_model_fix
@mesutyang97 Do you think it is okay to merge this code? I'm not sure how much of a concern those failed featurization tests are (I'm not familiar with featurization at all, so it is hard for me to judge how my code changed the featurization and whether it's a bug or a feature). It passed all tests, including featurization, after overwriting the pickled files.
I think it should be good. Could you double-check that the updated pickled files contain the correct information?
`self.rnd_agent_pair` in the tests is the greedy human model, and it produced a different trajectory than before (because the agents now start cooking soups). Because of that, the featurization output changed. This explains how the featurization tests failed without any change to the featurization code. Here is the PR: https://github.com/HumanCompatibleAI/overcooked_ai/pull/64
EDIT: I have not double-checked the updated pickled files, but if they were right before, I assume they are right now (as nothing in the featurization code changed, and there is a simple explanation for why the tests failed).
Minor point: @mesutyang97 I think you were confusing BC agents and GreedyHumanModels: what Bartek was fixing was GreedyHumanModels (hardcoded agents that use planner logic), rather than BC agents.
I'll be checking out the PR itself soon.
Updated version with python visualizations: https://colab.research.google.com/drive/1xFNnun0Ykob71cP6oEi8Y9JKQUYXYChD?usp=sharing#scrollTo=s8fYj-bF2Z3R
Some notes:
- There is no need to rush a review of this, as https://github.com/HumanCompatibleAI/overcooked_ai/pull/53 should be merged first anyway. Right now the code uses a branch from my fork, but it probably should not once it is published and added to the master branch.
- The trajectory sliders work only when the notebook is run by the user; if you just open the link and don't run the code, the trajectory slider won't work.
- For some reason, if you open the link and click `Runtime` -> `Run all`, it won't install overcooked_ai. Use `Runtime` -> `Restart and run all` instead.
I've made a v0 in the README which is currently working (not sure whether the colab notebook above is stale): https://colab.research.google.com/drive/1AAVP2P-QQhbx6WTOnIG54NXLXFbO7y6n#scrollTo=Z1RBlqADnTDw.
To actually address this issue, we should merge the notebook above with my one linked here.
We should also figure out an easy way to maintain/test these kinds of notebooks. Primarily, we would want:
- CI to be able to catch the notebook not working anymore
- People to be able to submit PRs on the notebook code
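For the CI point, one common pattern (a sketch assuming GitHub Actions and a notebook named `introduction.ipynb` at the repo root, both of which are assumptions) is to execute the notebook headlessly on every push, so any cell that raises an exception fails the job. As noted earlier in the thread, this catches errors raised by cells but not silently wrong output:

```yaml
# .github/workflows/notebook.yml (hypothetical path and filename)
name: notebook-check
on: [push, pull_request]
jobs:
  run-notebook:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install jupyter nbconvert .
      # --execute re-runs every cell; an uncaught exception fails the job
      - run: jupyter nbconvert --to notebook --execute introduction.ipynb
```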
This issue is stale because it has been open for 30 days with no activity. It will be closed in 14 days unless there is some new activity