rai_whoami package RFC
The rai_whoami package is central to embodiment in RAI. It automatically converts a robot's documentation into text, which is then used to condition the language model so that it adopts an identity matching the robot on which it is deployed.
Traditionally, the embodiment system prompt encompassed information about the robot's identity and operational principles. In the proposed new version, both the identity and constitution components are structured as follows:
Identity:
- name: Robot's name
- version: Prompt version
- description: A brief description (e.g., "You are a...")
Constitution:
- ethical_principles: Guidelines on prohibited actions
- operational_constraints: Operational limits (e.g., maximum speed, range)
- safety_protocols: Procedures for emergencies (e.g., "In the event of X, initiate shutdown")
- interaction_rules: Communication guidelines (e.g., "Always provide clear feedback to user commands")
- learning_guidelines: Instructions for adapting to user preferences based on interaction history
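To make the proposed structure concrete, here is a minimal sketch of how the two components could be modelled and rendered into a system prompt. The class names, the `render_system_prompt` helper, and the output layout are assumptions for illustration only, not the current rai_whoami API; only the field names come from the lists above.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    name: str         # Robot's name
    version: str      # Prompt version
    description: str  # Brief description, e.g. "You are a..."


@dataclass
class Constitution:
    # Each field holds a list of plain-text rules (hypothetical representation).
    ethical_principles: list[str] = field(default_factory=list)
    operational_constraints: list[str] = field(default_factory=list)
    safety_protocols: list[str] = field(default_factory=list)
    interaction_rules: list[str] = field(default_factory=list)
    learning_guidelines: list[str] = field(default_factory=list)


def render_system_prompt(identity: Identity, constitution: Constitution) -> str:
    """Render both components into one text prompt, skipping empty sections."""
    lines = [
        identity.description,
        f"Name: {identity.name}",
        f"Prompt version: {identity.version}",
    ]
    for section, rules in vars(constitution).items():
        if not rules:
            continue
        lines.append("")
        lines.append(section.replace("_", " ").capitalize() + ":")
        lines.extend(f"- {rule}" for rule in rules)
    return "\n".join(lines)
```

Keeping each constitution field as a separate list would make it easy to measure how much each section contributes to the overall prompt size, and to update one section without touching the others.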
Additionally, incorporating multimodal elements, such as images, into the system prompt can enhance the embodiment process by providing visual context to the language model. This approach leverages advancements in multimodal prompting, where models process and integrate information from various modalities, including text and images, to improve understanding and interaction.
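As a rough illustration of what a multimodal prompt could look like, the sketch below packs text and one image into a single chat message. It assumes an OpenAI-style message schema (a list of `text` and `image_url` content parts with a base64 data URL); the `build_multimodal_message` helper is hypothetical and is not part of rai_whoami. Some providers accept images only in user-role messages, so the role is left as a parameter.

```python
import base64


def build_multimodal_message(
    text: str,
    image_bytes: bytes,
    mime_type: str = "image/png",
    role: str = "system",
) -> dict:
    """Build one chat message that carries both text and an inline image."""
    # Encode the raw image bytes as a data URL so no external hosting is needed.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": role,
        "content": [
            {"type": "text", "text": text},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

Inline images increase the size of every request that carries the prompt, so this interacts directly with any token or payload budget.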
I welcome your feedback and comments on this proposal.
@maciejmajek It looks very good 👍🏼
One comment from me: I would rename this RFC to "rai_whoami package system prompt RFC". As I understand it, the rai_whoami package also supports other features that are not described in this RFC, for example the RAG system for documentation (used when the LLM needs more detailed information about the robot).
Edit: I was a bit misled by the title, which is rather broad, as @boczekbartek pointed out. I believe my comment is still valid, but please make the scope of the RFC clear.
Please also update the RFC with:
- A motivation for the change (e.g. what is not optimal about the current package).
- An analysis or summary of SoTA, or at least some links that are useful for the decision.
- Variables and constraints we need to be wary of and how this RFC affects them, especially when they are trade-offs. Typical axes of analysis would include:
  - Operational performance of the robot in embodied tasks.
  - Ease of adjustment / reconfiguration. What would change more often, and what would typically persist?
  - Strictness of certain parts of the "knowledge" and how to handle this strictness, e.g. with a vector database.
  - An approach to differentiating a descriptive product catalog from the actual identity.
  - How it affects the token count / size of the overall system prompt.
  - Any embodiment constraints (e.g. real-time or power use).
As for current content, I would propose a bit more of a structure:
Identity:
Operational:
Communication:
Ethical:
As @boczekbartek mentioned, other features of the current whoami also need to be covered. Specifically, robot capabilities, constraints and components need to be defined strictly.
> Additionally, incorporating multimodal elements, such as images, into the system prompt can enhance the embodiment process by providing visual context to the language model. This approach leverages advancements in multimodal prompting, where models process and integrate information from various modalities, including text and images, to improve understanding and interaction.
@maciejmajek I would be very interested in this, and in embodied RAG approaches like this in general.
I really need something like this at present and plan to experiment with augmenting the existing system. Any guidance on this would be much appreciated.
The paper you linked proposes a very interesting solution. I believe that a somewhat less advanced solution in RAI is at the intersection of rai_whoami and RAI's spatiotemporal memory system, which is currently under development (#453). I'm attaching a poster of the solution to provide you with some insight.
I think developing this method in RAI would be highly beneficial for the framework. If you'd like to join the effort, we can either schedule a Discord call to speed up the onboarding process or continue the discussion here or on Discord.