rai rai_whoami package RFC

The rai_whoami package plays a crucial role in achieving effective embodied artificial intelligence. It facilitates the automatic conversion of a robot's documentation into text, which is then used to appropriately tune the language model to adopt a specific personality aligned with the robot on which it is deployed.

Traditionally, the embodiment system prompt encompassed information about the robot's identity and operational principles. In the proposed new version, both the identity and constitution components are structured as follows:

Identity:

name: Robot's name
version: Prompt version
description: A brief description (e.g., "You are a...")

Constitution:

ethical_principles: Guidelines on prohibited actions
operational_constraints: Operational limits (e.g., maximum speed, range)
safety_protocols: Procedures for emergencies (e.g., "In the event of X, initiate shutdown")
interaction_rules: Communication guidelines (e.g., "Always provide clear feedback to user commands")
learning_guidelines: Instructions for adapting to user preferences based on interaction history

Additionally, incorporating multimodal elements, such as images, into the system prompt can enhance the embodiment process by providing visual context to the language model. This approach leverages advancements in multimodal prompting, where models process and integrate information from various modalities, including text and images, to improve understanding and interaction.

I welcome your feedback and comments on this proposal.

Mar 27 '25 20:03 maciejmajek

@maciejmajek It looks very good 👍🏼

One comment from me is that I would rename this RFC to "rai_whoami package system prompt RFC". As I understand rai_whoami package supports also other features, that are not described in this RFC. For example the RAG system for documentation (if LLM needs more detailed information about the robot).

Mar 28 '25 09:03 boczekbartek

Edit: I was a bit misled by the title which is a bit broad as @boczekbartek pointed out, though I believe my comment is still valid; but please make clear what is the scope of the RFC.

Please also update the RFC with:

A motivation for the change (e.g. what is not optimal about current package)
An analysis or summary of SoTA, or at least some links that are useful for the decision.
Variables and constraints we need to be wary of and how this RFC affects them, especially when they are trade-offs. Typically axes of analysis would include:

Operational performance of the robot in embodied tasks.
Ease of adjustments / reconfiguration. What would be changed more often, what would typically persist?
Strictness of certain parts of the "knowledge" and how to handle this strictness - e.g. vector database.
Approach to differentiate a descriptive product catalog from actually identity.
How it affects tokens / size of the overall system prompt.
Any embodiment (e.g. real-time or power use) constraints.

As for current content, I would propose a bit more of a structure:

Identity:

Operational:

Communication:

Ethical:

As @boczekbartek mentioned, other features of the current whoami also need to be covered. Specifically, robot capabilities, constraints and components need to be defined strictly.

Mar 28 '25 10:03 adamdbrw

Additionally, incorporating multimodal elements, such as images, into the system prompt can enhance the embodiment process by providing visual context to the language model. This approach leverages advancements in multimodal prompting, where models process and integrate information from various modalities, including text and images, to improve understanding and interaction.

@maciejmajek I would be very interested in this and embodied RAG approaches like this in general

I really need something like this at present, and plan to experiment with augmenting the existing system, any guidance on this would be much appreciated.

May 08 '25 09:05 antoan

The paper you linked proposes a very interesting solution. I believe that a somewhat less advanced solution in RAI is at the intersection of rai_whoami and RAI's spatiotemporal memory system, which is currently under development (#453). I'm attaching a poster of the solution to provide you with some insight.

I think developing this method in RAI would be highly beneficial for the framework. If you'd like to join the effort, we can either schedule a Discord call to speed up the onboarding process or continue the discussion here or on Discord.

poster.pdf

May 08 '25 15:05 maciejmajek