Goose Desktop: render UI from tool response embedded resource

Open aharvard opened this issue 5 months ago • 10 comments

Motivations

The MCP client experience in Goose Desktop is constrained to a single-threaded chat design paradigm and lacks the capability to render rich, interactive UIs based on tool results for users.

The primary motivator behind this work is this MCP discussion, New Content Type for "UI" #287, opened by @kentcdodds.

We have two PRs (#2948, #3432) that attempt to add support for mcp-ui inside of Goose. We need to make some architectural decisions that can help close one and merge the other.

This work can unlock #3493 (a goose-native text diff viewer sidecar) plus many more future possibilities, such as:

  • Help designers explore photos and media from a digital asset manager directly in Goose
  • Visualize the contents of files (text, image, video, audio, 3D, code, etc) created by tools on the user’s system
  • Provide a preview environment for localhost dev (and potentially a “deploy” button alongside)
  • Provide text authoring tools for copywriters, content designers
  • Provide code editing for developers
  • Provide real estate for full-featured applications such as Penpot (for designing) and tldraw/Excalidraw (for whiteboarding), and maybe even one day provide an MCP host target for more sophisticated apps like Google Docs or Jira
  • Deliver agentic commerce: https://mcpstorefront.com/

Solution

A protocol for Desktop Goose that allows for native Goose tools and MCP server tools to render UI

// Example MCP tool using the typescript-sdk – https://github.com/modelcontextprotocol/typescript-sdk

import { createUIResource } from '@mcp-ui/server'
import { GooseMeta } from './types' 

// ... boilerplate code

server.tool('render_ui_inline', 'Display UI to the user whenever they ask for a ui demo', {}, async () => {
  const gooseMeta: GooseMeta = {
    toolUI: {
      displayType: 'inline',
      name: 'inline example for Goose Desktop',
      renderer: 'mcp-ui',
    },
  }
  return {
    _meta: {
      goose: gooseMeta,
    },
    content: [
      createUIResource({
        uri: 'ui://component-html-as-text',
        content: {
          type: 'rawHtml',
          htmlString: `<marquee>Hello World (from inline)</marquee>`,
        },
        encoding: 'text',
      }),
    ],
  }
})

// tool response
// {
//   "_meta": {
//     "goose": {
//       "toolUI": {
//         "displayType": "inline",
//         "name": "inline example for Goose Desktop",
//         "renderer": "mcp-ui"
//       }
//     }
//   },
//   "content": [
//     {
//       "type": "resource",
//       "resource": {
//         "uri": "ui://component-html-as-text",
//         "mimeType": "text/html",
//         "text": "<marquee>Hello World (from inline)</marquee>"
//       }
//     }
//   ]
// }

// Types for goose-specific _meta

type Renderer = 'mcp-ui' // day one

// could support additional renderers if they are not supported in a first-class way by mcp-ui, for example:
// type Renderer = 'mcp-ui' | 'goose-components' | 'goose-generative-ui' 

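// Assumption: LucideIconName is not defined in this issue; a plain string alias
// stands in here for a union of valid Lucide icon names
type LucideIconName = string

type SidecarToolUI = {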
  displayType: 'sidecar'
  name: string
  renderer: Renderer
  trigger: {
    label: string // name to appear in tooltip
    icon: LucideIconName // expects a valid Lucide icon name
  }
  actionBar?: {
    actions: {
      label: string // name to appear in tooltip
      icon: LucideIconName // expects a valid Lucide icon name
      action: () => void // action to send to configured renderer when triggered
    }[]
  }
}

type InlineToolUI = {
  displayType: 'inline'
  name: string
  renderer: Renderer
}

export type GooseMeta = {
  toolUI: SidecarToolUI | InlineToolUI
  //... potentially other metadata for future Goose experiences
}

_meta property

Goose desktop configuration can be handled by an MCP server returning a _meta property as part of the tool response alongside the embedded resource in the content property.

The _meta property is a reserved field in MCP that allows clients and servers to attach metadata to their messages without interfering with the core protocol functionality. It's defined in the schema as an optional object that can contain arbitrary key-value pairs.
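As a rough sketch of the client side, the helper below shows one way Goose Desktop could read the display hints out of a tool response. The names `readToolUI`, `ToolUIHint`, and `ToolResultLike` are hypothetical and are not part of Goose or the MCP SDK; the only shape assumed is the `_meta.goose.toolUI` object from the example above.

```typescript
// Hypothetical client-side helper; none of these names come from Goose or the MCP SDK.
type HintDisplayType = 'inline' | 'sidecar'

interface ToolUIHint {
  displayType: HintDisplayType
  name: string
  renderer: string
}

interface ToolResultLike {
  _meta?: Record<string, unknown>
  content: unknown[]
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

function isHintDisplayType(value: unknown): value is HintDisplayType {
  return value === 'inline' || value === 'sidecar'
}

// Returns the toolUI hint when it is present and well-formed, otherwise undefined.
// _meta is optional and free-form, so every level is checked before use.
function readToolUI(result: ToolResultLike): ToolUIHint | undefined {
  const goose = result._meta?.['goose']
  if (!isRecord(goose)) return undefined

  const toolUI = goose['toolUI']
  if (!isRecord(toolUI)) return undefined

  const displayType = toolUI['displayType']
  const name = toolUI['name']
  const renderer = toolUI['renderer']
  if (!isHintDisplayType(displayType)) return undefined
  if (typeof name !== 'string' || typeof renderer !== 'string') return undefined

  return { displayType, name, renderer }
}
```

Because `_meta` is arbitrary key-value data, a tool response without it (or with a malformed `goose` entry) simply yields `undefined`, and the client can fall back to its normal tool-result rendering.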

Display Types

Tools may specify how the UI should be presented to users:

  1. Inline: minimal, nested as part of the chat; good for data viz
  2. Sidecar: maximal, surface area for rich interaction and app-like experiences

[!NOTE] A note on "sidecar" naming: there has been some discussion around whether “sidecar” is better than “wingmate” (this comment by @mgd1984 and this comment by @liady). Folks at Block (@spencrmartin, @Kvadratni, @acekyd) feel that we should reserve “wingmate” for top-level agentic experiences. For example, an always-on Goose agent that’s “flying with you” (analogy by Spencer Martin). For this reason, I think “sidecar” makes the most sense for agentic UI presentation in Goose.

Rendering Expectations for inline display type

  1. UI should not be hidden under a collapsed tool result UI (even if users have the ability to expand to view)
  2. UI must not be bound by scrollbars and should take up as much height as needed.
  3. Scrolling should be managed by the Goose message scroll area

Rendering Expectations for sidecar display type

  1. Sidecars are opened up by clicking the respective trigger button that appears in the chat thread
  2. Sidecars render on the right-side of the Goose app
  3. Sidecar container height must not exceed the height of the Goose window
  4. UI overflow of content inside the sidecar must be managed inside the sidecar scroll area
  5. Sidecar may present action buttons to users in consistent locations. Triggering an action button fires a callback function from the parent container into the renderer sandbox
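The sizing rules for both display types can be summarized in a small pure function. This is only a sketch of the expectations listed above, not Goose's layout code; `containerLayout`, `Surface`, and `ContainerLayout` are hypothetical names, and treating the sidecar height as the smaller of the content height and the window height is an assumption.

```typescript
// Hypothetical names; a sketch of the rendering expectations, not Goose's layout code.
type Surface = 'inline' | 'sidecar'

interface ContainerLayout {
  heightPx: number
  // true when the container itself must scroll to reveal overflowing content
  scrollsInternally: boolean
}

function containerLayout(
  surface: Surface,
  contentHeightPx: number,
  windowHeightPx: number,
): ContainerLayout {
  if (surface === 'inline') {
    // Inline UI takes as much height as it needs and never shows its own scrollbar;
    // the Goose message scroll area handles scrolling.
    return { heightPx: contentHeightPx, scrollsInternally: false }
  }
  // Sidecar: never taller than the Goose window (assumed: shrink to content if shorter).
  // Anything that does not fit scrolls inside the sidecar.
  return {
    heightPx: Math.min(contentHeightPx, windowHeightPx),
    scrollsInternally: contentHeightPx > windowHeightPx,
  }
}
```

For example, 1200px of content in an 800px window would stay 1200px tall inline, while the sidecar would be capped at 800px and scroll internally.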

Renderer Types

Regardless of whether the UI is shown to users inline or in a sidecar, tools may specify which UI renderer to use. Goose should be flexible and allow for a variety of UI rendering techniques by embracing generative UI presentation while also supporting deterministic rendering.

For day one, Goose may only support mcp-ui as a renderer. In the future, Goose can support less deterministic rendering implementations.
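One way to keep the renderer choice open-ended is a small registry keyed by renderer name. The sketch below is an assumption about how that could look, not the implementation in either PR: `renderWith`, `RenderFn`, and `RenderOutcome` are hypothetical, and the `mcp-ui` entry is a stand-in that just passes the HTML through rather than calling the real MCP UI library.

```typescript
// Hypothetical renderer registry; the names and shapes here are illustrative only.
type RenderOutcome =
  | { status: 'rendered'; html: string }
  | { status: 'unsupported'; reason: string }

type RenderFn = (resourceText: string) => RenderOutcome

// Stand-in for the MCP UI renderer: a real client would hand the resource to the
// MCP UI library instead of returning the HTML unchanged.
const renderMcpUi: RenderFn = (resourceText) => ({ status: 'rendered', html: resourceText })

// Day one: only 'mcp-ui' is registered. Future renderers could be added here
// without changing the dispatch logic.
const rendererRegistry = new Map<string, RenderFn>([['mcp-ui', renderMcpUi]])

function renderWith(renderer: string, resourceText: string): RenderOutcome {
  const render = rendererRegistry.get(renderer)
  if (!render) {
    return { status: 'unsupported', reason: `no renderer registered for "${renderer}"` }
  }
  return render(resourceText)
}
```

With this shape, a tool that asks for `goose-components` today would get an `unsupported` outcome, which the client could surface as a plain tool result instead of failing.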

MCP UI

  1. Use the MCP UI library to render the UI.
  2. When used, we expect the tool to return a UI resource that conforms to the MCP UI resource schema.
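Before handing a resource to the renderer, the client could run a quick shape check. The sketch below only covers what the example tool response in this issue shows (a `ui://` URI, a `text/html` MIME type, and a `text` body); the MCP UI resource schema may accept other MIME types and encodings that this check would reject. `looksLikeHtmlUIResource` and `EmbeddedResourceLike` are hypothetical names.

```typescript
// Hypothetical shape check based only on the example response in this issue.
interface EmbeddedResourceLike {
  type: 'resource'
  resource: {
    uri: string
    mimeType?: string
    text?: string
    blob?: string
  }
}

function looksLikeHtmlUIResource(item: EmbeddedResourceLike): boolean {
  const { uri, mimeType, text, blob } = item.resource
  const hasUiScheme = uri.startsWith('ui://')
  const isHtml = mimeType === 'text/html'
  // Either an inline text body or a blob body must be present.
  const hasBody = typeof text === 'string' || typeof blob === 'string'
  return hasUiScheme && isHtml && hasBody
}
```

A resource that fails this check could still be shown as an ordinary embedded resource; the check only decides whether to attempt UI rendering.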

Goose components (speculative, needs more discovery work)

  1. TBD
  2. Use the Goose UI library to render the UI.
  3. When used, we expect the tool to return a UI resource that conforms to the Goose component registry schema.
  4. Goose may need to leverage sampling, if LLM tokens are to be spent

Goose generative UI (speculative, needs more discovery work)

  1. TBD
  2. Use the Goose generative UI engine to render the UI.
  3. When used, we expect the tool to return a UI resource that conforms to the Goose generative UI resource schema.
  4. Goose may need to leverage sampling, if LLM tokens are to be spent

[!NOTE] The Block team has various internal threads that may unlock Goose components and generative UI. It’s possible that mcp-ui may or may not support these. These should not be considered blockers to implementing this issue. However, we should actively engage the community on how to best fit these into Goose, mcp-ui, and possibly even the MCP spec.

Agentic loop expectations

Users should be able to interact with UIs in such a way that may trigger the agentic loop. For example:

  1. user highlights text in diff view, diff view presents contextual menu, user clicks “explain this,” renderer sends action to client session, agentic loop spins up
  2. user clicks an image in a gallery UI, gallery UI presents contextual menu, user clicks “more like this,” renderer sends action to client session, agentic loop spins up
  3. user clicks “publish” button from action bar in sidecar that’s previewing vibe-coded website, action sent to client session, agentic loop spins up
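The flow in these examples (renderer emits an action, client session receives it, agentic loop starts) could be sketched as below. Everything here is hypothetical: the `UIAction` shapes, `ClientSessionStub`, and `handleUIAction` are illustrative and do not mirror the MCP UI action format or Goose's session API. The stub only queues a message where a real client would start the agentic loop.

```typescript
// Hypothetical action shapes; not the MCP UI action format.
type UIAction =
  | { kind: 'prompt'; text: string }
  | { kind: 'tool'; toolName: string; params: Record<string, unknown> }

interface SessionMessage {
  role: 'user'
  content: string
  source: 'ui-action'
}

// Stand-in for the client session: queuing a message represents starting the agentic loop.
class ClientSessionStub {
  readonly queue: SessionMessage[] = []

  submit(message: SessionMessage): void {
    this.queue.push(message)
  }
}

// Translates an action coming from the renderer sandbox into a session message.
function handleUIAction(session: ClientSessionStub, action: UIAction): void {
  switch (action.kind) {
    case 'prompt':
      session.submit({ role: 'user', content: action.text, source: 'ui-action' })
      break
    case 'tool':
      session.submit({
        role: 'user',
        content: `Call tool ${action.toolName} with ${JSON.stringify(action.params)}`,
        source: 'ui-action',
      })
      break
  }
}
```

Tagging each message with `source: 'ui-action'` is a design choice in this sketch so the client can tell UI-initiated turns apart from typed ones, for example to ask for confirmation before a tool call runs.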

Considerations

I have considered divorcing mcp-ui integration from the Goose Desktop sidecar protocol. However, I think we can still start small while keeping the big picture in mind.

  • [x] I have verified this does not duplicate an existing feature request

Phased implementation

To start small, I think we might be able to follow an approach that might look like the following:

Phase One

~Implement inline protocol to unlock MCP servers' UI rendering via MCP-UI. We could achieve this by landing one of these PRs (#2948 or #3432).~ Done via https://github.com/block/goose/pull/2948

Phase Two

Implement sidecar protocol to unlock more real estate (still leveraging MCP-UI as the renderer). We could achieve this by working it into this PR #3493.

Phase Three

Enable non-deterministic UI rendering by delivering a renderer solution for Goose components and/or Goose generative UI (or team up with @idosal, @liady, and @tobinsouth on some mcp-ui or Radix-based solution).

aharvard avatar Jul 21 '25 20:07 aharvard

Perfect. Thanks for pulling this together - will need some time to digest, but stoked to see a solid group of contributors chiming in. 🙏

As for Wingmate - I'm just glad it's being used somewhere...too good of a metaphor to pass up!

More to come...

mgd1984 avatar Jul 22 '25 02:07 mgd1984

Thanks for the excellent write-up @aharvard, and incredible work by you and @mgd1984! I love the direction Goose is going with its agentic UI vision.

The phased approach makes a lot of sense. It enables us to start fast and lean. As you mentioned, mcp-ui is already well-suited to serve as a foundation for the first phases. Treating it as a rendering engine aligns completely with the mission of a lean, modular library that can unlock any UI vision.

It'd be fascinating to learn how the community utilizes UI and where Goose takes it. Based on what we learn, mcp-ui can grow with Goose to serve future phases, whether it's something like the existing remote-dom content type (with a Goose/Radix component library), a more abstract declarative UI language, or even full-on generative UI (already in the roadmap - feedback will be highly appreciated). The cool thing about it is that it'll directly impact the directions and enhancements considered in the UI CWG!

Happy to help in any way!

idosal avatar Jul 22 '25 08:07 idosal

FYI — Goose is undergoing a focused effort to bring it up to speed with the MCP spec (#3578). Part of that work has been to move internals to use the official Rust SDK (RMCP). I discovered that the SDK only sends back resource.mime_type instead of resource.mimeType (I think)...

I've opened an issue here: https://github.com/modelcontextprotocol/rust-sdk/issues/338. Corresponding PR here: https://github.com/modelcontextprotocol/rust-sdk/pull/339

Until RMCP supports resource.mimeType I think we're blocked on integrating MCP-UI into Goose as part of phase one.
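For illustration only, a client that wanted to tolerate the casing mismatch described above could normalize the field before rendering. This is a hypothetical workaround sketch, not what Goose does; `normalizeResource`, `RawResource`, and `NormalizedResource` are made-up names, and the assumption is that the resource arrives with either `mimeType` or `mime_type` set.

```typescript
// Hypothetical workaround sketch; accepts either spelling of the MIME type field.
interface RawResource {
  uri: string
  mimeType?: string
  mime_type?: string
  text?: string
}

interface NormalizedResource {
  uri: string
  mimeType?: string
  text?: string
}

function normalizeResource(raw: RawResource): NormalizedResource {
  // Prefer the spec's camelCase field and fall back to the snake_case one.
  const mimeType = raw.mimeType ?? raw.mime_type
  const normalized: NormalizedResource = { uri: raw.uri }
  if (mimeType !== undefined) normalized.mimeType = mimeType
  if (raw.text !== undefined) normalized.text = raw.text
  return normalized
}
```

Fixing the field name at the SDK level, as the linked issue and PR propose, avoids carrying a shim like this in every client.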

Phase One Implement inline protocol to unlock MCP servers' UI rendering via MCP-UI.

aharvard avatar Jul 29 '25 12:07 aharvard

Thanks @aharvard ! I see the PR was merged. Are there any other blockers?

idosal avatar Jul 29 '25 22:07 idosal

@idosal yup! merged and that got me unstuck. I just pushed some updates to https://github.com/block/goose/pull/2948 and some of our folks are gonna start to dig in on review. I'll be sure to keep you in the loop!

aharvard avatar Jul 30 '25 21:07 aharvard

Phase one merged! https://github.com/block/goose/pull/2948

Phase One Implement inline protocol to unlock MCP servers' UI rendering via MCP-UI.

aharvard avatar Aug 01 '25 12:08 aharvard

That's awesome! Thanks for all the hard work everyone. Excited to see where this takes us.

idosal avatar Aug 01 '25 14:08 idosal

This recording showcases basic MCP-UI features within Goose.

The weather card UI is delivered to Goose by an HTTP MCP server (https://mcp-aharvard.netlify.app/mcp) that I've set up as an extension. (source code: https://github.com/aharvard/mcp_aharvard/blob/main/netlify/mcp-server/index.ts)

Soon, in an upcoming release, any extension that returns a tool response with an embedded resource that conforms to the MCP-UI spec will be able to have its UI rendered inline!

Also, we're tacking on a note in the UI that it's experimental, to manage expectations.

https://github.com/user-attachments/assets/adbe4829-160b-4fa3-ae2d-de32f7b81f2a

aharvard avatar Aug 01 '25 16:08 aharvard

I built an MCP UI demo for Square which relies on Goose handling tool calls dispatched by MCP UI actions. This video was recorded with a pre-release build of https://github.com/block/goose/pull/4041.

https://github.com/block/goose/pull/4041 has some great feedback about architecture and security. Please feel free to contribute to the PR discussion if interested.

https://github.com/user-attachments/assets/1dc376ac-2bee-4aa1-aa08-78a24d4dfda5

aharvard avatar Aug 13 '25 13:08 aharvard

FYI, just opened this discussion based on community feedback https://github.com/block/goose/issues/4117

aharvard avatar Aug 15 '25 17:08 aharvard

I'm going to close this mega thread and would like to open tighter, more focused issues going forward.

aharvard avatar Nov 07 '25 15:11 aharvard