chainlit Avatar image is broken when reading threads from history

Describe the bug

The PR https://github.com/Chainlit/chainlit/pull/1326 introduces a bug when it expects the get_avatar to match string (https://github.com/Chainlit/chainlit/pull/1326/files#diff-9a73e67e163edc3fae6839bf2bcc8c6ab650624eff6668502388a7a86b7ece78R964) .

The Frontend app can make requests to the avatar that do not match the Regex (https://github.com/Chainlit/chainlit/blob/main/frontend/src/components/molecules/messages/components/Avatar.tsx#L31) this creates a bug for existing applications where the config.ui.name contains space (e.g. My Assistant).

To Reproduce Set your config.ui.name to My Assistant

Create a thread
Publish messages
Create a new thread
Try to resume the previous thread
See the 400 error and the avatar not showing

Expected behavior Avatar should be loaded for threads in history.

Sep 24 '24 10:09 LucasMagnum

To fix the bug where the get_avatar function fails to load avatars correctly for threads in history when config.ui.name contains spaces, you need to ensure that the author parameter in the URL is properly encoded. This can be done using encodeURIComponent to handle spaces and other special characters.

Here is the modified code for the MessageAvatar component:

import { useContext, useMemo } from 'react';

import { Tooltip } from '@mui/material';
import Avatar from '@mui/material/Avatar';

import {
  ChainlitContext,
  useChatSession,
  useConfig
} from '@chainlit/react-client';

interface Props {
  author?: string;
  hide?: boolean;
}

const MessageAvatar = ({ author, hide }: Props) => {
  const apiClient = useContext(ChainlitContext);
  const { chatProfile } = useChatSession();
  const { config } = useConfig();

  const selectedChatProfile = useMemo(() => {
    return config?.chatProfiles.find((profile) => profile.name === chatProfile);
  }, [config, chatProfile]);

  const avatarUrl = useMemo(() => {
    const isAssistant = !author || author === config?.ui.name;
    if (isAssistant && selectedChatProfile?.icon) {
      return selectedChatProfile.icon;
    }
    return apiClient?.buildEndpoint(`/avatars/${encodeURIComponent(author || 'default')}`);
  }, [apiClient, selectedChatProfile, config, author]);

  return (
    <span className={`message-avatar`}>
      <Tooltip title={author}>
        <Avatar
          sx={{
            width: '1.6rem',
            height: '1.6rem',
            bgcolor: 'transparent'
          }}
          src={hide ? undefined : avatarUrl}
        />
      </Tooltip>
    </span>
  );
};

export { MessageAvatar };

By using encodeURIComponent on the author parameter, you ensure that any spaces or special characters are correctly encoded, preventing the 400 error ^[1].

Additionally, the get_avatar function in the backend handles the avatar_id parameter by validating it, normalizing it, and searching for a matching avatar file. Here is the implementation:

@router.get("/avatars/{avatar_id:str}")
async def get_avatar(avatar_id: str):
    """Get the avatar for the user based on the avatar_id."""
    if not re.match(r"^[a-zA-Z0-9_-]+$", avatar_id):
        raise HTTPException(status_code=400, detail="Invalid avatar_id")

    if avatar_id == "default":
        avatar_id = config.ui.name

    avatar_id = avatar_id.strip().lower().replace(" ", "_")

    base_path = Path(APP_ROOT) / "public" / "avatars"
    avatar_pattern = f"{avatar_id}.*"

    matching_files = base_path.glob(avatar_pattern)

    if avatar_path := next(matching_files, None):
        if not is_path_inside(avatar_path, base_path):
            raise HTTPException(status_code=400, detail="Invalid filename")

        media_type, _ = mimetypes.guess_type(str(avatar_path))

        return FileResponse(avatar_path, media_type=media_type)

    return await get_favicon()

This function ensures that the avatar_id is properly formatted and validated before attempting to retrieve the avatar file ^[2].

To continue talking to Dosu, mention @dosu.

Sep 24 '24 10:09 dosubot[bot]

@LucasMagnum Love to hear whether the fix suggested by our bot does the trick. If it does, we'd love a PR for that, preferably including a(n update to a) E2E test demonstrating the issue and it's resolution.

@dosu Any suggestion as to which is the most applicable place to write such a (regression) test, preferably without adding a new E2E (because they're slow)? What could such a test look like?

Sep 25 '24 11:09 dokterbob

The most applicable place to write a regression test for the get_avatar function without adding a new E2E test is in the backend/tests/test_server.py file. This file already contains tests for the get_avatar function, so adding a regression test here would be consistent with the existing structure.

Here is an example of what such a regression test could look like:

def test_get_avatar_regression(test_client: TestClient, monkeypatch: pytest.MonkeyPatch):
    """Regression test for get_avatar function."""
    # Setup: Create a custom avatar
    custom_avatar_path = os.path.join(
        APP_ROOT, "public", "avatars", "regression_avatar.png"
    )
    os.makedirs(os.path.dirname(custom_avatar_path), exist_ok=True)
    with open(custom_avatar_path, "wb") as f:
        f.write(b"regression test image data")

    # Test: Fetch the custom avatar
    response = test_client.get("/avatars/regression_avatar")
    assert response.status_code == 200
    assert response.headers["content-type"].startswith("image/")
    assert response.content == b"regression test image data"

    # Clean up
    os.remove(custom_avatar_path)

This test ensures that the get_avatar function correctly handles a custom avatar and returns the expected image data ^[1]^[2].

To continue talking to Dosu, mention @dosu.

Sep 25 '24 11:09 dosubot[bot]

The most applicable place to write a regression test for the get_avatar function without adding a new E2E test is in the backend/tests/test_server.py file. This file already contains tests for the get_avatar function, so adding a regression test here would be consistent with the existing structure.

Here is an example of what such a regression test could look like:
def test_get_avatar_regression(test_client: TestClient, monkeypatch: pytest.MonkeyPatch):
    """Regression test for get_avatar function."""
    # Setup: Create a custom avatar
    custom_avatar_path = os.path.join(
        APP_ROOT, "public", "avatars", "regression_avatar.png"
    )
    os.makedirs(os.path.dirname(custom_avatar_path), exist_ok=True)
    with open(custom_avatar_path, "wb") as f:
        f.write(b"regression test image data")

    # Test: Fetch the custom avatar
    response = test_client.get("/avatars/regression_avatar")
    assert response.status_code == 200
    assert response.headers["content-type"].startswith("image/")
    assert response.content == b"regression test image data"

    # Clean up
    os.remove(custom_avatar_path)
This test ensures that the get_avatar function correctly handles a custom avatar and returns the expected image data [1][2].

To continue talking to Dosu, mention @dosu.

But our specific test case concerns a space in config.ui.name -- and the requirement of URL encoding in the frontend, right?

Sep 25 '24 11:09 dokterbob

Hi @dokterbob, was there a reason for preventing avatar names from containing spaces with the regex as such? I had to monkey patch the get_avatar() function when updating to 1.200 from 1.1.404 to modify the regex to allow spaces.

Changing the regex to this:

if not re.match(r"^[a-zA-Z0-9_. -]+$", avatar_id):
    raise HTTPException(status_code=400, detail="Invalid avatar_id")

Completely fixed the issue for me.

Sep 25 '24 21:09 AidanShipperley

I've just been able to replicate the issue in our deployment as well. Full stack trace etc: https://synergy-os.sentry.io/share/issue/b23e6a9655a1421e9c3273e2d4a13342/

Hi @dokterbob, was there a reason for preventing avatar names from containing spaces with the regex as such? I had to monkey patch the get_avatar() function when updating to 1.200 from 1.1.404 to modify the regex to allow spaces.

Changing the regex to this:
if not re.match(r"^[a-zA-Z0-9_. -]+$", avatar_id):
    raise HTTPException(status_code=400, detail="Invalid avatar_id")
Completely fixed the issue for me.

To ensure proper security it is essential to always sanitise any sort of user input. That is, any unreasonable/unexpected/possibly failing or erroneous input should be rejected outright.

Allowing spaces in URL filenames IMHO definitely counts amongst these, i.e. the encoding of various types of unicode spaces can expose attack vectors. It is of course unfortunate if it breaks user experience and we should take care to address it, but not at the expense of increasing security exposure.

Perhaps the solution to this could be to slugify file names? Ref: https://docs.djangoproject.com/en/5.1/ref/utils/#django.utils.text.slugify

In that case, having a config.ui.name of 'Funny Bananawould give an avatar offunny_banana.*` (limiting file extensions here would be another one...).

Sep 30 '24 12:09 dokterbob

I've given this some thought, I think indeed it would be best to 'slugify' the URL on the client side (not on the server side). I'll need to analyse more carefully what avatar's specifically are supposed to do, happy for people thinking along towards a spec for a solution.

Oct 03 '24 14:10 dokterbob

@dokterbob will the fix for this be able to make it into the next release?

Oct 08 '24 12:10 hayescode

@dokterbob Are you saying that avatars (Assistants and Steps) should not be able to have spaces? I sure hope not because nearly all of my Avatars have spaces in the name, as do most people who have provided screenshots.

This issue is blocking many from upgrading past 1.1.404.

Oct 08 '24 23:10 hayescode

https://github.com/Chainlit/chainlit/blob/main/frontend/src/components/molecules/messages/components/Avatar.tsx#L31

What I'm saying is a clean fix is to move filename sanitation (avatar_id.strip().lower().replace(" ", "_")) to the client side and enforce 'clean' URL's.

I realise we need to fix this sooner though, I will make a PR with @AidanShipperley's fix to allow spaces in avatars.

Oct 09 '24 11:10 dokterbob

I'm not gonna allow ~~spaces~~dots here though. That's exactly the kinda stuff which makes the code more prone to bugs.

I'm looking forward (or will roll) a patch which will move the encoding of filenames to the client side.

Oct 09 '24 11:10 dokterbob

Cleaner proposal, proper client-side sanitation. We want to server to be conservative. Going forward, let's aim for industry-standard security. 😻 🗝️

See #1420.

We'll either roll that one or #1418 into 1.3.0

Oct 09 '24 18:10 dokterbob