Avatar image is broken when reading threads from history
Describe the bug
The PR https://github.com/Chainlit/chainlit/pull/1326 introduces a bug when it expects the get_avatar to match string (https://github.com/Chainlit/chainlit/pull/1326/files#diff-9a73e67e163edc3fae6839bf2bcc8c6ab650624eff6668502388a7a86b7ece78R964) .
The Frontend app can make requests to the avatar that do not match the Regex (https://github.com/Chainlit/chainlit/blob/main/frontend/src/components/molecules/messages/components/Avatar.tsx#L31) this creates a bug for existing applications where the config.ui.name contains space (e.g. My Assistant).
To Reproduce
Set your config.ui.name to My Assistant
- Create a thread
- Publish messages
- Create a new thread
- Try to resume the previous thread
- See the 400 error and the avatar not showing
Expected behavior Avatar should be loaded for threads in history.
To fix the bug where the get_avatar function fails to load avatars correctly for threads in history when config.ui.name contains spaces, you need to ensure that the author parameter in the URL is properly encoded. This can be done using encodeURIComponent to handle spaces and other special characters.
Here is the modified code for the MessageAvatar component:
import { useContext, useMemo } from 'react';
import { Tooltip } from '@mui/material';
import Avatar from '@mui/material/Avatar';
import {
ChainlitContext,
useChatSession,
useConfig
} from '@chainlit/react-client';
interface Props {
author?: string;
hide?: boolean;
}
const MessageAvatar = ({ author, hide }: Props) => {
const apiClient = useContext(ChainlitContext);
const { chatProfile } = useChatSession();
const { config } = useConfig();
const selectedChatProfile = useMemo(() => {
return config?.chatProfiles.find((profile) => profile.name === chatProfile);
}, [config, chatProfile]);
const avatarUrl = useMemo(() => {
const isAssistant = !author || author === config?.ui.name;
if (isAssistant && selectedChatProfile?.icon) {
return selectedChatProfile.icon;
}
return apiClient?.buildEndpoint(`/avatars/${encodeURIComponent(author || 'default')}`);
}, [apiClient, selectedChatProfile, config, author]);
return (
<span className={`message-avatar`}>
<Tooltip title={author}>
<Avatar
sx={{
width: '1.6rem',
height: '1.6rem',
bgcolor: 'transparent'
}}
src={hide ? undefined : avatarUrl}
/>
</Tooltip>
</span>
);
};
export { MessageAvatar };
By using encodeURIComponent on the author parameter, you ensure that any spaces or special characters are correctly encoded, preventing the 400 error [1].
Additionally, the get_avatar function in the backend handles the avatar_id parameter by validating it, normalizing it, and searching for a matching avatar file. Here is the implementation:
@router.get("/avatars/{avatar_id:str}")
async def get_avatar(avatar_id: str):
"""Get the avatar for the user based on the avatar_id."""
if not re.match(r"^[a-zA-Z0-9_-]+$", avatar_id):
raise HTTPException(status_code=400, detail="Invalid avatar_id")
if avatar_id == "default":
avatar_id = config.ui.name
avatar_id = avatar_id.strip().lower().replace(" ", "_")
base_path = Path(APP_ROOT) / "public" / "avatars"
avatar_pattern = f"{avatar_id}.*"
matching_files = base_path.glob(avatar_pattern)
if avatar_path := next(matching_files, None):
if not is_path_inside(avatar_path, base_path):
raise HTTPException(status_code=400, detail="Invalid filename")
media_type, _ = mimetypes.guess_type(str(avatar_path))
return FileResponse(avatar_path, media_type=media_type)
return await get_favicon()
This function ensures that the avatar_id is properly formatted and validated before attempting to retrieve the avatar file [2].
To continue talking to Dosu, mention @dosu.
@LucasMagnum Love to hear whether the fix suggested by our bot does the trick. If it does, we'd love a PR for that, preferably including a(n update to a) E2E test demonstrating the issue and it's resolution.
@dosu Any suggestion as to which is the most applicable place to write such a (regression) test, preferably without adding a new E2E (because they're slow)? What could such a test look like?
The most applicable place to write a regression test for the get_avatar function without adding a new E2E test is in the backend/tests/test_server.py file. This file already contains tests for the get_avatar function, so adding a regression test here would be consistent with the existing structure.
Here is an example of what such a regression test could look like:
def test_get_avatar_regression(test_client: TestClient, monkeypatch: pytest.MonkeyPatch):
"""Regression test for get_avatar function."""
# Setup: Create a custom avatar
custom_avatar_path = os.path.join(
APP_ROOT, "public", "avatars", "regression_avatar.png"
)
os.makedirs(os.path.dirname(custom_avatar_path), exist_ok=True)
with open(custom_avatar_path, "wb") as f:
f.write(b"regression test image data")
# Test: Fetch the custom avatar
response = test_client.get("/avatars/regression_avatar")
assert response.status_code == 200
assert response.headers["content-type"].startswith("image/")
assert response.content == b"regression test image data"
# Clean up
os.remove(custom_avatar_path)
This test ensures that the get_avatar function correctly handles a custom avatar and returns the expected image data [1][2].
To continue talking to Dosu, mention @dosu.
The most applicable place to write a regression test for the
get_avatarfunction without adding a new E2E test is in thebackend/tests/test_server.pyfile. This file already contains tests for theget_avatarfunction, so adding a regression test here would be consistent with the existing structure.Here is an example of what such a regression test could look like:
def test_get_avatar_regression(test_client: TestClient, monkeypatch: pytest.MonkeyPatch): """Regression test for get_avatar function.""" # Setup: Create a custom avatar custom_avatar_path = os.path.join( APP_ROOT, "public", "avatars", "regression_avatar.png" ) os.makedirs(os.path.dirname(custom_avatar_path), exist_ok=True) with open(custom_avatar_path, "wb") as f: f.write(b"regression test image data") # Test: Fetch the custom avatar response = test_client.get("/avatars/regression_avatar") assert response.status_code == 200 assert response.headers["content-type"].startswith("image/") assert response.content == b"regression test image data" # Clean up os.remove(custom_avatar_path)This test ensures that the
get_avatarfunction correctly handles a custom avatar and returns the expected image data [1][2].To continue talking to Dosu, mention @dosu.
But our specific test case concerns a space in config.ui.name -- and the requirement of URL encoding in the frontend, right?
Hi @dokterbob, was there a reason for preventing avatar names from containing spaces with the regex as such? I had to monkey patch the get_avatar() function when updating to 1.200 from 1.1.404 to modify the regex to allow spaces.
Changing the regex to this:
if not re.match(r"^[a-zA-Z0-9_. -]+$", avatar_id):
raise HTTPException(status_code=400, detail="Invalid avatar_id")
Completely fixed the issue for me.
I've just been able to replicate the issue in our deployment as well. Full stack trace etc: https://synergy-os.sentry.io/share/issue/b23e6a9655a1421e9c3273e2d4a13342/
Hi @dokterbob, was there a reason for preventing avatar names from containing spaces with the regex as such? I had to monkey patch the
get_avatar()function when updating to 1.200 from 1.1.404 to modify the regex to allow spaces.Changing the regex to this:
if not re.match(r"^[a-zA-Z0-9_. -]+$", avatar_id): raise HTTPException(status_code=400, detail="Invalid avatar_id")Completely fixed the issue for me.
To ensure proper security it is essential to always sanitise any sort of user input. That is, any unreasonable/unexpected/possibly failing or erroneous input should be rejected outright.
Allowing spaces in URL filenames IMHO definitely counts amongst these, i.e. the encoding of various types of unicode spaces can expose attack vectors. It is of course unfortunate if it breaks user experience and we should take care to address it, but not at the expense of increasing security exposure.
Perhaps the solution to this could be to slugify file names? Ref: https://docs.djangoproject.com/en/5.1/ref/utils/#django.utils.text.slugify
In that case, having a config.ui.name of 'Funny Bananawould give an avatar offunny_banana.*` (limiting file extensions here would be another one...).
I've given this some thought, I think indeed it would be best to 'slugify' the URL on the client side (not on the server side). I'll need to analyse more carefully what avatar's specifically are supposed to do, happy for people thinking along towards a spec for a solution.
@dokterbob will the fix for this be able to make it into the next release?
@dokterbob Are you saying that avatars (Assistants and Steps) should not be able to have spaces? I sure hope not because nearly all of my Avatars have spaces in the name, as do most people who have provided screenshots.
This issue is blocking many from upgrading past 1.1.404.
https://github.com/Chainlit/chainlit/blob/main/frontend/src/components/molecules/messages/components/Avatar.tsx#L31
What I'm saying is a clean fix is to move filename sanitation (avatar_id.strip().lower().replace(" ", "_")) to the client side and enforce 'clean' URL's.
I realise we need to fix this sooner though, I will make a PR with @AidanShipperley's fix to allow spaces in avatars.
I'm not gonna allow ~~spaces~~dots here though. That's exactly the kinda stuff which makes the code more prone to bugs.
I'm looking forward (or will roll) a patch which will move the encoding of filenames to the client side.
Cleaner proposal, proper client-side sanitation. We want to server to be conservative. Going forward, let's aim for industry-standard security. 😻 🗝️
See #1420.
We'll either roll that one or #1418 into 1.3.0