
File Upload not recognizing mime

Open hayescode opened this issue 1 year ago • 2 comments

Describe the bug When uploading certain file types the mime types are not recognized.

Incorrect:

  • Markdown .md
    • ERROR: defaults to application/octet-stream
  • Python .py
    • ERROR: App crashes
  File "C:\Users\haze\CheersGPT\.venv\Lib\site-packages\engineio\base_server.py", line 229, in _get_socket
    raise KeyError('Session is disconnected')
KeyError: 'Session is disconnected'

Correct:

  • .csv, .xlsx, .docx, .pdf, .pptx

config.toml

[features.spontaneous_file_upload]
    enabled = true
    accept = ["*/*"]
    max_files = 20
    max_size_mb = 500

app.py

import chainlit as cl

@cl.on_message
async def message(message_from_ui: cl.Message):
    # Print every field of each uploaded element, including the detected mime.
    for file in message_from_ui.elements:
        for key, value in file.to_dict().items():
            print(f"{key}: {value}")

terminal (Markdown error)

id: ca004f97-1820-4eb2-a47a-f10c66351fbd
threadId: 09c34591-df77-4ede-aab1-4ab016503365
type: file
url: None
chainlitKey: ca004f97-1820-4eb2-a47a-f10c66351fbd
name: README.md
display: inline
forId: be404f8a-f1f1-4eb5-99d0-7dc9f7ca5176
mime: application/octet-stream

To Reproduce Steps to reproduce the behavior:

Markdown:

  1. Copy the README.md to your local computer.
  2. Configure Chainlit with above settings.
  3. Start app.py
  4. Add local README.md to chat and ask "Summarize This"

Python:

  1. Copy the user.py to your local computer
  2. Configure Chainlit with above settings.
  3. Start app.py
  4. Add local user.py to chat and ask "Summarize This"

Expected behavior Mime type is correct and app doesn't crash. Maybe if mime.guess returns the default application/octet-stream it can then infer based on the file extension in some cases?
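
The extension-based fallback suggested above could look roughly like the sketch below. It is not Chainlit's actual code: `infer_mime`, `DEFAULT_MIME` and the `EXTENSION_FALLBACKS` table are hypothetical names introduced only for illustration. The small table is consulted before the standard library's `mimetypes` module, because `mimetypes` results can vary by platform (on Windows it also reads the registry).

```python
import mimetypes
from pathlib import Path
from typing import Optional

DEFAULT_MIME = "application/octet-stream"

# Hypothetical fallback table for extensions that detection may miss.
EXTENSION_FALLBACKS = {
    ".md": "text/markdown",
    ".py": "text/x-python",
    ".json": "application/json",
    ".pdf": "application/pdf",
}


def infer_mime(filename: str, detected: Optional[str] = None) -> str:
    """Return `detected` unless it is missing or the generic default,
    in which case fall back to the file extension."""
    if detected and detected != DEFAULT_MIME:
        return detected

    # 1. Explicit table, so the result does not depend on the platform.
    suffix = Path(filename).suffix.lower()
    if suffix in EXTENSION_FALLBACKS:
        return EXTENSION_FALLBACKS[suffix]

    # 2. Standard-library guess based on the file name.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or DEFAULT_MIME
```

With this sketch, `infer_mime("README.md", "application/octet-stream")` falls through to the table and yields `text/markdown`, while a type that was already detected, such as `application/pdf`, is returned unchanged.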

Desktop (please complete the following information):

  • OS: Windows 11
  • Browser: Microsoft Edge
  • Version: Chainlit 1.1.0rc1


hayescode avatar May 08 '24 14:05 hayescode

Looks like backend/chainlit/element.py has the code that handles detection of content type.

It would be nice to have access to the correct content type, i.e., application/pdf instead of pdf.

When uploading a JSON file I would expect application/json but I get back file.

aharth avatar Feb 23 '25 00:02 aharth

Hello, I can confirm the bug on my side. When I upload a PDF file, the content type stored in Azure Blob Storage is only "pdf" instead of "application/pdf". As a result, when a chat is resumed, Chainlit retrieves all the data, and because the MIME type is wrong the PDF is not visible in the chat. If I manually change the content type to application/pdf in the Azure portal, I can resume the chat and see the PDF preview with no issue.

UPDATE/EDIT: With version 2.5.5 I no longer have the issue on the Blob Storage side. I can upload a PDF and the content type is correctly set to application/pdf.

Sadly, the issue is still present, with another side effect: when I upload a PDF file, the metadata in the database (table elements) is not correct:

  • If I add a PDF to display inline with the example code from https://docs.chainlit.io/api-reference/elements/pdf, the element record in the DB has metadata like this: {"page": null, "size": null, "type": "pdf", "display": "inline", "language": null}
  • If I upload a PDF file with the spontaneous upload feature, the DB shows metadata like this: {"page": null, "size": null, "type": "file", "display": "inline", "language": null}. In that case I only see a PDF icon and a link to download it :-(

And of course, if I change the type to "pdf" in the DB metadata, I can see the PDF inline.
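
To make the mismatch concrete, here is a minimal sketch of the mapping that would be needed: choosing the element "type" from the MIME type, so that an uploaded PDF is recorded as "pdf" rather than "file". This is not Chainlit's implementation. `element_type_for_mime` is a hypothetical helper, and the "image", "audio" and "video" type names are assumptions; only "pdf" and "file" appear in the metadata above.

```python
def element_type_for_mime(mime: str) -> str:
    """Pick an element type name from a MIME type (hypothetical helper)."""
    mime = (mime or "").lower()

    if mime == "application/pdf":
        return "pdf"

    # Assumed type names for other media families; not taken from the thread.
    for family in ("image", "audio", "video"):
        if mime.startswith(family + "/"):
            return family

    # Anything else stays a generic, download-only file.
    return "file"
```

With such a mapping, `application/pdf` resolves to "pdf", while an unrecognized type such as `application/octet-stream` remains "file".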

remyBerrebi-fi avatar Apr 24 '25 13:04 remyBerrebi-fi