Issue: mimetypes Module Fails to Detect MIME Types in Minimal Docker Images

Open yjc980121 opened this issue 2 weeks ago • 5 comments

Self Checks

  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [x] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

0.15.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Python's mimetypes.guess_type() function guesses the MIME type of a file from its extension. In minimal Docker images such as python:3.12-alpine or python:3.12-slim, it fails to detect some MIME types because the /etc/mime.types file is absent. The mimetypes module relies on this file for extensions that are not covered by its built-in table.

Reproduction Steps

Run the following Python code in a minimal Docker image (e.g., python:3.12-alpine or python:3.12-slim):

import os
import mimetypes

def guesstype(filename):
    print(f'filename: {filename}')
    mime_type = mimetypes.guess_type(filename)[0] or ""
    print(f'mime_type: {mime_type}')

guesstype("eb7215b51eae490b84b6f7f0646df2c4.docx")

# Check known files
for file in mimetypes.knownfiles:
    print(f'file: {file} {os.path.exists(file)}')

Observe the output. The MIME type detection will be incorrect, and the /etc/mime.types file will not exist.
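
To tell whether a given extension depends on /etc/mime.types at all, a comparison along the following lines may help. This is a sketch and not part of the original report: resolution_source is a hypothetical helper, and it assumes that a fresh mimetypes.MimeTypes() instance holds only Python's built-in table, which appears to be the case in recent CPython versions but is worth verifying on yours.

```python
import mimetypes


def resolution_source(filename):
    """Report where the MIME type for filename comes from.

    Returns "builtin" if Python's built-in table already knows the
    extension, "system" if only the module-level database (built-in
    table plus any files in mimetypes.knownfiles) knows it, and
    "unknown" if neither does.
    """
    # Assumption: a fresh MimeTypes() carries only the built-in defaults.
    builtin_db = mimetypes.MimeTypes()
    if builtin_db.guess_type(filename)[0]:
        return "builtin"
    # The module-level function also consults system files such as
    # /etc/mime.types when they exist.
    if mimetypes.guess_type(filename)[0]:
        return "system"
    return "unknown"


for name in ("notes.txt", "eb7215b51eae490b84b6f7f0646df2c4.docx"):
    print(f"{name}: {resolution_source(name)}")
```

An extension reported as "unknown" inside the container but as "system" on a full image would be consistent with the missing-file explanation above.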

✔️ Expected Behavior

The mimetypes.guess_type() function should accurately detect the MIME type of a file based on its extension, even in minimal Docker images.

Actual Behavior

In minimal Docker images, the /etc/mime.types file is missing, leading to incorrect MIME type detection. The function returns None for the type (printed as an empty string by the script above, which substitutes "") or an incorrect MIME type.

Possible Solutions

  1. Use a Non-Minimal Docker Image. Switch to a non-minimal Docker image such as python:3.12, which includes the necessary /etc/mime.types file. This approach ensures accurate MIME type detection without additional configuration.

  2. Map the /etc/mime.types File. If a minimal Docker image is preferred, the /etc/mime.types file from the host can be mounted into the container. This method does not require modifying the Docker image. Prerequisites: Ensure that the /etc/mime.types file exists on the host. If it does not, it can be downloaded from a reliable source such as the Apache HTTP Server repository.

  3. Use the python-magic Library as a Fallback. As an alternative, the python-magic library can be used. This library does not rely on the /etc/mime.types file and instead uses file signatures to determine MIME types.

  4. Copy and Initialize a Custom mime.types File. For those who need to use a minimal Docker image and avoid mapping files, a custom mime.types file can be copied into the image and initialized using mimetypes.init().
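
The python-magic fallback and the custom mime.types file can be combined roughly as follows. This is a minimal sketch under stated assumptions, not Dify's implementation: init_from_bundled_file and guess_mime_type are hypothetical names, the temporary one-line file stands in for a real mime.types file copied into the image, and python-magic is treated as optional because it needs the libmagic system library.

```python
import mimetypes
import os
import tempfile

try:
    # Assumption: this is the python-magic package, which provides
    # magic.from_file(); other packages named "magic" have different APIs.
    import magic
except ImportError:
    magic = None

DOCX_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def init_from_bundled_file(mime_types_path):
    """Load a mime.types-style file shipped with the image."""
    # init() reads the default known files first and the extra files
    # afterwards, so the bundled entries are applied last.
    mimetypes.init(files=[mime_types_path])


def guess_mime_type(path):
    """Try the extension lookup first, then content sniffing if available."""
    mime_type = mimetypes.guess_type(path)[0]
    if mime_type:
        return mime_type
    if magic is not None and os.path.isfile(path):
        # python-magic inspects file contents, so the file must exist.
        return magic.from_file(path, mime=True)
    return ""


# Demo: write a one-line stand-in for the bundled mime.types file.
with tempfile.TemporaryDirectory() as tmp_dir:
    bundled = os.path.join(tmp_dir, "mime.types")
    with open(bundled, "w", encoding="utf-8") as handle:
        handle.write(f"{DOCX_TYPE} docx\n")
    init_from_bundled_file(bundled)

print(guess_mime_type("eb7215b51eae490b84b6f7f0646df2c4.docx"))
```

In a real image the file would be copied in at build time and init_from_bundled_file called once at application startup, before any call to mimetypes.guess_type(). Note that content sniffing can disagree with the extension-based result, since .docx files are ZIP containers.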

Related Issues #13285 #13146

❌ Actual Behavior

No response

yjc980121 avatar Feb 14 '25 03:02 yjc980121