UN-2579 [FIX] Fixed MIME type validation to auto-detect file types instead of checking content-type header

Open muhammad-ali-e opened this issue 6 months ago • 2 comments

What

  • Fixed MIME type validation to auto-detect file types using Python Magic instead of relying solely on the Content-Type header sent by clients. When clients send application/octet-stream as Content-Type, the system now reads the actual file content to detect the correct MIME type.
  • Refactored code to reduce Cognitive Complexity

Why

Recent MIME type validation changes caused existing client integrations to fail when they explicitly set Content-Type as application/octet-stream. This resulted in files being skipped with error messages like "Skipping file sample.pdf due to Unsupported MIME type: Unsupported MIME type 'application/octet-stream'".

How

  • Added _detect_mime_type() method that uses Python Magic to detect MIME type from file content
  • Modified file processing to detect MIME type from the first chunk of file data instead of trusting the Content-Type header
  • Maintained backward compatibility by still logging the original Content-Type header for debugging
  • Ensured proper error handling and logging throughout the process
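
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: only the `_detect_mime_type()` name comes from the description, while `resolve_mime_type`, `ALLOWED_MIME_TYPES`, and `CHUNK_SIZE` are hypothetical names chosen for the example. It assumes the python-magic package (`magic.from_buffer(..., mime=True)`) and falls back to a tiny signature check when libmagic is unavailable, purely so the sketch stays runnable.

```python
import io
import logging

try:
    # python-magic; needs the libmagic system library to be installed.
    from magic import from_buffer as _magic_from_buffer
except ImportError:
    # Fallback for this sketch only; not part of the PR.
    _magic_from_buffer = None

logger = logging.getLogger(__name__)

# Hypothetical allow-list; the real supported types are not listed in this PR.
ALLOWED_MIME_TYPES = {"application/pdf", "text/plain"}
# Assumed size of the first chunk used for detection.
CHUNK_SIZE = 4096


def _detect_mime_type(chunk: bytes) -> str:
    """Detect the MIME type from file content instead of the header."""
    if _magic_from_buffer is not None:
        return _magic_from_buffer(chunk, mime=True)
    # Very rough stand-in when libmagic is missing.
    if chunk.startswith(b"%PDF-"):
        return "application/pdf"
    try:
        chunk.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"


def resolve_mime_type(stream, declared_content_type: str) -> str:
    """Read the first chunk, detect its type, and validate it."""
    chunk = stream.read(CHUNK_SIZE)
    stream.seek(0)  # rewind so later processing sees the whole file
    detected = _detect_mime_type(chunk)
    # The header is kept only for debugging; it no longer drives validation.
    logger.debug(
        "Content-Type header=%r, detected MIME type=%r",
        declared_content_type,
        detected,
    )
    if detected not in ALLOWED_MIME_TYPES:
        raise ValueError(f"Unsupported MIME type '{detected}'")
    return detected


if __name__ == "__main__":
    upload = io.BytesIO(b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n")
    print(resolve_mime_type(upload, "application/octet-stream"))
```

In this sketch a client that sends `application/octet-stream` for a PDF is accepted because validation looks at the detected type, while a file whose content is not on the allow-list is still rejected.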

Can this PR break any existing features? If yes, please list possible items. If no, please explain why.

No. This PR fixes a regression and restores backward compatibility. It changes only how MIME types are detected (from the Content-Type header to content analysis), which is more reliable and lets previously working client integrations function again.

Database Migrations

None required.

Env Config

None required.

Relevant Docs

None required.

Related Issues or PRs

UN-2579 - Bug handling MIME type for API deployment in the backend with backward compatibility

Dependencies Versions

None changed.

Notes on Testing

  • Test with files uploaded using application/octet-stream Content-Type
  • Verify that actual file types (PDF, TXT, etc.) are correctly detected
  • Confirm that unsupported file types are still properly rejected
  • Test backward compatibility with existing client integrations

Screenshots

None applicable.

muhammad-ali-e · Jul 02 '25 07:07

| filepath | function | passed | SUBTOTAL |
| --- | --- | --- | --- |
| `runner/src/unstract/runner/clients/test_docker.py` | `test_logs` | 1 | 1 |
| `runner/src/unstract/runner/clients/test_docker.py` | `test_cleanup` | 1 | 1 |
| `runner/src/unstract/runner/clients/test_docker.py` | `test_cleanup_skip` | 1 | 1 |
| `runner/src/unstract/runner/clients/test_docker.py` | `test_client_init` | 1 | 1 |
| `runner/src/unstract/runner/clients/test_docker.py` | `test_get_image_exists` | 1 | 1 |
| `runner/src/unstract/runner/clients/test_docker.py` | `test_get_image` | 1 | 1 |
| `runner/src/unstract/runner/clients/test_docker.py` | `test_get_container_run_config` | 1 | 1 |
| `runner/src/unstract/runner/clients/test_docker.py` | `test_get_container_run_config_without_mount` | 1 | 1 |
| `runner/src/unstract/runner/clients/test_docker.py` | `test_run_container` | 1 | 1 |
| `runner/src/unstract/runner/clients/test_docker.py` | `test_get_image_for_sidecar` | 1 | 1 |
| `runner/src/unstract/runner/clients/test_docker.py` | `test_sidecar_container` | 1 | 1 |
| TOTAL | | 11 | 11 |

github-actions[bot] · Jul 02 '25 12:07