
Expose via Web API

Open lqdev opened this issue 1 year ago • 6 comments

Set up a Web API so users can use the library via a REST endpoint.

This would be useful for Docker scenarios as well.
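As a rough illustration of the idea, here is a minimal sketch of such an endpoint using only the Python standard library (WSGI). The `/convert` path, the `ext` query parameter, and the helper names `convert_with_markitdown`, `make_app`, and `serve` are assumptions made for this example and are not part of the library. The only MarkItDown calls used are `MarkItDown().convert(path)` and `.text_content`, as shown in the project README.

```python
import os
import re
import tempfile
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def convert_with_markitdown(data: bytes, suffix: str) -> str:
    """Write the upload to a temp file and convert it with MarkItDown."""
    # Imported lazily so the app can be built (and tested) without the package.
    from markitdown import MarkItDown

    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        return MarkItDown().convert(path).text_content
    finally:
        os.unlink(path)  # always clean up the temp file


def make_app(converter=convert_with_markitdown):
    """Build a WSGI app; `converter` is injectable (hypothetical design)."""
    text_headers = [("Content-Type", "text/plain; charset=utf-8")]

    def app(environ, start_response):
        is_post = environ.get("REQUEST_METHOD") == "POST"
        if not is_post or environ.get("PATH_INFO") != "/convert":
            start_response("404 Not Found", text_headers)
            return [b"POST a file body to /convert?ext=.docx\n"]

        # Assumed convention: the raw file is the request body and the
        # client passes the file extension as ?ext=.docx
        length = int(environ.get("CONTENT_LENGTH") or 0)
        data = environ["wsgi.input"].read(length)
        ext = parse_qs(environ.get("QUERY_STRING", "")).get("ext", [""])[0]
        # Accept only simple extensions such as ".docx"; otherwise use none.
        suffix = ext if re.fullmatch(r"\.[A-Za-z0-9]+", ext) else ""

        try:
            markdown = converter(data, suffix)
        except Exception as exc:
            start_response("500 Internal Server Error", text_headers)
            return [f"conversion failed: {exc}\n".encode("utf-8")]

        start_response("200 OK", [("Content-Type", "text/markdown; charset=utf-8")])
        return [markdown.encode("utf-8")]

    return app


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the sketch with the development server from wsgiref."""
    with make_server(host, port, make_app()) as httpd:
        httpd.serve_forever()
```

Calling `serve()` starts a local development server. With the assumptions above, a request along the lines of `curl --data-binary @report.docx "http://127.0.0.1:8000/convert?ext=.docx"` would return the converted Markdown. A real deployment would more likely use a proper web framework and add upload size limits.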

lqdev avatar Dec 19 '24 02:12 lqdev

Feel free to assign this one to me.

lqdev avatar Dec 19 '24 02:12 lqdev

I started working on something like this, but as soon as I add the import from markitdown import MarkItDown, I get this ugly warning:

python3.12/site-packages/starlette/routing.py:297: ResourceWarning: Unclosed file <tempfile.SpooledTemporaryFile object at 0x149021990>
  await self.app(scope, receive, send)
ResourceWarning: Enable tracemalloc to get the object allocation traceback

This happens without even instantiating a MarkItDown object, let alone using it. Is that a bug in the package?

gautam-e avatar Dec 19 '24 20:12 gautam-e

Here's a web app I made that works with MarkItDown. I use it for my personal workflow: www.docx2md.com

gautam-e avatar Dec 24 '24 11:12 gautam-e

Great to see the community work and samples around this!

@elbruno also put together a quick sample of how such an API might be consumed in a C# client application.

https://github.com/elbruno/MarkItDownServer

Once #202 is merged, it'd be great to publish the Dockerfile as an image to the Microsoft Container Registry so you don't have to pull down the entire repo to use the container / server.

luisquintanilla avatar Jan 07 '25 22:01 luisquintanilla

Any updates on this?

dezoito avatar Apr 10 '25 21:04 dezoito

If anyone reading this is still interested in the Web API idea, I forked @elbruno's work into a new project and made a few updates:

  • Pinned dependencies
  • Used multi-stage Docker builds and uv for really quick builds and small images
  • Added a convenience script to rebuild the image/containers when running locally
  • Updated list of acceptable file extensions
  • Replaced .NET client code (used to test the API) with curl instructions in the README

https://github.com/dezoito/markitdown-api

I tried to give credit and proper attribution to elbruno, but if I am missing something, let me know and I'll update the repo.

dezoito avatar Apr 14 '25 13:04 dezoito