Expose via Web API
Set up a Web API so users can use the library via a REST endpoint.
This would also be useful for Docker scenarios.
Feel free to assign this one to me.
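For anyone wondering what such an endpoint could look like, here is a rough sketch using only the Python standard library plus markitdown. It is not an agreed design: the /convert route, the ext query parameter, the JSON response shape, and the helper names (convert_bytes, make_app, serve) are all assumptions made for illustration. The only MarkItDown calls used are MarkItDown().convert(path) and the result's text_content attribute.

```python
import json
import os
import tempfile
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def convert_bytes(data, ext=".txt"):
    """Convert raw document bytes to Markdown (assumes markitdown is installed)."""
    from markitdown import MarkItDown  # third-party; imported lazily

    # Write the upload to a temporary file so the extension hints at the type.
    with tempfile.NamedTemporaryFile(suffix=ext, delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        return MarkItDown().convert(path).text_content
    finally:
        os.remove(path)


def make_app(converter=convert_bytes):
    """Return a WSGI app that serves POST /convert (a hypothetical route)."""

    def app(environ, start_response):
        def reply(status, payload):
            body = json.dumps(payload).encode("utf-8")
            start_response(status, [("Content-Type", "application/json"),
                                    ("Content-Length", str(len(body)))])
            return [body]

        if (environ.get("REQUEST_METHOD") != "POST"
                or environ.get("PATH_INFO") != "/convert"):
            return reply("404 Not Found", {"error": "use POST /convert"})

        # Hypothetical ?ext=docx parameter; keep only letters and digits.
        query = parse_qs(environ.get("QUERY_STRING", ""))
        raw_ext = query.get("ext", ["txt"])[0]
        ext = "." + "".join(ch for ch in raw_ext if ch.isalnum())

        length = int(environ.get("CONTENT_LENGTH") or 0)
        data = environ["wsgi.input"].read(length)
        try:
            markdown = converter(data, ext)
        except Exception as exc:  # report conversion failures to the client
            return reply("500 Internal Server Error", {"error": str(exc)})
        return reply("200 OK", {"markdown": markdown})

    return app


def serve(host="127.0.0.1", port=8000):
    """Serve the app until interrupted; call this from your own entry point."""
    with make_server(host, port, make_app()) as server:
        server.serve_forever()
```

With serve() running, something like curl --data-binary @report.docx "http://127.0.0.1:8000/convert?ext=docx" should return the Markdown as JSON. The converter is passed in as a parameter so the HTTP layer can be exercised without markitdown installed, and a real deployment would more likely use a proper framework with upload size limits.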
I started working on something like this, but the moment I add the line from markitdown import MarkItDown, I get this ugly warning:
python3.12/site-packages/starlette/routing.py:297: ResourceWarning: Unclosed file <tempfile.SpooledTemporaryFile object at 0x149021990>
await self.app(scope, receive, send)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
This happens without even instantiating a MarkItDown object, let alone using it. Is that a bug in the package?
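The warning itself suggests enabling tracemalloc to see where the unclosed file was allocated. A minimal sketch of doing that in plain Python follows; it does not reproduce the Starlette case from the log. The helper name find_unclosed_files and the deliberately leaky function are made up for illustration, and the sketch assumes CPython, where the file object is finalized as soon as the function returns.

```python
import gc
import os
import tracemalloc
import warnings


def find_unclosed_files(action, frames=10):
    """Run action() and return (message, traceback_lines) per ResourceWarning."""
    tracemalloc.start(frames)  # record where each object gets allocated
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always", ResourceWarning)
            action()
            gc.collect()  # finalize anything that is still unreferenced
        report = []
        for item in caught:
            if not issubclass(item.category, ResourceWarning):
                continue
            trace = None
            if item.source is not None:
                trace = tracemalloc.get_object_traceback(item.source)
            report.append((str(item.message), trace.format() if trace else []))
        return report
    finally:
        tracemalloc.stop()


def leaky():
    """Open a file and never close it, purely to trigger the warning."""
    handle = open(os.devnull, "rb")
    handle.read(0)


for message, lines in find_unclosed_files(leaky):
    print(message)
    print("\n".join(lines))
```

Wrapping the import, or a single request handler, in a function and passing it as action should show whether the unclosed file is created by the package or by the code around it. Running the server with python -X tracemalloc=10 is the no-code alternative.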
Here's a web app I made that works with MarkItDown. I use it for my personal workflow: www.docx2md.com
Great to see the community work and samples around this!
@elbruno also put together a quick sample of how such an API might be consumed in a C# client application.
https://github.com/elbruno/MarkItDownServer
Once #202 is merged, it'd be great to publish the Dockerfile as an image to the Microsoft Container Registry so you don't have to pull down the entire repo to use the container / server.
Any updates on this?
If anyone reading this is still interested in the Web API idea, I forked @elbruno's work into a new project and made a few updates:
- Pinned dependencies
- Use multistage docker builds and uv for really quick builds and small images
- Added a convenience script to rebuild the image/containers when running locally
- Updated list of acceptable file extensions
- Replaced .NET client code (used to test the API) with curl instructions in the README
https://github.com/dezoito/markitdown-api
I tried to give credit and proper attribution to elbruno, but if I am missing something, let me know and I'll update the repo.