
Other deployment recommendations

Open sahanxdissanayake opened this issue 2 years ago • 5 comments

AWS and Vercel deployment recommendations would really help other devs (including me) adopt this more rapidly

ex: https://vercel.com/templates/next.js/nextjs-fastapi-starter

sahanxdissanayake avatar Oct 14 '23 23:10 sahanxdissanayake

I'd also love to see AWS example!

austinmw avatar Oct 16 '23 23:10 austinmw

Can something that I get working locally with LangServe be deployed to Vercel?

apsquared avatar Oct 31 '23 17:10 apsquared

tl;dr: You can do it with Lambda and function URLs. See Streaming FastAPI with Lambda and Bedrock; that example shows how to create a simple web UI and use Anthropic claude-2 via Bedrock with FastAPI streaming in the middle.

To adapt that example to LangServe and build something useful, you'd package your chains, app/server.py, and whatever other modules make up your microservice, and deploy it with SAM/SLS/SST/Terraform. The linked example has a SAM template.
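For reference, a minimal sketch of what that packaged app/server.py could look like; the chain, model ID, and port are my assumptions, and the Lambda Web Adapter just runs the container's web server as-is:

```python
# Minimal sketch of an app/server.py to package for Lambda + Lambda Web Adapter.
# Assumes Bedrock access is configured; chain and model choice are placeholders.
from fastapi import FastAPI
from langchain.chat_models import BedrockChat
from langchain.prompts import ChatPromptTemplate
from langserve import add_routes

app = FastAPI(title="LangServe on Lambda")

prompt = ChatPromptTemplate.from_template("Answer concisely: {question}")
model = BedrockChat(model_id="anthropic.claude-v2")  # claude-2 via Bedrock, as in the linked example
chain = prompt | model

# LangServe mounts /chain/invoke, /chain/stream, /chain/playground, etc.
add_routes(app, chain, path="/chain")

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8080)
```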

Options on AWS

For AWS, you're probably looking at EKS/ECS if you need to host a persistent vector DB, or a Lambda Function URL with streaming plus the Lambda Web Adapter coupled with your LangServe app. For hosted free options, Supabase pgvector, Neo4j Aura, and Deep Lake (can host in S3, run in Lambda, or use the free 100 GiB cloud storage) come to mind. Weaviate has an experimental embedded version, but I don't think you can decouple storage like with Deep Lake, which is otherwise great but not really that good for RAG. E.g., Weaviate's BM25 + smart semantic search are far better than anything DL has to offer, even with their new Deep Memory.
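As an illustration of the hosted route, pointing LangChain at a hosted pgvector instance (e.g. Supabase) is basically just a connection string; the URL, collection name, and embeddings choice below are all placeholders:

```python
# Sketch: LangChain retriever backed by a hosted Postgres/pgvector instance.
# Connection string and collection name are placeholders, not real endpoints.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.pgvector import PGVector

store = PGVector(
    connection_string="postgresql+psycopg2://user:pass@db.example.supabase.co:5432/postgres",
    embedding_function=OpenAIEmbeddings(),
    collection_name="docs",
)
retriever = store.as_retriever(search_kwargs={"k": 4})
docs = retriever.get_relevant_documents("How do I deploy LangServe on AWS?")
```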

So you can go fully serverless if you go with Lambda and a hosted vector DB, and use ElastiCache Redis/Kinesis/MSK/SQS/SNS/etc. if you're building a real app. If you follow the linked FastAPI example's strategy, you can still get decent free-tier Redis and Kafka from Upstash.
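For instance, a free-tier Upstash Redis can serve as an LLM cache for the serverless app; the rediss:// URL here is a placeholder (Upstash speaks the plain Redis protocol over TLS):

```python
# Sketch: Upstash Redis as a LangChain LLM cache. URL/token are placeholders.
import langchain
import redis
from langchain.cache import RedisCache

client = redis.Redis.from_url("rediss://default:<token>@<name>.upstash.io:6379")
langchain.llm_cache = RedisCache(redis_=client)  # cached completions survive cold starts
```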

I'm assuming async streaming is a requirement. Note that you can't use API GW, and ALB doesn't support server-sent event streaming over HTTP/1.1 or HTTP/2 for REST endpoints; it only supports streaming for HTTP/2 + gRPC. LangServe is based on Jina, which supports gRPC and uses DocArray instead of Pydantic (it's Pydantic "compatible"), but threw all that away and opted for Pydantic V1, FastAPI, and REST. The funny thing is that if you just install LangServe, you get Pydantic V2, which means you get no API docs.

So at this point you can evaluate your options and go with Jina if that works for you. You can still use LangChain, and Jina's deployment options are far better: dockerized microservices out of the box, k8s deployments, and hosted k8s that's currently free up to a point. Jina ditched Pydantic for DocArray, so you'd be working with that to a degree.

So assuming you're sticking with LangServe, the good thing about containerizing your LangServe app is that you can leave it on Pydantic V1 and get docs, while the rest of your app can be V2; just be mindful of what you pass LangChain, as V1 and V2 models are not compatible. And FastAPI doesn't support V1 and V2 simultaneously: it has a compat layer and works with either one, but not both at once.

Since LangServe "consumes" the V1 slot, you can't use V2 with the same FastAPI app. So the same rules apply to LangServe apps as they do to LangChain when it comes to working with Pydantic versions, until this idiotic mess is sorted out in the far future. There are so many dependencies stuck on V1 that some can't move to V2, while others, whose dependencies have moved to V2 or which need V2's performance and features, upgrade.

And FastAPI and LangChain have to hack together solutions to work with either one installed. It's mostly because V1 and V2 use the same package name; had they released it as a separate pydantic2 package, things would be very different.
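To make the split concrete, here's a sketch of coexistence inside one codebase once Pydantic 2 is installed (the legacy API lives under the pydantic.v1 namespace); the model names are made up, and remember a single FastAPI app still only serves one family:

```python
# Sketch, assuming Pydantic 2 is installed: V1 and V2 models can coexist
# in one codebase via the pydantic.v1 compat namespace.
from pydantic import BaseModel                    # V2 model, fine elsewhere in the app
from pydantic.v1 import BaseModel as BaseModelV1  # V1 model for LangChain-facing code

class AppConfig(BaseModel):       # hypothetical V2 model
    region: str = "us-east-1"

class ChainInput(BaseModelV1):    # hypothetical V1 model to hand to LangChain/LangServe
    question: str
```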

In any case, if you don't want Lambda, you're looking at NLB + L7 proxy + LangServe container(s). Note that API Gateway (any variant) and ELB/ALB don't do HTTP SSE streaming, which is what LangServe does; hence the NLB and a proxy to terminate TLS and reverse proxy to FastAPI.
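A quick way to check whether SSE actually survives whatever sits in front of the container is to hit the /stream endpoint directly and watch for incremental chunks; the URL and input schema below are assumptions matching the earlier server sketch:

```python
# Smoke test: stream from LangServe's /stream endpoint through the proxy chain.
# If chunks arrive one by one, SSE made it through; if everything lands at once,
# something in the middle is buffering.
import httpx

with httpx.stream(
    "POST",
    "https://api.example.com/chain/stream",  # placeholder endpoint
    json={"input": {"question": "Why is the sky blue?"}},
    timeout=None,
) as resp:
    for line in resp.iter_lines():
        if line:  # SSE frames: "event: ..." / "data: ..."
            print(line)
```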

That leaves you with EKS or ECS for hosting, for the most part. If you've got EKS, use that; if you're just deploying a LangChain app, ECS Fargate is easiest. If you're on EKS you probably know how to set up your gateway/ingress/Kong/Istio/whatever and handle TLS termination and certs.

If you're on ECS, you could front FastAPI + LangServe with a Traefik reverse proxy container, for example. You'd need to set it up with a cert, preferably automated, e.g. with Let's Encrypt. See this example for starters.
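One small detail when a proxy like Traefik terminates TLS: run uvicorn with proxy headers enabled so the app sees the original scheme and client IP. The module path below is an assumption following the earlier sketch:

```python
# Sketch: running the LangServe app behind a TLS-terminating reverse proxy.
import uvicorn

from app.server import app  # hypothetical module path from the earlier sketch

uvicorn.run(
    app,
    host="0.0.0.0",
    port=8080,
    proxy_headers=True,          # trust X-Forwarded-Proto/For from the proxy
    forwarded_allow_ips="*",     # tighten to the proxy's CIDR in a real deployment
)
```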

With that you've got a crude streaming LangServe endpoint, and you'll pay for the NLB and ECS Fargate. Or blend it into your massive EKS bill. Calculate which is cheaper, a free OSS vector DB on ECS or EC2 or a hosted one, factor in preferences, and make a call.

Since you're now in AWS land, you've got options for kv, blob stores, messaging, caching, backups, scaling, Bedrock, SageMaker serverless inference endpoints, and X-Ray/OpenTelemetry/Jaeger/CloudWatch/Grafana/Prometheus/Loki/DataDog/LangSmith/W&B... for tracing and metrics, so you could build a production app if you trust LangServe.
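On the tracing side, minimal OpenTelemetry wiring for the FastAPI app might look like this; the collector endpoint is a placeholder, and spans can then be shipped to Jaeger, X-Ray, Grafana, etc.:

```python
# Sketch: OpenTelemetry tracing for the LangServe FastAPI app.
# Requires opentelemetry-sdk, the OTLP gRPC exporter, and
# opentelemetry-instrumentation-fastapi. Endpoint is a placeholder.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

from app.server import app  # hypothetical module path from the earlier sketch

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="collector:4317", insecure=True))
)
trace.set_tracer_provider(provider)

# Every request to the app, including /chain/stream, now emits spans.
FastAPIInstrumentor.instrument_app(app)
```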

I'd wait a bit longer to see how it matures. It's good for demos and prototyping, but to run it in production I'd need a load- and pentesting budget to hit it hard and fuzz it. I haven't seen any benchmarks, metrics, or audits. The recent Playground bug is an example of what can go wrong.

Personally, I'd DIY or go with Jina if I were responsible for production as of now. Once hosted LangServe is out of private beta, it might be time to revisit the topic.

wfjt avatar Nov 16 '23 01:11 wfjt

this is an absolute gem of a comment, you should convert it into a Medium article using GPT @wfjt

DevBey avatar Jan 16 '24 09:01 DevBey

There are challenges with streaming; I tried multiple deployment options:

  1. AWS Amplify + Lambda => streaming does not work
  2. EC2 with dockerized Next.js and dockerized LangServe => streaming does not work

Locally it works with npm run dev, but for some reason it doesn't in production. Can't figure out why.

akshaylingamaneni avatar Apr 04 '24 02:04 akshaylingamaneni