docs: Deploy htmx demo multi-regionally

Open vikstrous2 opened this issue 1 year ago • 14 comments

It's very cool to see a demo with htmx, but I was hoping that the demo linked to from the docs would be an accurate reflection of how fast templ is. However, I was disappointed to see almost 1s of latency per request. My browser thinks that the time is spent server side. Is there a reason it can't be made faster with the way it's deployed? Is there something misconfigured? Is it storing the counter in a weird way that eventually slowed it down when it got too high?

image

vikstrous2 avatar Sep 17 '23 13:09 vikstrous2

That is weird .. I don't see this in my little experiment at https://github.com/st3fan/hello-templ .. all requests are just a few millis (on localhost).

Screenshot 2023-09-17 at 4 10 16 PM

st3fan avatar Sep 17 '23 20:09 st3fan

Hi @vikstrous2, the demo uses AWS CloudFront, S3, AWS Lambda and DynamoDB, so it's sensitive to the performance of those components. The demo is deployed to London and uses CloudFront as a CDN to reduce latency to the site in some geographic regions.

It's this little line of the AWS CDK code that sets the regions that CloudFront uses: https://github.com/a-h/templ/blob/04c0ee86ce7d1fa6414fd34b2880d807f7baa001/examples/counter/cdk/stack.go#L100C29-L100C55

I set it to "Price Class 100" to be cheap (for me!). If you're outside the included geographic regions, then you're going to be seeing much higher network latency, see https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PriceClass.html for the locations - but it's basically US, Europe, and Israel for the price I chose.

Maybe I should increase the deployment to all AWS regions - I'll just need to check how expensive that would be. Maybe it won't actually cost anything.

From the UK, this is the performance I see.

Screenshot 2023-09-17 at 21 56 43

If you're interested in the rendering performance, there's a basic set of benchmarks in https://github.com/a-h/templ/tree/main/benchmarks - no benchmark matches real-world usage, but it gives you a rough idea of just how fast templ's rendering is compared to React or Go's built-in HTML templating system.

Plotted on a chart, it looks like this.

templ_benchmark

a-h avatar Sep 17 '23 21:09 a-h

@vikstrous2 was this consistently the response time? It's also possible you were encountering a cold start on the lambda used for the deployment.

joerdav avatar Sep 18 '23 08:09 joerdav

I'm on the west coast of the US right now. I just tried it again from a different network connection and I'm still seeing 600ms. According to https://clients.amazonworkspaces.com/Health.html, my latency to Ireland is only 134 ms, so this is very strange. There must be more going on here than network latency. Let me know if there's any more info I can provide.

vikstrous2 avatar Sep 18 '23 15:09 vikstrous2

This is the timing in Berlin

image

guido4000 avatar Sep 19 '23 22:09 guido4000

I'm seeing >500ms response times from the US West Coast as well. I'm planning to build out a similar demo with AWS X-Ray enabled to see where the latency is coming from (assuming I can reproduce it). I suspect it is my route to the CloudFront Edge from my location, but I will report back when I have the X-Ray results.

mousedownmike avatar Oct 10 '23 17:10 mousedownmike

Just to add some more data... here are the p99 latencies from the demo over the last 6 weeks. They show that in 99% of cases, the Lambda function is executing in <144ms.

Screenshot 2023-10-10 at 21 42 48

The CloudWatch Integration Latency is much higher though. It's showing that 99% of the time, it takes <800ms for either the Lambda function (accessed via a Lambda function URL) or the S3 bucket contents to be returned. So, it's losing quite a lot of latency along the way there.

A lot of the latency seems to be caused by Lambda Function URLs themselves. The green line shows that it's taking a long time for the Lambda function URL to get the request to and from the Lambda function.

The averages are much lower than the p99, as you might expect.

Screenshot 2023-10-10 at 21 46 33

The HTMX demo is deployed with CDK. It should be pretty easy to add X-Ray tracing to it, I just didn't because it costs more. 😁

a-h avatar Oct 10 '23 20:10 a-h

@mousedownmike - the HTMX counter with DynamoDB costs basically nothing to run, since it's all Lambda functions etc. Would be cool if you spun it up for a comparison!

a-h avatar Oct 10 '23 21:10 a-h

I'm running an unmodified version of the CounterStack in my account in us-east-1 here: https://d211acsw4ya2sw.cloudfront.net

I'm now seeing most requests with <300ms response times... better, but probably just because of proximity to the origin. I'm suspicious of the Lambda Function URL. With that being HTTPS only, there's an additional encrypt/decrypt step between CloudFront and the Lambda Function URL origin.

I might see what it looks like with API Gateway in the mix.

mousedownmike avatar Oct 12 '23 03:10 mousedownmike

I don't think API Gateway will make it any faster.

Lambda Function URLs appear to be some sort of deliberately hobbled or cut down version of API Gateway. It even has the same behaviour of breaking basic auth by mangling headers.

I could run up a Fargate version, but that costs real money, due to the ALB, mostly.

a-h avatar Oct 12 '23 06:10 a-h

You could switch to Google App Engine. I have a few apps there with negligible cost, I mostly stay within the free tier.

joerdav avatar Oct 12 '23 07:10 joerdav

So... I emailed some friends in the Serverless team at AWS. I explained the current architecture, and goals of performance being good, while not spending much money. 😁

I said that the performance problem seems to be:

  • High latency from the global CloudFront distribution endpoints to the regional Lambda Function URL
  • The Lambda Function URL latency itself (the Lambda function execution time is "OK", it appears to be the "API Gateway" bit that causes issues)

I suggested these options and asked if they had other ideas:

  • Deploy the Lambda function to multiple regions and use DynamoDB Global Tables for the DB.
    • Might solve the CloudFront to region latency issue.
    • It would add a fair amount of complexity to the stack.
    • Have to work out how to route to specific Lambda functions regionally.
  • Switch to Application Load Balancer triggered Lambda.
    • Quite expensive due to Load Balancer per-hour cost.
    • Still doesn't solve latency from CloudFront to region.
  • Use CloudFront to ALB and then Fargate.
    • Maximally expensive.
    • Still doesn't solve latency from CloudFront to region.
  • Use CloudFront to Fargate with an open port.
    • Would significantly reduce latency associated with Function URL.
    • Not reliable.
  • Use EC2.
    • Just for completeness.

They agreed that my idea on #1 was the best in terms of reducing latency while keeping costs low.

a-h avatar Oct 13 '23 21:10 a-h

Hello, I wrote an HTMX-based example using Cloudflare Workers, it might be faster than the CloudFront deployment? If people could try it and post some timings, that would be great.

It's available at https://templ.headblockhead.com and https://templ-counter.headblockhead.workers.dev.

Source at https://github.com/headblockhead/templ-cloudflare-workers

I'm getting around 140ms from England, but it should hopefully be more consistent globally than the AWS hosted one.

headblockhead avatar Apr 28 '24 16:04 headblockhead

Just triaging some older issues, and to make the scope clear I'll update:

It seems that the performance issues are based on proximity. The ask is to deploy the Lambda function to multiple regions and implement DynamoDB Global Tables.

joerdav avatar May 31 '24 10:05 joerdav