
Add fast compiled route matcher

Open yawkat opened this issue 2 years ago • 19 comments

Note this is two commits that are mostly independent, but the second commit has the vast majority of the changes.


This PR adds an alternate request processing path that mostly bypasses RoutingInBoundHandler. The goal is to speed up processing of simple routes significantly.

The approach taken here is to accumulate all routes, prepare those that can be executed quickly (i.e. simple argument / return binding, exact path match, no difficult filters), and then compile the routing tree into an optimized MatchPlan. In principle this approach could be extended to all routes, we "just" need to adapt other framework features (e.g. path parameters) to be included in the compiled MatchPlan. This is my long-term vision for the future of routing in the Micronaut HTTP stack.
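For illustration, here is a minimal sketch of what such a compiled plan could look like. The MatchPlan idea comes from this PR, but every type and method below is a hypothetical stand-in, not the actual implementation:

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: routes with an exact path, simple binding, and no
// difficult filters are "compiled" into a flat lookup table ahead of time,
// so matching at request time is a single map access instead of iterating
// and pattern-matching every registered route.
final class CompiledRouter {

    // key: "METHOD path", value: a pre-bound invoker for the controller method
    private final Map<String, Function<Request, Response>> exactMatches;

    private CompiledRouter(Map<String, Function<Request, Response>> exactMatches) {
        this.exactMatches = exactMatches;
    }

    static CompiledRouter compile(Map<String, Function<Request, Response>> routes) {
        // A real implementation would also pre-compute argument binders and
        // the response-writing strategy for each route at this point.
        return new CompiledRouter(new HashMap<>(routes));
    }

    /** Returns null if the request must fall back to the slow, fully dynamic path. */
    Function<Request, Response> match(Request request) {
        return exactMatches.get(request.method() + " " + request.path());
    }

    record Request(String method, String path) { }
    record Response(int status, String body) { }
}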

The results are very promising so far. FullHttpStackBenchmark goes from ~7µs to ~5µs, a big improvement. And there aren't even very many microoptimizations yet.

yawkat avatar Nov 17 '23 14:11 yawkat

Converted to draft because I think we will need to iterate a bit on this, and @dstepanov and I are away the next few weeks. Will try to do an initial review on Monday.

graemerocher avatar Nov 18 '23 08:11 graemerocher

If I understand correctly, we have to duplicate the matching and method invocation logic to do this. The already complicated code gets more complex, and it's not worth it IMHO. I feel it's going to be another option like HttpServerType that is never going to be enabled. We should try to optimize RoutingInBoundHandler if it's that problematic. I see that we don't cache filters, which might be a good idea to add.

When we discussed this task, I assumed we wanted to optimize the Netty pipeline, as it is the most visible part in profiling. (Note: I don't fully understand Netty.) My idea was to rewrite this code to be more dynamic, based on the route we resolve:

private void insertMicronautHandlers(boolean zeroCopySupported) {
    channel.attr(STREAM_PIPELINE_ATTRIBUTE.get()).set(this);
    if (sslHandler != null) {
        channel.attr(CERTIFICATE_SUPPLIER_ATTRIBUTE.get()).set(sslHandler.findPeerCert());
    }

    SmartHttpContentCompressor contentCompressor = new SmartHttpContentCompressor(embeddedServices.getHttpCompressionStrategy());
    if (zeroCopySupported) {
        channel.attr(PipeliningServerHandler.ZERO_COPY_PREDICATE.get()).set(contentCompressor);
    }
    pipeline.addLast(ChannelPipelineCustomizer.HANDLER_HTTP_COMPRESSOR, contentCompressor);
    pipeline.addLast(ChannelPipelineCustomizer.HANDLER_HTTP_DECOMPRESSOR, new HttpContentDecompressor());

    Optional<NettyServerWebSocketUpgradeHandler> webSocketUpgradeHandler = embeddedServices.getWebSocketUpgradeHandler(server);
    if (webSocketUpgradeHandler.isPresent()) {
        pipeline.addLast(NettyServerWebSocketUpgradeHandler.COMPRESSION_HANDLER, new WebSocketServerCompressionHandler());
    }
    if (server.getServerConfiguration().getServerType() != NettyHttpServerConfiguration.HttpServerType.STREAMED) {
        pipeline.addLast(ChannelPipelineCustomizer.HANDLER_HTTP_AGGREGATOR,
            new HttpObjectAggregator(
                (int) server.getServerConfiguration().getMaxRequestSize(),
                server.getServerConfiguration().isCloseOnExpectationFailed()
            )
        );
    }
    pipeline.addLast(ChannelPipelineCustomizer.HANDLER_HTTP_CHUNK, new ChunkedWriteHandler()); // todo: move to PipeliningServerHandler

    RequestHandler requestHandler = routingInBoundHandler;
    if (webSocketUpgradeHandler.isPresent()) {
        webSocketUpgradeHandler.get().setNext(routingInBoundHandler);
        requestHandler = webSocketUpgradeHandler.get();
    }
    if (server.getServerConfiguration().isDualProtocol() && server.getServerConfiguration().isHttpToHttpsRedirect() && sslHandler == null) {
        requestHandler = new HttpToHttpsRedirectHandler(routingInBoundHandler.conversionService, server.getServerConfiguration(), sslConfiguration, hostResolver);
    }
    pipeline.addLast(ChannelPipelineCustomizer.HANDLER_MICRONAUT_INBOUND, new PipeliningServerHandler(requestHandler));
}

We would add those steps after we resolve the route. So the idea (a rough sketch follows the list):

  1. Add a step as early in the pipeline as possible where we can get the request
  2. Convert the request into our abstraction, probably as lazily as possible. I don't think a basic hello-world with CORS disabled should need to touch the headers
  3. Find a route
  4. Based on the route, add pipeline steps. A GET route should not add content decompression etc., and the WebSocket handlers should be added only for WS routes. I assume we can add the steps needed for the response (chunking, decompression) once we know what kind of response it is and whether it exists at all. Looking at HttpContentDecompressor and HttpContentCompressor, those can be skipped for some content types.
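
For illustration, a rough sketch of that idea, assuming a hypothetical RouteResolver and ignoring the HTTP-pipelining complications discussed further down (this is not Micronaut's actual API):

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.handler.codec.http.HttpContentDecompressor;
import io.netty.handler.codec.http.HttpObjectAggregator;
import io.netty.handler.codec.http.HttpRequest;

// Hypothetical sketch: resolve the route as soon as the request line and
// headers are available, then add only the pipeline steps that route needs.
final class RouteAwarePipelineAssembler extends ChannelInboundHandlerAdapter {

    interface RouteResolver {          // assumption: some cheap lookup by method + path
        ResolvedRoute resolve(HttpRequest request);
    }

    record ResolvedRoute(boolean hasBody, boolean isWebSocket) { }

    private final RouteResolver resolver;

    RouteAwarePipelineAssembler(RouteResolver resolver) {
        this.resolver = resolver;
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        if (msg instanceof HttpRequest request) {
            ResolvedRoute route = resolver.resolve(request);
            if (route.hasBody()) {
                // only routes that actually read a body pay for decompression/aggregation
                ctx.pipeline().addAfter(ctx.name(), "decompressor", new HttpContentDecompressor());
                ctx.pipeline().addAfter("decompressor", "aggregator", new HttpObjectAggregator(10 * 1024 * 1024));
            }
            // websocket upgrade handlers etc. would likewise be added here only for WS routes
        }
        ctx.fireChannelRead(msg);
    }
}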

dstepanov avatar Nov 19 '23 09:11 dstepanov

I guess you can comment out those steps and see how that compares to your change.

dstepanov avatar Nov 19 '23 09:11 dstepanov

@dstepanov

This PR addresses RoutingInBoundHandler, RequestLifecycle and so on. It is true that routing and execution logic is duplicated, but I do not see a way around this. The current routing API (not just the implementation) is sub-optimal, and the only solution I could see is to flip it around: accumulate the routes in a builder and use essentially a compiler to build them into a final optimized routing tree. Yes, it is similar to HttpServerType, and people probably won't turn it on, but don't forget that HttpServerType also got us the PipeliningServerHandler, which now gives everyone the benefit of HttpServerType.FULL_CONTENT for most requests (those that don't exceed a certain body size), even without FULL_CONTENT enabled. Similarly, in my opinion this new approach can be extended (maybe breaking some compat, so 5.x) to cover most or all routes by default.


The Netty pipeline is a separate issue. It is not as easy as you suggest to simply add the handlers dynamically. Many of the handlers need to see the full request lifecycle, so they cannot be added halfway into the request dynamically. Additionally, HTTP pipelining is a huge issue when it comes to modifying the handler pipeline: there can be many requests in flight at the same time in the same pipeline, and it's hard to avoid bugs when modifying it dynamically.

However, we have already compressed some parts of the pipeline with the introduction of PipeliningServerHandler: the job that was previously done by, I think, three separate handlers is now done by one. In my opinion, this is the correct approach for removing the other handlers as well: the chunked write handler and the compression handlers should go into PipeliningServerHandler, where they can be loaded dynamically per request and where HTTP pipelining is already handled properly. PipeliningServerHandler, with its InboundHandler and OutboundHandler internal abstractions, is already set up to do exactly this without adding overhead for uncompressed requests; it's just not implemented yet. But such an optimization cannot be a substitute for improved routing.
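
To illustrate the "one handler, per-request strategy" idea: the OutboundHandler name echoes the abstraction mentioned above, but everything in this sketch is hypothetical and far simpler than the real PipeliningServerHandler:

// Hypothetical sketch: instead of installing a ChunkedWriteHandler and a
// compressor in the channel pipeline for every connection, a single server
// handler picks an outbound strategy per response, so uncompressed responses
// pay no extra cost and the pipeline never has to be mutated mid-request.
final class OutboundDispatchSketch {

    interface OutboundHandler {
        void write(ResponseSketch response);
    }

    record ResponseSketch(String contentType, byte[] body, boolean chunked) { }

    private final OutboundHandler plain = r -> { /* write bytes as-is */ };
    private final OutboundHandler compressed = r -> { /* gzip, then write */ };
    private final OutboundHandler chunkedWriter = r -> { /* write as chunked transfer */ };

    void writeResponse(ResponseSketch response) {
        OutboundHandler handler;
        if (response.chunked()) {
            handler = chunkedWriter;
        } else if (isCompressible(response.contentType())) {
            handler = compressed;
        } else {
            handler = plain;           // the common fast path stays allocation-free
        }
        handler.write(response);
    }

    private static boolean isCompressible(String contentType) {
        return contentType.startsWith("text/") || contentType.equals("application/json");
    }
}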

yawkat avatar Nov 20 '23 07:11 yawkat

I think this is an interesting line of investigation and we should continue it; it could become the default if we can make it pass the HTTP TCK.

I think compatibility is possible if we make certain things lazy (ServerRequestContext). I agree we should probably not have a configuration option for it as no one will enable it, and the goal should be full compatibility somehow.

Also, I think we should investigate optimising route argument binding so that it can use the raw Netty headers and provide an optimised API without all the BindingResult/Optional wrapping, which would reduce allocations.
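
As a rough sketch of what binding straight from the raw Netty headers without wrapper allocations could look like (the binder interface here is purely illustrative, not Micronaut's actual ArgumentBinder API):

import io.netty.handler.codec.http.HttpHeaders;

// Hypothetical sketch: a binder that reads directly from Netty's HttpHeaders
// and signals "not present" with null instead of allocating a
// BindingResult/Optional wrapper per argument per request.
interface RawHeaderBinder<T> {

    /** Returns the bound value, or null if the header is absent. */
    T bind(HttpHeaders headers);

    static RawHeaderBinder<Integer> intHeader(String name) {
        return headers -> {
            String value = headers.get(name);      // no Optional allocation
            return value == null ? null : Integer.valueOf(value);
        };
    }
}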

Looking at something like Vert.x, they have separate routing for normal/template routes and for regex routes. So I think the only way to achieve that level of speed is to have static route lookups instead of the route matching we have now.

graemerocher avatar Nov 20 '23 08:11 graemerocher

I think a config option is unavoidable for 4.x. Though such cases are somewhat reduced from 3.x with the MessageBodyHandler introduction, we still have many cases where certain decisions cannot be made statically. For example, RouteExecutor does a body instanceof HttpStatus check on the controller return value every time, instead of determining this from the route statically. So if you have a controller method Object get() { return HttpStatus.NOT_FOUND; }, we can't know statically that this needs special handling instead of going through the standard JSON body writer.

I want to get rid of these cases and make all response handling decisions statically (this particular decision is made statically in this PR), but I don't think this is viable for 4.x without an explicit config option.
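
A minimal sketch of moving that decision to route-registration time, using hypothetical helper types rather than the real RouteExecutor:

import io.micronaut.http.HttpStatus;

// Hypothetical sketch: the response-writing strategy is chosen once, when the
// route is registered, from the declared return type of the controller method,
// instead of doing a `body instanceof HttpStatus` check on every request.
final class ResponseStrategySketch {

    enum Strategy { STATUS_ONLY, JSON_BODY, DYNAMIC }

    static Strategy forDeclaredReturnType(Class<?> returnType) {
        if (HttpStatus.class.isAssignableFrom(returnType)) {
            return Strategy.STATUS_ONLY;   // decided statically, no per-request instanceof
        }
        if (returnType == Object.class) {
            return Strategy.DYNAMIC;       // can't decide statically; fall back to the slow path
        }
        return Strategy.JSON_BODY;
    }
}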

yawkat avatar Nov 20 '23 08:11 yawkat

SonarCloud Quality Gate failed.

Bugs: 2 (rating C)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 12 (rating A)

Coverage: 75.7%
Duplication: 0.0%


sonarqubecloud[bot] avatar Nov 20 '23 08:11 sonarqubecloud[bot]

You do know statically that it is a dynamic route, since the return type is Object, and it therefore has to fall back to the slower original behaviour.

graemerocher avatar Nov 20 '23 08:11 graemerocher

yea i guess we can be more conservative with Object return type and skip a lot of the logic.

yawkat avatar Nov 20 '23 09:11 yawkat

I will take a look in more detail next week, but I would prefer optimizing the current routing/invocation or reusing it instead of duplicating the implementation.

dstepanov avatar Nov 20 '23 10:11 dstepanov

Agree that if it is possible to optimise the existing routing, that is preferable to duplicating the routing.

graemerocher avatar Nov 20 '23 11:11 graemerocher

fyi @dstepanov when i'm on vacation, feel free to look at the routing side but please leave the netty pipeline alone, i have changes already lined up for that.

yawkat avatar Nov 22 '23 14:11 yawkat

I'm working on optimizing filtering/routing. It looks like this change surfaced its overhead.

dstepanov avatar Nov 28 '23 10:11 dstepanov

i added filter support, it added maybe 100ns static overhead + 100ns per filter. cors filter is still skipped to avoid those 100ns.

yawkat avatar Dec 07 '23 14:12 yawkat

Can you extract the propagation improvements into a separate PR?

dstepanov avatar Dec 13 '23 08:12 dstepanov

sure

yawkat avatar Dec 13 '23 08:12 yawkat

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Feb 07 '24 21:02 CLAassistant

Is this improvement dead?

altro3 avatar Apr 11 '24 08:04 altro3

@altro3 we've decided that for now the marginal perf benefit does not outweigh the maintenance effort of duplicating the routing code paths. this decision may change in the future however

yawkat avatar Apr 12 '24 08:04 yawkat