zipkin-sparkstreaming
http stream factory
TL;DR: we could make a default http receiver. This reduces configuration to the consumer only, meaning elasticsearch, mysql, or cassandra.
There's been a lot of discussion about how to make starting easier. Could people test their adjusters before production (or even ad-hoc) without needing to use Kafka?
I think the answer is http, as we've been here before. When zipkin first only supported Scribe, the debug story was pretty bad. We came up with http as the answer and that's been quite useful for folks.
details..
I looked at the docs, and this isn't a big deal. For example, they already show how to make a test receiver: https://spark.apache.org/docs/latest/streaming-custom-receivers.html https://github.com/apache/spark/blob/v2.1.0/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java
I made a proof-of-concept, and found that it works fine. Basically, make a default stream factory like this..
@Configuration
@ConditionalOnMissingBean(StreamFactory.class)
@Import(HttpStreamFactory.class)
public class ZipkinHttpStreamFactoryAutoConfiguration {
}
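Because of @ConditionalOnMissingBean, a user could still swap in a different stream factory just by defining their own bean; a hypothetical sketch (the class and bean names here are made up for illustration):

@Configuration
public class MyStreamFactoryConfiguration {
  // Defining any StreamFactory bean suppresses the default http one,
  // since the auto-configuration is conditional on the bean being missing.
  @Bean StreamFactory myStreamFactory() {
    return new KafkaStreamFactory(); // e.g. an existing kafka factory; name assumed
  }
}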
Then, the HttpStreamFactory itself (using undertow, as it has few deps, so it won't bloat the jar):
public class HttpStreamFactory extends Receiver<byte[]> implements StreamFactory {
  transient Undertow http;

  HttpStreamFactory() {
    super(StorageLevel.MEMORY_AND_DISK_2());
  }

  @Override public JavaDStream<byte[]> create(JavaStreamingContext jsc) {
    return jsc.receiverStream(this);
  }

  @Override public void onStart() {
    http = Undertow.builder()
        .addHttpListener(9411, "127.0.0.1")
        .setHandler(exchange -> {
          exchange.getRequestReceiver().receiveFullBytes((ex, data) -> {
            HttpStreamFactory.this.store(data);
            ex.setStatusCode(202).endExchange();
          }, (ex, exception) -> {
            HttpStreamFactory.this.reportError(exception.getMessage(), exception);
            ex.setStatusCode(500).endExchange();
          });
        }).build();
    http.start();
  }

  @Override public void onStop() {
    http.stop();
  }
}
Of course this was a proof of concept, we'd parameterize the port etc.
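For instance, the port could default to zipkin's usual 9411 while staying overridable. A minimal sketch, assuming a system property (the property name is made up; the real thing would likely use Spring Boot configuration properties):

public class PortConfigExample {
  // Reads the listener port, falling back to zipkin's default of 9411.
  // "zipkin.http.port" is a hypothetical property name for illustration.
  static int port() {
    return Integer.parseInt(System.getProperty("zipkin.http.port", "9411"));
  }

  public static void main(String[] args) {
    System.out.println(port()); // prints 9411 unless the property is set
  }
}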
If the goal is testing and debugging, why not make it even simpler? Read from a local file, and write to console or local file.
So, this is "architecture simple": just like the google collector, it is a drop-in requiring no explanation from users, as it uses the exact same format as the normal server. You simply point the app you are debugging at that endpoint rather than standing up a separate process.
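To show why this needs no explanation: an instrumented app just POSTs the same json it would send to a normal zipkin server. Here's a self-contained illustration using a JDK stand-in server that mimics the proposed receiver's behavior (accept bytes, reply 202); the span json and class names are made up for illustration:

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class PostSpansExample {
  public static void main(String[] args) throws Exception {
    // Stand-in for the receiver: reads the posted bytes and replies 202,
    // the same way the proposed HttpStreamFactory handler does.
    HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
    server.createContext("/", exchange -> {
      byte[] body = exchange.getRequestBody().readAllBytes();
      exchange.sendResponseHeaders(202, -1); // 202 Accepted, no response body
      exchange.close();
    });
    server.start();
    int port = server.getAddress().getPort();

    // What the app being debugged does: POST span json to the endpoint.
    String spans = "[{\"traceId\":\"1\",\"id\":\"1\",\"name\":\"test\"}]"; // hypothetical span
    HttpResponse<Void> response = HttpClient.newHttpClient().send(
        HttpRequest.newBuilder(URI.create("http://127.0.0.1:" + port + "/"))
            .POST(HttpRequest.BodyPublishers.ofString(spans, StandardCharsets.UTF_8))
            .build(),
        HttpResponse.BodyHandlers.discarding());
    System.out.println(response.statusCode());
    server.stop(0);
  }
}

Run as-is it prints 202, confirming the accept path end-to-end without any broker in the middle.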
The current zipkin user doesn't use files at all. Instrumentation doesn't write to files, for example, so files are an extra step. Also, most people can't eye-ball a file to see how well it worked or didn't; I struggle with this even though I know "jq" pretty well. The next step would be to upload it into a server.
This doesn't preclude a file stream, which I'm aware you like. It's just that if we think about the ecosystem, the simplest and most consistent thing we have, requiring the least explanation and the easiest integration for existing apps, is http.
PS: here are some scratches I had locally, in case someone else wants to pick this up between now and when I can divert time to it again: https://github.com/openzipkin/zipkin-sparkstreaming/compare/http?expand=1