vector feat(new source): add initial `websocket` source

Hi, and thanks for this nice software!

At koyeb we are interested in using vector to allow our users to forward the logs of their applications towards external destinations. Our API for receiving logs is exposed as a websocket source. Our plan is to add support for a generic websocket source and then, in a following PR, add support for a custom koyeb source in order to make it easier for our users to configure vector for use with our API, in the same spirit as the heroku_logs source. Do you think this is an acceptable plan?

I noticed that vector has an open issue (#6491) to track the addition of a websocket source, and some preliminary (although rough) work was done in this closed PR. I did not start from there, as the fork from which it started was quite old. Instead, I looked at other existing sources and drew from those. That said, rust is not my primary language and I'd really like some guidance in order to improve the code currently submitted. I know that the tests (unit and integration) are missing. Can you point me to some code that you deem a good example for these? What else is missing?

Let me know your thoughts. And thanks again!

Jul 04 '23 17:07 torrefatto

All committers have signed the CLA.

Jul 04 '23 17:07 bits-bot

Hi @torrefatto, thanks for your proposed new integration!

Just an FYI, per https://github.com/vectordotdev/vector/blob/master/CONTRIBUTING.md#new-sources-sinks-and-transforms , we will begin by going through that checklist prior to reviewing the code. No need to write up answers to the checklist questions at this stage; I will inquire about anything in this PR's comment thread.

Our plan is to add support for a generic websocket source and then, in a following PR, add support for a custom koyeb source in order to make it easier for our users to configure vector for use with our API, in the same spirit as the heroku_logs source. Do you think this is an acceptable plan?

Where possible, we like to avoid adding integrations that are specific to services (though obviously there have been cases of that historically). I would ask: which elements of the source configuration for the koyeb source would be unique? What would it provide over just using the websocket source?

Relatedly, we hope to someday soon have a plugin system whereby some integrations could be maintained outside of the main Vector repo. That could be one possibility for a future koyeb source.

That said, rust is not my primary language

Nice work!

I know that the tests (unit and integration) are missing. Can you point me to some code that you deem a good example for these? What else is missing?

I will help out with this post completion of the aforementioned checklist.

Regarding our checklist-

(Just considering the current PR for websocket source) Would you/your company be willing to commit to supporting this integration after it has been included?

We may reach out with follow-up queries in the coming days.

Jul 05 '23 21:07 neuronull

Hi @neuronull!

Just an FYI, per https://github.com/vectordotdev/vector/blob/master/CONTRIBUTING.md#new-sources-sinks-and-transforms , we will begin by going through that checklist prior to reviewing the code. No need to write up answers to the checklist questions at this stage; I will inquire about anything in this PR's comment thread.

Sorry! I totally overlooked that one.

Our plan is to add support for a generic websocket source and then, in a following PR, add support for a custom koyeb source in order to make it easier for our users to configure vector for use with our API, in the same spirit as the heroku_logs source. Do you think this is an acceptable plan?

Where possible, we like to avoid adding integrations that are specific to services (though obviously there have been cases of that historically). I would ask: which elements of the source configuration for the koyeb source would be unique? What would it provide over just using the websocket source?

The protocol is the same (wss:). The point is that we would like to combine it, in a single component, with the ability to retrieve the parameters needed to properly tail the logs. We index our log sources by an opaque id, and the API requires that id to be specified, but we have another API to retrieve the id from an intelligible name. This would address the following concern in the checklist you linked above:

If the integration can be served with a workaround or more generic component, how painful is this for users?

We think it could be pretty painful. Of course, we would provide an adequate free-of-charge robot account to perform integration tests, if the second koyeb source gets accepted.

Relatedly, we hope to someday soon have a plugin system whereby some integrations could be maintained outside of the main Vector repo. That could be one possibility for a future koyeb source.

That would be awesome! Is there any roadmap, any plan in which the team commits to implementing such plugin system?

I know that the tests (unit and integration) are missing. Can you point me to some code that you deem a good example for these? What else is missing?

I will help out with this post completion of the aforementioned checklist.

Thanks! And again, sorry for missing it.

Regarding our checklist-

(Just considering the current PR for websocket source) Would you/your company be willing to commit to supporting this integration after it has been included?

I for sure can commit to maintain the websocket source, independently of my work engagement with koyeb. I think (but I have to consult my employer) that koyeb too as a company could be willing to commit to the maintenance of this source.

We may reach out with follow-up queries in the coming days.

I am looking forward to them!

Thanks again!

Jul 06 '23 09:07 torrefatto

Sorry! I totally overlooked that one.

No worries! There is no realistic way to make sure everyone is aware of it.

The protocol is the same (wss:). The point is that we would like to combine it, in a single component, with the ability to retrieve the parameters needed to properly tail the logs. We index our log sources by an opaque id, and the API requires that id to be specified, but we have another API to retrieve the id from an intelligible name. This would address the following concern in the checklist you linked above:

If the integration can be served with a workaround or more generic component, how painful is this for users?

We think it could be pretty painful.

Just hypothesizing here to make sure I understand the pain: could the same thing be accomplished by some kind of shell script that queries that other API to retrieve the opaque ID and generates the vector configuration for the websocket source with it (could use vector generate)? Just brainstorming on the alternatives. I will also raise this internally with the rest of the vector team.
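To make that alternative concrete, here is a minimal sketch of the "look up the id, then generate the config" step, written in Rust rather than shell. Everything specific in it is an assumption for illustration only: `lookup_opaque_id` stands in for the real name-to-id API call and is mocked with an in-memory map, the `wss://logs.example.invalid/tail` endpoint and its `id` query parameter are hypothetical, and the `uri` option is only a guess at how the websocket source from this PR might be configured.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the API that maps a readable name to an opaque id.
/// A real script would issue an HTTP request here instead of reading a map.
fn lookup_opaque_id(directory: &HashMap<String, String>, name: &str) -> Option<String> {
    directory.get(name).cloned()
}

/// Renders a TOML fragment for a websocket source pointed at the resolved id.
/// The option names and the URL layout are assumptions, not a confirmed schema.
fn render_source_config(source_name: &str, opaque_id: &str) -> String {
    format!(
        "[sources.{}]\ntype = \"websocket\"\nuri = \"wss://logs.example.invalid/tail?id={}\"\n",
        source_name, opaque_id
    )
}

fn main() {
    // Mocked directory of readable names to opaque ids.
    let mut directory = HashMap::new();
    directory.insert("my-service".to_string(), "a1b2c3".to_string());

    match lookup_opaque_id(&directory, "my-service") {
        Some(id) => print!("{}", render_source_config("koyeb_logs", &id)),
        None => eprintln!("no log source named my-service"),
    }
}
```

The generated fragment is static: if the id behind a name changes, the configuration has to be regenerated and vector reloaded, which is the part of the burden that stays with the user in this approach.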

Of course, we would provide an adequate free-of-charge robot account to perform integration tests, if the second koyeb source gets accepted.

Having a means to integration-test the prospective koyeb source would definitely be a requirement if we proceeded with it, so it's great that you're thinking of it already.

Relatedly, we hope to someday soon have a plugin system whereby some integrations could be maintained outside of the main Vector repo. That could be one possibility for a future koyeb source.

That would be awesome! Is there any roadmap, any plan in which the team commits to implementing such plugin system?

This has been an aspiration for a while now. It's not tracked on a public roadmap but it is definitely something we want to do, it's mostly a question of how soon. I can say we are not planning on it in the next 3 months.

I for sure can commit to maintain the websocket source, independently of my work engagement with koyeb. I think (but I have to consult my employer) that koyeb too as a company could be willing to commit to the maintenance of this source.

Super! This helps.

Jul 06 '23 21:07 neuronull

Hi @torrefatto !

We had an internal discussion on this and wanted to lay out some more topics to consider:

We are curious, at a high level, about the general use case for Vector in a customer's pipeline with the Koyeb product. If you could describe that in further detail, it would be helpful. Do you have users (or is it just in anticipation) who want to forward data to other systems?

It seems like there are a couple of limitations with the TailLogs API and Vector:

Firstly, correct me if I'm wrong but it looks like this solution would only be able to pull live data from when it starts, and not any historic data. Would that be sufficient for your users?

Secondly, it seems that in a single user's pipeline, they would only be able to use a single vector instance to handle the entire volume of log events. Is that something of concern?

Regarding a koyeb source: our stance is that if we did not accept a koyeb source, the websocket source's viability would be a bit diminished (as there has not been much demand otherwise for a websocket source). This means the two scenarios would likely be either just the websocket source (if there was confidence your users would be able to utilize it) or both sources. For Vector it would be ideal to just have the websocket source, so we would just want to ensure there are definite reasons that Koyeb users would not be able to properly utilize it alone. Essentially we'd need to come to a conclusion on this point before proceeding.

Thanks!

Jul 13 '23 17:07 neuronull

Hi @torrefatto !

Hi! Sorry for taking so long to reply. The heat wave hit hard :hot_face:

We had an internal discussion on this and wanted to lay out some more topics to consider:

We are curious, at a high level, about the general use case for Vector in a customer's pipeline with the Koyeb product. If you could describe that in further detail, it would be helpful. Do you have users (or is it just in anticipation) who want to forward data to other systems?

We run a serverless platform: we basically let people run containers on our servers. We provide a lightweight way for them to retrieve their logs via our control panel, but we do not have the capacity to index them on Koyeb nor provide a great user interface to query them. That is why we want to allow users to forward their logs to more specialized third parties (e.g. Datadog, Splunk, Elasticsearch). We have had users ask us for the ability to forward their logs to external systems! This is a real business case for us.

It seems like there are a couple of limitations with the TailLogs API and Vector: Firstly, correct me if I'm wrong but it looks like this solution would only be able to pull live data from when it starts, and not any historic data. Would that be sufficient for your users?

Our systems hold a backlog of the whole data and we expose in the API a start parameter that allows the caller to specify the starting point in time of the tailing. If no start value is specified, our systems reply with the last 1000 entries and then begin streaming. This is one of the downsides of a pure websocket source for us: every time vector restarts it pulls a possibly overlapping set of entries.
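One way a dedicated source could avoid re-forwarding overlapping entries is to remember the newest timestamp it has already emitted, send it as `start` on reconnect, and drop anything at or before it. The following is only a sketch under stated assumptions: the `LogEntry` shape, the nanosecond timestamp unit, and the `Checkpoint` type are hypothetical and not taken from the Koyeb API or from this PR.

```rust
/// One log line as returned by the tailing API (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct LogEntry {
    /// Timestamp in nanoseconds since the Unix epoch (assumed unit).
    timestamp_ns: u64,
    message: String,
}

/// Remembers the newest timestamp already forwarded downstream.
#[derive(Debug, Default)]
struct Checkpoint {
    last_seen_ns: Option<u64>,
}

impl Checkpoint {
    /// Value to send as the `start` parameter on (re)connect.
    /// `None` means no `start` is sent and the server picks its default backlog.
    fn start_param(&self) -> Option<u64> {
        self.last_seen_ns
    }

    /// Keeps only entries strictly newer than the checkpoint, then advances it.
    fn accept(&mut self, batch: Vec<LogEntry>) -> Vec<LogEntry> {
        let cutoff = self.last_seen_ns;
        let fresh: Vec<LogEntry> = batch
            .into_iter()
            .filter(|e| cutoff.map_or(true, |c| e.timestamp_ns > c))
            .collect();
        if let Some(newest) = fresh.iter().map(|e| e.timestamp_ns).max() {
            self.last_seen_ns = Some(newest);
        }
        fresh
    }
}

fn main() {
    let mut checkpoint = Checkpoint::default();

    // First connection: no `start`, so the server sends its default backlog.
    let backlog = vec![
        LogEntry { timestamp_ns: 10, message: "boot".into() },
        LogEntry { timestamp_ns: 20, message: "ready".into() },
    ];
    println!("forwarded {} entries", checkpoint.accept(backlog).len());

    // After a restart the backlog overlaps with what was already forwarded.
    let replay = vec![
        LogEntry { timestamp_ns: 20, message: "ready".into() },
        LogEntry { timestamp_ns: 30, message: "request".into() },
    ];
    println!("forwarded {} entries", checkpoint.accept(replay).len());
    println!("next start = {:?}", checkpoint.start_param());
}
```

Two limitations of this sketch: the checkpoint would have to be persisted to survive a restart of vector itself, and the strict comparison drops distinct entries that share the timestamp of the last forwarded one.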

Secondly, it seems that in a single user's pipeline, they would only be able to use a single vector instance to handle the entire volume of log events. Is that something of concern?

This touches another part of why we would like to also include a koyeb source, together with a pure websocket one. The workloads our users can deploy are enclosed in single instances (Firecracker microVMs on our workers), we call it a deployment. Different replicas of the same deployment form a service. Different services are grouped into an app. Finally, a user might be part of different accounts. We would like to allow the user to either:

Deploy one vector instance at any level of the hierarchy they want
Choose to forward a whole account's logs with vector (either distributing the load somehow across more than one vector instance or with just one single instance; this is still up for discussion)

This would really require something more elaborate than the websocket source, because with it the burden would be on the user to retrieve the right identifier for each deployment/service/app, and the identifiers would not be dynamic.
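The resolution step described above can be sketched as a filter over an inventory of deployments. This is an illustrative model only: the `Scope` and `Deployment` types, their fields, and the `resolve` function are hypothetical names, not part of the Koyeb API or of this PR.

```rust
/// The level of the hierarchy a vector instance is attached to.
#[derive(Debug, Clone, PartialEq)]
enum Scope {
    Account(String),
    App(String),
    Service(String),
    Deployment(String),
}

/// Flattened view of one deployment and its ancestors (hypothetical shape).
#[derive(Debug, Clone)]
struct Deployment {
    id: String,
    service: String,
    app: String,
    account: String,
}

/// Returns the ids of every deployment covered by `scope`.
fn resolve<'a>(scope: &Scope, inventory: &'a [Deployment]) -> Vec<&'a str> {
    inventory
        .iter()
        .filter(|d| match scope {
            Scope::Account(name) => d.account == *name,
            Scope::App(name) => d.app == *name,
            Scope::Service(name) => d.service == *name,
            Scope::Deployment(id) => d.id == *id,
        })
        .map(|d| d.id.as_str())
        .collect()
}

fn main() {
    let inventory = vec![
        Deployment { id: "d1".into(), service: "web".into(), app: "shop".into(), account: "acme".into() },
        Deployment { id: "d2".into(), service: "web".into(), app: "shop".into(), account: "acme".into() },
        Deployment { id: "d3".into(), service: "worker".into(), app: "shop".into(), account: "acme".into() },
    ];

    // One websocket stream would be opened per resolved id.
    for id in resolve(&Scope::Service("web".into()), &inventory) {
        println!("tail {}", id);
    }
}
```

A dedicated source could re-run this resolution whenever the inventory changes, which is the dynamic behaviour a hand-written list of identifiers cannot provide.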

Regarding a koyeb source: our stance is that if we did not accept a koyeb source, the websocket source's viability would be a bit diminished (as there has not been much demand otherwise for a websocket source). This means the two scenarios would likely be either just the websocket source (if there was confidence your users would be able to utilize it) or both sources. For Vector it would be ideal to just have the websocket source, so we would just want to ensure there are definite reasons that Koyeb users would not be able to properly utilize it alone. Essentially we'd need to come to a conclusion on this point before proceeding.

I get your point, but I might also add that our API is really not much more than a proxy for Loki, that we use internally. You might consider that this websocket source, together with a koyeb source, would enable a loki source. We might be willing to contribute to that as well.

Again, sorry for the late reply. Let me know what you think of the picture I outlined.

Thanks again!

Jul 19 '23 13:07 torrefatto

Hey @torrefatto ! Thanks for providing all those details, that really helps us frame it, and also helps us understand better the value of a koyeb source.

I might also add that our API is really not much more than a proxy for Loki, that we use internally. You might consider that this websocket source, together with a koyeb source, would enable a loki source. We might be willing to contribute to that as well.

This is an interesting development. We have had solid demand for a loki source from the community (https://github.com/vectordotdev/vector/issues/6873). So is your API essentially using Loki behind the scenes?

Would that mean the koyeb source would essentially be a wrapper on top of a loki and websocket source?

In the meantime, I will share these new details with the team. Thanks!

Jul 25 '23 15:07 neuronull

One other thing to follow up on:

I for sure can commit to maintain the websocket source, independently of my work engagement with koyeb. I think (but I have to consult my employer) that koyeb too as a company could be willing to commit to the maintenance of this source.

Curious if there was any traction on a commitment at the company-level, to maintain this/these sources?

Jul 25 '23 22:07 neuronull

Hi @neuronull!

Would that mean the koyeb source would essentially be a wrapper on top of a loki and websocket source?

Exactly!

One other thing to follow up on:

I for sure can commit to maintain the websocket source, independently of my work engagement with koyeb. I think (but I have to consult my employer) that koyeb too as a company could be willing to commit to the maintenance of this source.

Curious if there was any traction on a commitment at the company-level, to maintain this/these sources?

I talked with @bchatelard and he confirmed that @koyeb is willing to commit to maintaining these sources, should they be accepted upstream :muscle:

Jul 27 '23 08:07 torrefatto

Hi @torrefatto , wanted to convey an update- we are still finalizing input from stakeholders but we're pretty confident that we would accept this new source, and the following ones. 🎉

I'll be taking a look at your code in this PR for some initial feedback.

Aug 01 '23 20:08 neuronull

That's awesome @neuronull!

I see that I have a conflict. Would you like me to rebase or merge from master?

Aug 02 '23 16:08 torrefatto

That's awesome @neuronull!

I see that I have a conflict. Would you like me to rebase or merge from master?

Merging is preferred to keep the commit history and make reviews easier (reviewers can just review new changes). When the PR merges it'll be squashed down to one commit.

Aug 02 '23 17:08 jszwedko

In the same vein, avoiding force-pushing is greatly appreciated 🙏

Aug 02 '23 17:08 neuronull

Hi! I just wanted to say that I am working on the PR. Thanks for the useful comments, @neuronull!

@jszwedko don't worry, I won't force-push :pinky-promise:

Aug 09 '23 09:08 torrefatto

Hi team! Sorry for being silent for so long. The summer took quite some time away :palm_tree: :tropical_drink: I resumed working on this, I expect to push new commits soon!

Oct 05 '23 06:10 torrefatto

very excited about the generic websocket source which would be amazing to have (esp if it handles reconnecting and such properly) <3

Oct 07 '23 09:10 OmarTraderXBT

Hi team! Sorry for being silent for so long. The summer took quite some time away 🌴 🍹 I resumed working on this, I expect to push new commits soon!

Hi @torrefatto! Wondering if you are still planning to work on this? We would also love to see this source added!

Jun 13 '24 03:06 yalinglee

Hi @yalinglee

Apologies for the long silence and thanks for reanimating this conversation.

I am still willing to work on this, but I am unfortunately not able to do so during working hours anymore (priorities changed at $DAYJOB).

I have to fit this into my scarce free time. The first thing that I need to do is to update this PR with the upstream changes that have happened so far. Then I can return to applying the recommendations here :)

I will try to update you by the end of next week.

Jun 13 '24 16:06 torrefatto

@torrefatto That's totally understandable! I was just curious about the status of this PR so no pressure! And really appreciate you using your precious free time to work on this!

Jun 15 '24 04:06 yalinglee

Thank you for your contribution to Vector! To keep the repository tidy and focused, we are closing this PR due to inactivity. We greatly appreciate the time and effort you've put into this PR. If you'd like to continue working on it, we encourage you to re-open the PR and we would be delighted to review it again. Before re-opening, please use `git merge origin master` to resolve any conflicts with origin/master.

Jan 27 '25 22:01 pront
