
Support feeding from Spark

Open lesters opened this issue 6 years ago • 6 comments

Today, the Hadoop integration tools for Vespa support Hadoop and Pig for feeding and querying Vespa. The Pig feeder is a thin wrapper around the Vespa HTTP client.

We should support feeding directly from Spark as well, to avoid Spark pipelines having to write to HDFS and run another Pig job for the actual feeding. Similarly to the Pig feeder, this could be implemented as a thin wrapper around the HTTP client.
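As a rough illustration of the "thin wrapper" idea, here is a minimal sketch of a per-partition feed function. It is not the Vespa HTTP client mentioned above: as a stand-in it posts each row to Vespa's `/document/v1` HTTP API using only the Python standard library. The names `document_url`, `http_post`, `feed_partition`, and the `id_field` parameter are hypothetical, and the sketch assumes every column of a row maps to a field in the document schema.

```python
import json
from urllib import request
from urllib.parse import quote


def document_url(endpoint, namespace, doc_type, doc_id):
    """Build a /document/v1 URL for one document (hypothetical helper)."""
    return "{}/document/v1/{}/{}/docid/{}".format(
        endpoint.rstrip("/"),
        quote(namespace, safe=""),
        quote(doc_type, safe=""),
        quote(str(doc_id), safe=""),
    )


def http_post(url, body):
    """POST a JSON body and return the HTTP status code."""
    req = request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.status


def feed_partition(rows, endpoint, namespace, doc_type, id_field, post=http_post):
    """Feed every row (a dict) of one partition; return the number of rows sent.

    Assumption: all columns, including the id column, are schema fields.
    The `post` callable is injectable so the function can be tested offline.
    """
    sent = 0
    for row in rows:
        body = json.dumps({"fields": dict(row)}).encode("utf-8")
        post(document_url(endpoint, namespace, doc_type, row[id_field]), body)
        sent += 1
    return sent
```

From PySpark, something like `df.rdd.foreachPartition(lambda rows: feed_partition((r.asDict() for r in rows), endpoint, "ns", "doc", "id"))` would run this on the executors, so no intermediate write to HDFS is needed. A real sink would also need batching, retries, and throttling, which the sketch leaves out.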

lesters avatar Apr 23 '19 09:04 lesters

@kkraune I don't see the Hadoop integration anymore. Do we still want Spark support? I would be interested in taking it up.

prasad-marne avatar Oct 09 '23 06:10 prasad-marne

Hi, yes that would be a great addition! A good starting point is https://docs.vespa.ai/en/vespa-feed-client.html. Thanks!

kkraune avatar Oct 09 '23 06:10 kkraune

Great. I will spend some time investigating how we can design a sink in Spark.

prasad-marne avatar Oct 10 '23 09:10 prasad-marne

Can I take this issue?

tsafacjo avatar Nov 01 '23 21:11 tsafacjo

Sure, thanks for contributing! https://github.com/vespa-engine/vespa/blob/master/CONTRIBUTING.md is a good place to start.

kkraune avatar Nov 02 '23 09:11 kkraune