jgit-spark-connector

[feature-request] create an engine playground

Open campoy opened this issue 7 years ago • 9 comments

The best way to get people to try your technology is to reduce the time to their first "whoa" moment. To that end, @eiso created a Dockerfile that lets you run the engine in a very straightforward way. The only problem I see with this is that we require people to install Docker and download a pretty large image.

I'd like to create an Engine Playground (à la play.golang.org) that will provide shells into an engine instance running on one of our projects.

advantages:

  • reduced time and friction for users to try our technology
  • hides complexity of setup until it's necessary
  • more inclusive to non-devops
  • can easily track impact (visitors, time, etc.)

concerns:

  • abuse: can it be used for bitcoin mining?
  • security: can it be used to access resources that should be secret?
  • privacy: could people somehow leave PII we don't want?
  • others: could people upload illegal content to these servers?
  • money: we need to pay for this, obviously

campoy avatar Nov 21 '17 13:11 campoy

We're halfway there. Right now @rporres is able to do this, and we use these instances (they are based on the current Dockerfile at the root of this repo). However, we manually provision them for people who request them, see here

Since they are provisioned on demand (we use a new GCP instance per user), we would need a small frontend with automated provisioning, like Docker.com has for their trial (should be relatively straightforward to do):

[screenshot: screenshot-2017-11-21-16 45 25]

> I'd like to create an Engine Playground (à la play.golang.org) that will provide shells into an engine instance running on one of our projects.

Want to make something more fancy or use the Jupyter notebooks?

/cc @marnovo @mcuadros

eiso avatar Nov 21 '17 15:11 eiso

Another alternative is to keep cached data for a (limited) set of queries; this way, we could 'mock' an engine environment.
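A minimal sketch of what such a mock could look like, assuming queries arrive as plain strings and the results were precomputed offline. The names `CACHED_RESULTS`, `normalize` and `run_mock_query` are hypothetical, and the cached values are placeholders, not real engine output:

```python
from typing import Dict, List, Optional

# Hypothetical precomputed results, keyed by normalized query text.
# The values are placeholders for illustration, not real engine output.
CACHED_RESULTS: Dict[str, List[dict]] = {
    "select count(*) from repositories": [{"count": 0}],
}


def normalize(query: str) -> str:
    """Collapse whitespace, drop a trailing ';' and lowercase the query.

    Lowercasing is a simplification: it would also alter string literals,
    so a real playground would need a smarter normalization.
    """
    return " ".join(query.strip().rstrip(";").lower().split())


def run_mock_query(query: str, cache: Dict[str, List[dict]]) -> Optional[List[dict]]:
    """Return the cached rows for a known query, or None if it is not cached."""
    return cache.get(normalize(query))
```

Queries outside the cached set return `None`, so a frontend could tell the visitor that the query is not available in the playground and point them to the Docker image instead.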

eiso avatar Nov 21 '17 15:11 eiso

To learn more about how @src-d/infrastructure does it today: https://github.com/src-d/infrastructure/tree/master/engine-jupyter-demo

eiso avatar Nov 21 '17 17:11 eiso

I doubt the current approach can scale properly... Most likely we would need to use something like Jupyter Hub to manage a multiuser setup...

rporres avatar Nov 21 '17 17:11 rporres

From what I understand about Spark, we'd also need to move away from Derby (https://github.com/src-d/engine/issues/192) to allow multiple Spark sessions, and we'd need heavy caching across all sessions.

eiso avatar Nov 21 '17 17:11 eiso

A hosted playground is a nice idea, especially as a simple Jupyter UI with Python or Scala can be exposed.

BTW if we are using a container-per-user model, why would multiple Spark sessions be needed?

bzz avatar Nov 22 '17 15:11 bzz

@bzz because of @rporres's comment about how feasible it is to scale with individual instances (not containers) per user.

@rporres since they are temp containers, could we do something like launch a container per user, not an instance, and have them time out after x hours of idleness? Just a simple bash script that checks the logs of the container and kills the container if a certain command was not used for x hours?
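A rough sketch of the idle check, written in Python rather than bash and assuming we already have the time of the last relevant command per container (for example, extracted from `docker logs --timestamps`; that extraction is not shown). The function names and the container IDs in the usage are hypothetical:

```python
import subprocess
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def find_idle(last_activity: Dict[str, datetime],
              now: datetime,
              max_idle: timedelta) -> List[str]:
    """Return the IDs of containers whose last activity is older than max_idle."""
    return sorted(cid for cid, seen in last_activity.items()
                  if now - seen > max_idle)


def kill_containers(container_ids: Iterable[str]) -> None:
    """Kill each container with `docker kill`; failures are not raised."""
    for cid in container_ids:
        subprocess.run(["docker", "kill", cid], check=False)
```

Run from cron, this would be something like `kill_containers(find_idle(activity, datetime.now(), timedelta(hours=4)))`, where `activity` is the hypothetical mapping of container ID to last activity time and 4 hours stands in for the "x hours" above.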

eiso avatar Nov 22 '17 17:11 eiso

Is this still being worked on or can we close the issue?

erizocosmico avatar Mar 20 '18 15:03 erizocosmico

This will open up again once gitbase is ready; right now it's on pause until performance is where it needs to be.

eiso avatar Mar 21 '18 12:03 eiso