jupyterlab-google-drive

Deprecation of the Google Realtime API

Open • jochym opened this issue 6 years ago • 45 comments

This is what I got a minute ago. I'm putting an issue out as a starting point for discussion. What now? The Google Drive-backed notebooks in the Lab are a really fantastic collaboration tool. Is there any similar technology we can migrate to?


Hello Google Realtime API developer,

We’re writing to let you know that after careful consideration, we are deprecating the Google Realtime API as of November 28, 2017.

Your Realtime API client applications will continue to work normally until December 11, 2018. To ensure continued availability after this date, please migrate your applications using Realtime API to another data store before December 11, 2018. You can read more about the deprecation in our Realtime API deprecation documentation.

We know developers have come to rely on Realtime API and that migration may be a significant effort. We are grateful to our developers, and we hope that the deprecation plan summarized below allows a smooth transition for you and your product(s).

jochym • Nov 30 '17

Yes, I am as surprised as you are! This is unfortunate news, and we will have to deal with it.

Collaborative notebooks are still a priority for us. The good news is that we have already been working on our own solution for real-time collaboration that does not depend on the Google Realtime API. Unfortunately, it is still uncertain how long that will take, and what it will ultimately look like.

It is important to note that the Google Drive API is still supported, so we will still be able to upload, share, and access notebooks using Google Drive, it's just the realtime capabilities that will go away (until we get our own solution running).

ian-r-rose • Nov 30 '17

Without real-time features this will be just a shadow of the current tool. Well, it is better to know a year in advance, I guess. We (developers/community) have had our goal for the next year picked for us :-/ As a new user of JupyterLab with realtime collaboration, I can assure the developers that this is a fantastic tool, and it is really appreciated by my fellow users, with particular emphasis on the realtime collaboration part.

jochym • Nov 30 '17

Glad to hear you are finding it useful, hopefully the transition to the new approach is not too bumpy.

ian-r-rose • Dec 01 '17

Is Firebase an option? Real-time JSON datastore with access to any level of the tree. Also notifies and broadcasts updates.

https://firebase.google.com/

cedricyau • Dec 01 '17

The main problem with Firebase, as far as I know, is that it does not have the sophisticated conflict resolution algorithms (like operational transforms or CRDTs) that are required for collaborative text editing. At least, that has been my reading of the documentation. If I am mistaken in that, please do let me know!

ian-r-rose • Dec 01 '17
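To make "conflict resolution" concrete: with operational transformation (OT), each site applies its own edit immediately, and when a concurrent edit arrives it first rewrites (transforms) that edit's position against the local one, so both replicas end up in the same state. Below is a minimal Python sketch, assuming single-character insert/delete operations and an integer site id for tie-breaking. The names `Insert`, `Delete`, `apply_op` and `transform` are illustrative only and are not taken from the Realtime API, Firebase, or any other library mentioned in this thread.

```python
from dataclasses import dataclass, replace
from typing import Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    char: str
    site: int  # tie-breaker when two sites insert at the same position


@dataclass(frozen=True)
class Delete:
    pos: int


Op = Union[Insert, Delete]


def apply_op(doc: str, op: Optional[Op]) -> str:
    if op is None:  # the op was cancelled during transformation
        return doc
    if isinstance(op, Insert):
        return doc[:op.pos] + op.char + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(a: Op, b: Op) -> Optional[Op]:
    """Rewrite `a` so it can be applied after the concurrent op `b`."""
    if isinstance(a, Insert) and isinstance(b, Insert):
        if a.pos < b.pos or (a.pos == b.pos and a.site < b.site):
            return a
        return replace(a, pos=a.pos + 1)
    if isinstance(a, Insert):  # b is a Delete
        return a if a.pos <= b.pos else replace(a, pos=a.pos - 1)
    if isinstance(b, Insert):  # a is a Delete
        return a if a.pos < b.pos else replace(a, pos=a.pos + 1)
    if a.pos == b.pos:  # both sites deleted the same character
        return None
    return a if a.pos < b.pos else replace(a, pos=a.pos - 1)


# Two sites edit "abc" concurrently; both orders of application converge.
base = "abc"
alice, bob = Insert(1, "X", site=1), Delete(1)
left = apply_op(apply_op(base, alice), transform(bob, alice))
right = apply_op(apply_op(base, bob), transform(alice, bob))
assert left == right == "aXc"
```

The final assertion is the convergence property OT has to guarantee: applying Alice's edit and then Bob's transformed edit yields the same text as the reverse order. Real OT systems also have to handle multi-character operations and long histories, which is where most of the difficulty lies.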

Coincidentally, I followed the links in the Realtime API warning that recommend Cloud Firestore, and they led to:

https://firebase.google.com/products/firestore/

Will need to read up more on Firebase and Firestore. Seems like the latter is the newer service and may be more applicable.

cedricyau • Dec 01 '17

It seems that unlike Google Drive/Realtime, Cloud Firestore requires a Google API account for billing.

What would be required to at least get real-time collaboration running on the same JupyterLab server?

Suppose you and I connected to the same server and opened the same notebook. What would it take for the server to broadcast changes between the two clients?

JupyterLab already allows multiple views of a notebook within a browser window, so it seems possible for a server-hosted file to have its changes broadcast.

Also, this would allow for the sharing of a kernel.

cedricyau • Dec 08 '17

So far as I have been able to tell, neither Cloud Firestore nor Firebase provide the conflict-resolution capabilities that we need (in addition to being paid products).

We are currently working on a solution that is hosted with the notebook server. At the moment, the server has no APIs for broadcasting changes and resolving conflicts, which is why it is a tricky problem. The case with multiple views in the same browser window works because they are both stored in memory, and the user is not able to make concurrent, conflicting edits.

The intention is that, with the self-hosted solution we are working on, kernels could be shared.

ian-r-rose • Dec 08 '17
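One common way to structure a server-hosted solution like the one described here is a central hub that gives every edit a revision number, transforms a late-arriving edit against the edits its sender had not yet seen, and broadcasts the result to every connected client. The following is a minimal in-process Python sketch that assumes insert-only operations. `Hub`, `submit` and `subscribe` are hypothetical names, not Jupyter notebook server APIs.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: str  # client id, used only to break ties


def transform(a: Insert, b: Insert) -> Insert:
    """Shift `a` so it still applies after the concurrent insert `b`."""
    if b.pos < a.pos or (b.pos == a.pos and b.site < a.site):
        return replace(a, pos=a.pos + len(b.text))
    return a


Listener = Callable[[int, Insert], None]


@dataclass
class Hub:
    """Hypothetical server-side hub: orders edits, then broadcasts them."""

    doc: str = ""
    history: List[Insert] = field(default_factory=list)
    listeners: List[Listener] = field(default_factory=list)

    def subscribe(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def submit(self, op: Insert, base_revision: int) -> int:
        # The client built `op` against revision `base_revision`; transform it
        # past every edit committed since then that the client had not seen.
        for committed in self.history[base_revision:]:
            op = transform(op, committed)
        self.doc = self.doc[:op.pos] + op.text + self.doc[op.pos:]
        self.history.append(op)
        revision = len(self.history)
        for listener in self.listeners:
            listener(revision, op)  # every client receives the canonical op
        return revision


hub = Hub(doc="hello")
received = []
hub.subscribe(lambda rev, op: received.append((rev, op.pos, op.text)))
hub.submit(Insert(5, " world", site="alice"), base_revision=0)
hub.submit(Insert(5, "!", site="bob"), base_revision=0)  # concurrent with alice
assert hub.doc == "hello world!"
```

A real client would also have to transform its own unacknowledged edits against the ones broadcast to it, and the transport (for example one WebSocket per client) is left out.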

Firepad is using firebase for the same purpose. https://github.com/firebase/firepad

eyadsibai • Dec 11 '17

I believe they use their own operational transform implementation (rather than something built into Firebase): https://github.com/firebase/firepad/blob/master/lib/text-operation.js

ian-r-rose • Dec 11 '17
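As far as I know, Firepad's text-operation.js follows the model of the ot.js library, where a single operation spans the whole document as a list of components: retain n characters, insert a string, or delete n characters. Below is a minimal Python sketch of applying such an operation. It assumes the ot.js-style JSON encoding (positive integer = retain, negative integer = delete, string = insert), and `apply_text_operation` is a hypothetical name.

```python
from typing import List, Union

Component = Union[int, str]


def apply_text_operation(doc: str, operation: List[Component]) -> str:
    """Apply an ot.js-style operation: int > 0 retains, int < 0 deletes, str inserts."""
    out: List[str] = []
    cursor = 0  # position in the original document
    for comp in operation:
        if isinstance(comp, str):
            out.append(comp)  # insert: does not consume input
        elif comp > 0:
            if cursor + comp > len(doc):
                raise ValueError("retain runs past the end of the document")
            out.append(doc[cursor:cursor + comp])
            cursor += comp
        elif comp < 0:
            if cursor - comp > len(doc):
                raise ValueError("delete runs past the end of the document")
            cursor -= comp  # skip the deleted characters
        else:
            raise ValueError("zero-length component")
    if cursor != len(doc):
        raise ValueError("operation must span the whole document")
    return "".join(out)


print(apply_text_operation("hello world", [6, -5, "there"]))  # hello there
```

Requiring the operation to cover the whole document lets a replica detect an operation that was built against a different version of the text, which is the kind of mismatch a transform step is meant to prevent.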

What about https://github.com/P2Pvalue/swellrt?

eyadsibai • Dec 11 '17

Look at the CRDT implementation in the Atom package Teletype (MIT license). Its authors cite three papers for their CRDT implementation (the papers are paywalled).

  • teletype-crdt: The string-wise sequence CRDT that enables peer-to-peer collaborative editing.
  • teletype-server: The server-side application that facilitates peer discovery.
  • teletype-client: The editor-agnostic library that manages the interaction with other clients.

https://github.com/atom/teletype
https://github.com/atom/teletype-crdt
https://github.com/atom/teletype-client
https://github.com/atom/teletype-server

RStudio (paid version) and Cloud9 enable collaborative editing using Ace (which switched to a BSD license).
https://github.com/ajaxorg/ace

VSCode (MIT license) added collaborative editing to the insiders edition in November. https://github.com/Microsoft/vscode

stoneyv • Dec 18 '17
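Teletype's CRDT is string-wise and follows the papers its authors cite. The sketch below is not that algorithm. It is a much simpler character-wise sequence CRDT in the style of RGA (replicated growable array), included only to illustrate the core idea: every character gets a globally unique id, an insert references the id of its left neighbour instead of an index, and a delete leaves a tombstone, so replicas converge without transforming operations. Python, assuming ops are delivered exactly once and in causal order; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ElemId = Tuple[int, str]  # (Lamport counter, site id), compared lexicographically


@dataclass
class Elem:
    id: ElemId
    value: str
    deleted: bool = False  # tombstone: deleted elements remain as anchors


class RGA:
    """Minimal replicated growable array; assumes ops arrive in causal order."""

    def __init__(self, site: str) -> None:
        self.site = site
        self.clock = 0
        self.elems: List[Elem] = []

    def _visible(self) -> List[Elem]:
        return [e for e in self.elems if not e.deleted]

    def _index_of(self, elem_id: Optional[ElemId]) -> int:
        if elem_id is None:
            return -1  # virtual head of the sequence
        return next(i for i, e in enumerate(self.elems) if e.id == elem_id)

    def local_insert(self, index: int, value: str) -> tuple:
        """Insert at a visible index and return the op to broadcast to peers."""
        visible = self._visible()
        after = visible[index - 1].id if index > 0 else None
        op = ("ins", after, (self.clock + 1, self.site), value)
        self.apply(op)
        return op

    def local_delete(self, index: int) -> tuple:
        op = ("del", self._visible()[index].id)
        self.apply(op)
        return op

    def apply(self, op: tuple) -> None:
        if op[0] == "del":
            self.elems[self._index_of(op[1])].deleted = True
            return
        _, after, new_id, value = op
        self.clock = max(self.clock, new_id[0])  # Lamport clock update
        i = self._index_of(after) + 1
        # Skip elements with larger ids: concurrent inserts at the same spot
        # that win the tie, together with everything inserted after them.
        while i < len(self.elems) and self.elems[i].id > new_id:
            i += 1
        self.elems.insert(i, Elem(new_id, value))

    def text(self) -> str:
        return "".join(e.value for e in self._visible())


a, b = RGA("a"), RGA("b")
for op in (a.local_insert(0, "H"), a.local_insert(1, "i")):
    b.apply(op)
x = a.local_insert(2, "!")  # concurrent edits at the same position
y = b.local_insert(2, "?")
a.apply(y)
b.apply(x)
assert a.text() == b.text() == "Hi?!"
```

Because ids are unique and totally ordered, both replicas place the two concurrent characters in the same order no matter which op arrives first. The cost is that tombstones accumulate, one reason production CRDTs such as Teletype's work on runs of characters rather than single ones.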

There's also Etherpad's changeset library and server, which work well.

piec • Dec 20 '17

Hey guys! Check out this post for some options: https://www.quora.com/What-are-good-frameworks-for-real-time-collaboration-in-a-web-application

And since you guys seem to be comfortable with proprietary solutions, I must humbly suggest http://convergence.io as well. It really is the quickest, most reliable path to realtime collaboration.

alalonde • Feb 14 '18

Just saw this today. Super bummed. I was going to introduce JupyterLab with the Google Drive extension to the Intro to Python class I am teaching, but I don't want to show anything that will stop working at the end of the year. As for a replacement, I second @stoneyv's suggestion: Atom Teletype. I already use Atom to run Python/R scripts using nteract's Hydrogen plugin (https://nteract.io/atom, https://atom.io/packages/hydrogen). Hydrogen uses Jupyter kernels for in-line output, code completion, documentation, etc. (https://nteract.io/kernels). Atom also has excellent git integration (maybe because it is made by GitHub 😀). If Teletype-like collaboration could be implemented in JupyterLab, that would be awesome!

marskar • Feb 18 '18

Another interesting tool is Codestrates (http://codestrates.org/), which is based on ShareDB:

Codestrates is a literate computing approach to developing interactive software inspired by interactive notebooks such as Jupyter notebook. However, in Codestrates, real-time collaboration is built in, it is possible to create stand-alone applications with persistent state, and to reprogram the functionality of the environment itself.

There is an introduction video on their site and a demo where you can create a codestrate and play with it.

stas-sl • Feb 22 '18

https://github.com/share/sharedb

ShareDB is a realtime database backend based on Operational Transformation (OT) of JSON documents.

  • Realtime synchronization of any JSON document
  • Concurrent multi-user collaboration
  • Synchronous editing API with asynchronous eventual consistency
  • Realtime query subscriptions
  • Simple integration with any database - MongoDB, PostgreSQL (experimental)
  • Horizontally scalable with pub/sub integration
  • Projections to select desired fields from documents and operations
  • Middleware for implementing access control and custom extensions
  • Ideal for use in browsers or on the server
  • Reconnection of document and query subscriptions
  • Offline change syncing upon reconnection
  • In-memory implementations of database and pub/sub for unit testing

manigandham • Feb 23 '18
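To my knowledge, ShareDB's default OT type is json0, where an operation is a list of components, each with a path `p` into the document plus an action key such as `li`/`ld` (list insert/delete), `oi`/`od` (object insert/delete) or `na` (number add). The Python sketch below applies only that subset, to a toy notebook-like dict (not the real nbformat schema). It leaves out transforming concurrent components, which is the part ShareDB actually provides, and `apply_json0` is a hypothetical name.

```python
import copy
from typing import Any, Dict, List

KNOWN = {"li", "ld", "oi", "od", "na"}


def apply_json0(doc: Any, op: List[Dict[str, Any]]) -> Any:
    """Apply a subset of json0-style components to a copy of `doc`."""
    doc = copy.deepcopy(doc)
    for comp in op:
        if not KNOWN.intersection(comp):
            raise ValueError(f"unsupported component: {comp}")
        *parent_path, key = comp["p"]  # last path element: list index or object key
        parent = doc
        for step in parent_path:
            parent = parent[step]
        # The deleted value carried by ld/od is not checked in this sketch.
        if "ld" in comp:  # list delete (ld together with li means replace)
            del parent[key]
        if "li" in comp:  # list insert
            parent.insert(key, comp["li"])
        if "od" in comp:  # object delete (od together with oi means replace)
            del parent[key]
        if "oi" in comp:  # object insert
            parent[key] = comp["oi"]
        if "na" in comp:  # number add
            parent[key] += comp["na"]
    return doc


notebook = {"cells": [{"source": "print(1)"}], "metadata": {}}
op = [
    {"p": ["cells", 1], "li": {"source": "x = 2"}},  # append a second cell
    {"p": ["metadata", "edits"], "oi": 1},
    {"p": ["metadata", "edits"], "na": 1},
]
updated = apply_json0(notebook, op)
assert updated["cells"][1] == {"source": "x = 2"}
assert updated["metadata"] == {"edits": 2}
assert notebook["cells"] == [{"source": "print(1)"}]  # input left untouched
```

Addressing edits by path is what makes a JSON OT type a natural fit for notebooks: inserting a cell and editing a different cell's metadata touch disjoint paths, so most concurrent edits transform trivially.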

Hi,

it's unfortunate that Google is deprecating its API. Have you been able to work out a solution? I ask because I am a frequent Jupyter user, and the possibility of a collaborative platform for notebooks was surreal for me. How long could it take to move completely off the Google Realtime API?

justachetan • Feb 24 '18

So I just watched https://channel9.msdn.com/Events/PyData/Seattle2017/BRK11 and was interested in the real-time collaboration. First let me say THIS IS A HARD PROBLEM. The work shown in the video, which is the reason for this thread, seems to be in peril due to the deprecation of the Realtime API. This is unfortunate, but I came here hoping to drive some conversation around an API that would allow real-time collaboration without an external connection to Google Drive. I am hoping some thought will be given to a self-contained API that an organization can host 100% internally. This would help adoption in classified networks, or other areas where things must be 100% self-contained and not reliant on a cloud service like Google Drive. With the deprecation of the API, perhaps we now have an opportunity to consider this use case when pushing the next iteration of real-time collaboration.

JohnOmernik • Feb 27 '18

John, we at Convergence Labs have been building realtime collaborative apps for over a decade, and we've essentially built that API. It is indeed a hard problem. Over the years we've run into the vast majority of problems people tend to face, and have wrapped up the solutions into one product. There is the requisite support for data synchronization, but also first-class support for things like remote cursors and selections. We also have an on-premises solution for organizations that need to keep their data in-house.

Ian, we've done extensive consulting with software companies getting their feet wet in realtime collaboration, and in the interest of moving the state of the art in realtime collaboration apps forward, we'd be happy to have a conversation about your working solution. JupyterLab is one of those cutting-edge apps that we'd love to see succeed, regardless of the underlying technology being used.

alalonde • Feb 27 '18

I think it'd be important to have an OSS solution for the realtime API work. Then it can't be withdrawn without recourse. It'd also be much easier to install in many environments (many organizations won't want their data outside their organization, so they'll want to be able to do a local install).

david-a-wheeler • Mar 08 '18

Yes, I would also prefer some open source, decentralized solution. Maybe something based on WebRTC, like Atom Teletype or this project: https://github.com/Chat-Wane/CRATE . There are also people working on collaborative editing on top of IPFS, https://ipfs.io/blog/30-js-ipfs-crdts.md

aweisse • Mar 15 '18
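A decentralized (peer-to-peer) setup has no server to order edits, and that is the situation CRDTs are designed for. Besides sequence CRDTs for text, simpler state-based CRDTs can cover fields that only ever get replaced as a whole, such as a notebook's kernel name. Below is a minimal Python sketch of a last-writer-wins (LWW) map, assuming each write carries a unique (logical clock, site id) stamp. The function names are hypothetical, and this is not taken from CRATE, Teletype, or js-ipfs.

```python
from typing import Any, Dict, Tuple

Stamp = Tuple[int, str]  # (logical clock, site id): unique and totally ordered
LWWMap = Dict[str, Tuple[Stamp, Any]]


def lww_merge(a: LWWMap, b: LWWMap) -> LWWMap:
    """Join two replica states: per key, keep the entry with the larger stamp."""
    merged = dict(a)
    for key, (stamp, value) in b.items():
        if key not in merged or stamp > merged[key][0]:
            merged[key] = (stamp, value)
    return merged


def lww_set(state: LWWMap, key: str, value: Any, stamp: Stamp) -> LWWMap:
    """A local write is just a merge with a one-entry state."""
    return lww_merge(state, {key: (stamp, value)})


# Two peers write the same key while disconnected, then sync in either order.
alice = lww_set({}, "kernel", "python3", (1, "alice"))
bob = lww_set({}, "kernel", "ir", (1, "bob"))
assert lww_merge(alice, bob) == lww_merge(bob, alice)  # commutative
assert lww_merge(alice, alice) == alice                # idempotent
```

Because the merge is commutative, associative and idempotent, peers can exchange states over any transport, in any order, any number of times, and still converge. The trade-off is that one of two concurrent writes is silently discarded, so cell source text would still need a sequence CRDT.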

Jupyter uses a central server (jupyter notebook server) so I don't see the interest of a decentralized solution for real-time collaboration.

piec • Mar 15 '18

@piec: I guess the point was a self-hosted solution, rather than one centralized at a particular service provider (Google).

jochym • Mar 15 '18

@jochym @piec Well, isn't the standard scenario that people run their own notebook server locally on the machine they are sitting in front of? Does the standard notebook server have a notion of multiple users, or isn't that the purpose of JupyterHub?

I imagine that I can send someone a link or hash or whatever, and then our two notebook servers connect and we can edit documents together. Like in the Atom editor with Teletype plugin (where Atom as an electron app is basically a web server + browser combined into a desktop program).

aweisse • Mar 15 '18

@aweisse: I do not think there really is a "standard scenario". People use JLab in so many ways. I hardly use it on my laptop; most of the time I use it on my department's server via JupyterHub. I am quite sure there are many other setups.

jochym • Mar 15 '18

All: @stas-sl and @manigandham mentioned ShareDB, and I'm just wondering why it's not in the mix.

I'm using an older version in pithy (https://github.com/pithy/dansteingart), and for what I need it to do it's solid. Notebooks are more complex for sure, but ShareDB seems built to handle OT for JSON documents. Just curious why it's not discussed more here?

Thanks in advance.

dansteingart • Apr 09 '18

@ian-r-rose do you have any links to projects/proposals for the in-house real-time implementation work you mentioned? I agree that an implementation on the Jupyter server is the route to take, as it has a similar job to the ContentsManager. Potentially even an opt-in API for ContentsManager implementations?

Either way, I would love to see how it is progressing and help if I can, so some pointers to where that work is happening would be much appreciated. Thanks!

SpencerPark • May 08 '18

@ian-r-rose: same as @SpencerPark

@dansteingart: I don't know much about the subject, but ShareDB does look like a good candidate. However, I can't find much information on how it implements conflict resolution. Any idea where explanations can be found?

Ericvulpi • May 10 '18

The current work in the feature-tables branch of Phosphor is going to provide a CRDT implementation for real-time collaboration. We hope to start refactoring some of the core JupyterLab APIs to take advantage of this at some point in the next few weeks.

@Ericvulpi ShareDB uses operational transforms (similar to Google Drive, different from CRDTs) to perform its conflict resolution. I investigated it about a year ago, but found the quality of documentation to be so spotty that I was unable to make much progress.

ian-r-rose • May 10 '18