
Help wanted - Running jupyter notebook in Google "Drive File Stream" folder

Open matthew-hsr opened this issue 6 years ago • 12 comments

Hello! I have long been using jupyter notebook inside my Google Drive and it worked perfectly. Thanks a lot for all your hard work! Recently I was forced to switch to the newer version "Drive File Stream". I made my folder "Available offline", which should mean that I saved the folder in my local storage space. When I start a jupyter notebook, it runs fine, but trying to "Save and Checkpoint" gives me an error "Checkpoint failed". Seems like it saves the file properly somehow but returned an error.

I started the jupyter notebook by running "jupyter notebook" in "anaconda prompt". The following error is obtained when I try to "Save and Checkpoint". Seems like it's complaining that the saved files are the same - "shutil.SameFileError" which is really weird, as I definitely changed the notebook by having an additional "Save Attempt x" to help me tell if it saved. I also tried to delete the corresponding check point in the folder ".ipynb_checkpoints" and do "Save and Checkpoint", but I still get the same error...

Any help is highly appreciated!

[I 10:49:44.573 NotebookApp] Saving file at /Writing_MVUE_2d.ipynb
[E 10:49:44.690 NotebookApp] Unhandled error in API request
    Traceback (most recent call last):
      File "C:\ProgramData\Anaconda3\lib\site-packages\notebook\base\handlers.py", line 516, in wrapper
        result = yield gen.maybe_future(method(self, *args, **kwargs))
      File "C:\ProgramData\Anaconda3\lib\site-packages\tornado\gen.py", line 1015, in run
        value = future.result()
      File "C:\ProgramData\Anaconda3\lib\site-packages\tornado\concurrent.py", line 237, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 3, in raise_exc_info
      File "C:\ProgramData\Anaconda3\lib\site-packages\tornado\gen.py", line 285, in wrapper
        yielded = next(result)
      File "C:\ProgramData\Anaconda3\lib\site-packages\notebook\services\contents\handlers.py", line 278, in post
        checkpoint = yield gen.maybe_future(cm.create_checkpoint(path))
      File "C:\ProgramData\Anaconda3\lib\site-packages\notebook\services\contents\manager.py", line 468, in create_checkpoint
        return self.checkpoints.create_checkpoint(self, path)
      File "C:\ProgramData\Anaconda3\lib\site-packages\notebook\services\contents\filecheckpoints.py", line 56, in create_checkpoint
        self._copy(src_path, dest_path)
      File "C:\ProgramData\Anaconda3\lib\site-packages\notebook\services\contents\fileio.py", line 241, in _copy
        copy2_safe(src, dest, log=self.log)
      File "C:\ProgramData\Anaconda3\lib\site-packages\notebook\services\contents\fileio.py", line 51, in copy2_safe
        shutil.copyfile(src, dst)
      File "C:\ProgramData\Anaconda3\lib\shutil.py", line 98, in copyfile
        raise SameFileError("{!r} and {!r} are the same file".format(src, dst))
    shutil.SameFileError: 'G:\\My Drive\\Research\\Camera_Project\\Writing_MVUE_2d.ipynb' and 'G:\\My Drive\\Research\\Camera_Project\\.ipynb_checkpoints\\Writing_MVUE_2d-checkpoint.ipynb' are the same file
[E 10:49:44.698 NotebookApp] {
      "Cache-Control": "no-cache",
      "Cookie": "username-localhost-8891=\"2|1:0|10:1525749572|23:username-localhost-8891|44:ZGE5NjFkMTgwOTI3NDFjMWJhMzY1NDIwNjhlODk5ODA=|52cc6ad64c261ccd78d2f66a0c14a95edc8468e7e4ac9c27770ce2bbebd8cd55\"; _xsrf=2|f34c78bf|7c7913196386a495e7fb6a7c3c7bbf7e|1525206477; username-localhost-8890=\"2|1:0|10:1525723124|23:username-localhost-8890|44:MGFhNmE2ZjMxZmFmNDU0M2IyMTgzNWE2NDdmMmRhMDc=|73c556d569c5f55b60a18dd36f06850f90c89c2dc329e11a4041a6c5ede7cb38\"; username-localhost-8889=\"2|1:0|10:1526327000|23:username-localhost-8889|44:YjdmMTExODQyNjNmNGM5MWIzZmM0OWIzYjE3NDgzZmM=|43a55e40b88869d8e054546feca0fbf7efae9c993b450311f5046f4c0ba45a66\"; username-localhost-8888=\"2|1:0|10:1526395222|23:username-localhost-8888|44:ZDJmZWNjZWUzZDJjNDVmMjkwMTZhNWZmNTI1MjhhMTE=|9cd671557f9b67b203088e3852aaf37de56ad7c8ad45f26798bc3e0d47a49c43\"",
      "Referer": "http://localhost:8888/notebooks/Writing_MVUE_2d.ipynb",
      "X-Xsrftoken": "2|f34c78bf|7c7913196386a495e7fb6a7c3c7bbf7e|1525206477",
      "Origin": "http://localhost:8888",
      "Content-Length": "0",
      "Connection": "Keep-Alive",
      "Accept-Encoding": "gzip, deflate",
      "Accept": "application/json, text/javascript, */*; q=0.01",
      "Accept-Language": "en-US",
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Edge/16.16299",
      "X-Requested-With": "XMLHttpRequest",
      "Host": "localhost:8888"
    }
[E 10:49:44.698 NotebookApp] 500 POST /api/contents/Writing_MVUE_2d.ipynb/checkpoints (::1) 12.00ms referer=http://localhost:8888/notebooks/Writing_MVUE_2d.ipynb

matthew-hsr avatar May 15 '18 14:05 matthew-hsr

same

kfuka avatar May 18 '18 04:05 kfuka

I am experiencing the same problem. Any update on a potential solution?

TiesdeKok avatar Jun 12 '18 22:06 TiesdeKok

I installed "Google Backup & Sync" in addition to "Google drive file stream". Backup & Sync works like Google drive.

kfuka avatar Jun 13 '18 02:06 kfuka

Same problem here! I'm suspecting that this problem is similar to https://productforums.google.com/forum/#!topic/drive/rnObiWVDo0s;context-place=topicsearchin/drive/category$3Adrive-for-desktop-sync%7Csort:relevance%7Cspell:false

deniz195 avatar Jun 15 '18 08:06 deniz195

I debugged the issue a little bit and found the following solution:

Part 1 - Minimal working example

The issue seems to originate from shutil.copyfile, which checks if source and destination file are the same (using os.path.samefile).

It appears that the combination of Windows and Google Drive File Stream yields invalid results. Here is a minimal example (assuming G:\My Drive\foo.txt exists, using Python 3.6.2):

>>> f1 = 'G:\\My Drive\\foo.txt'
>>> f2 = 'G:\\My Drive\\foo2.txt'
>>> import shutil
>>> shutil.copyfile(f1, f2)
>>> shutil.copyfile(f1, f2)

--> The last line throws the SameFileError, although it clearly shouldn't!

Whereas:

>>> f1 = 'G:\\My Drive\\foo.txt'
>>> f3 = 'C:\\Scratch\\foo2.txt'
>>> import shutil
>>> shutil.copyfile(f1, f3)
>>> shutil.copyfile(f1, f3)

--> Throws no error (correct)!
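
For anyone who wants to test other drives (network shares, other sync clients), the two sessions above can be wrapped in a small helper. This is only a sketch; the function name is made up for illustration, and the paths you pass in are up to you.

```python
import shutil


def second_copy_raises_same_file_error(src, dst):
    """Copy src to dst twice; report whether the second copy raises SameFileError.

    Hypothetical helper that mirrors the interactive example above.
    """
    shutil.copyfile(src, dst)  # first copy creates (or overwrites) dst
    try:
        shutil.copyfile(src, dst)  # second copy: dst now exists
    except shutil.SameFileError:
        return True
    return False
```

Calling it with the G:\My Drive paths from the first session should reproduce the error on an affected setup, while the C:\Scratch destination from the second session should not.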

Part 2 - Why? How does this happen?

It turns out that shutil.copyfile uses os.path.samefile to determine if a file is being copied onto itself (from https://github.com/python/cpython/blob/master/Lib/genericpath.py):

# Are two filenames really pointing to the same file?
def samefile(f1, f2):
    """Test whether two pathnames reference the same actual file"""
    s1 = os.stat(f1)
    s2 = os.stat(f2)
    return samestat(s1, s2)

# Are two stat buffers (obtained from stat, fstat or lstat)
# describing the same file?
def samestat(s1, s2):
    """Test whether two stat buffers reference the same file"""
    return (s1.st_ino == s2.st_ino and
            s1.st_dev == s2.st_dev)

Unfortunately, on Windows the value of st_ino depends on the file system (https://stackoverflow.com/questions/44158182/meaning-of-st-ino-os-stat-output-in-windows-os), and for all files on Google Drive File Stream st_ino==0. Since st_dev is also the same for files on the same drive, any two files there compare as the same file.
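
To check whether a given drive is affected, it may help to look at the two values that samestat compares. The helper names below are made up for illustration; only os.stat and its st_dev / st_ino fields come from the standard library.

```python
import os


def file_identity(path):
    """Return the (st_dev, st_ino) pair that samestat compares."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def has_usable_inode(path):
    """Return False when st_ino is 0, i.e. the file system reports no file ID."""
    return os.stat(path).st_ino != 0
```

If file_identity returns the same pair for two different files on a drive, os.path.samefile will treat them as one file, and shutil.copyfile between them will raise SameFileError.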

Part 3 - Where to go and who to blame?

This error seems to be an unlucky combination of a lazy file system (why not report some kind of unique ID as the inode?) and a naive Python os library (checking for file identity this way does not seem to generalize well... why not check if st_ino==0?).

Part 4 - The dirty fix

  1. Find genericpath.py of your python library:
>>> import os
>>> os.path.genericpath.__file__
'C:\\Program Files (x86)\\Microsoft Visual Studio\\Shared\\Anaconda3_64\\lib\\genericpath.py'
  2. In this file, replace the samestat function with the following patch:
# Are two stat buffers (obtained from stat, fstat or lstat)
# describing the same file?
def samestat(s1, s2):
    """Test whether two stat buffers reference the same file"""
    return (s1.st_ino != 0 and
            s2.st_ino != 0 and
            s1.st_ino == s2.st_ino and
            s1.st_dev == s2.st_dev)
  3. Save the file. Restart python (and/or jupyter).
  4. Be happy and wait until either Google or the Python team fixes this issue properly...
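
If editing a file inside the Python installation is not an option, the same check can probably be applied at runtime instead. The sketch below assumes that os.path.samefile is the genericpath implementation quoted in Part 2, which looks up samestat in the genericpath module globals; both function names here are hypothetical, and the patch only affects the process that runs it.

```python
import genericpath
import os


def samestat_nonzero_inode(s1, s2):
    """Like genericpath.samestat, but never treat inode 0 as a match."""
    return (s1.st_ino != 0 and
            s2.st_ino != 0 and
            s1.st_ino == s2.st_ino and
            s1.st_dev == s2.st_dev)


def install_samestat_patch():
    """Swap in the stricter samestat for this process only (hypothetical helper)."""
    # genericpath.samefile resolves samestat from its own module globals,
    # so this is the assignment that changes os.path.samefile's behavior.
    genericpath.samestat = samestat_nonzero_inode
    # Keep os.path.samestat consistent for code that calls it directly.
    os.path.samestat = samestat_nonzero_inode
```

install_samestat_patch() would need to run in the same process as the notebook server, before the first checkpoint is written, for example from a startup or config script. Unlike the edit to genericpath.py, it does not survive a restart unless that script runs again.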

deniz195 avatar Jun 22 '18 00:06 deniz195

@deniz195 that is some impressive detective-style bug hunting!

I can confirm that your proposed solution fixes the problem for me as well.

Thanks!

TiesdeKok avatar Jun 22 '18 00:06 TiesdeKok

See also here: https://bugs.python.org/issue33935

deniz195 avatar Jun 22 '18 00:06 deniz195

Wow thanks a lot @deniz195 , that is some great debugging work. The patch works fine for me as well.

ThomasLecat avatar Jul 11 '18 11:07 ThomasLecat

So the python team knows about it since there is a python bug ticket created. But is there any way to make sure the G Drive team knows about this?

gabefair avatar Jan 05 '19 13:01 gabefair

I also experience this problem in Windows but not with Google's "Drive File Stream". Deniz's fix works for me too in this case though. I do have Google file stream installed and mapped as my G: drive. However, the notebook was saving to my Z: drive, which is a mapping of a network drive that is backed up with Windows' Sync Center; nothing to do with Google's file stream, but a standard Windows feature that triggers the same bug. I read the bug report bugs.python.org/issue33935 that Deniz created and referenced above for this, which goes on to explain that the problem is with Python, although a solution has not yet been found. I've added a comment there to share my additional information.

gazzar avatar Feb 17 '19 10:02 gazzar

Touching base here because I just came across this question as I did some internet sleuthing to get around Drive File Stream (DFS) trouble on Windows 10. I can't find the original links, though, so apologies to those who actually helped to devise these solutions.

To get DFS to work with Jupyter Notebook, run cmd.exe as Administrator, and enter:

mklink /J "C:\Name\To\New\Desired\DFS\Path" "E:\My Drive"

(The command has the format mklink /flag desired-link-path current-path; /J tells it to make a directory junction.)

Win10 saw DFS as another drive (E:, for me) and I couldn't run Jupyter because it kept trying to launch on C:. The above command makes a directory junction (not a hard link or a symbolic link) so that Win10 treats DFS as if it were located at a path on C:. I've confirmed that files added through the junction's C:\ path are reflected in DFS, and on my Google Drive on the web.

jkmgeo avatar Mar 28 '19 20:03 jkmgeo

Hi All,

Thank you Jkmgeo, can confirm that the above fix works

benphua avatar Oct 16 '19 23:10 benphua