ChainedRepo with ArchivedFile

Open burbma opened this issue 7 years ago • 10 comments

ArchivedFile has a method, abspath(), that returns the path to the blob backing the file so that it can, for example, be read. Its definition is found here:

def abspath(self):
    repo = repos.get_default_repo()
    path = repo.blobstore._filename(self.blob_id)
    return os.path.abspath(path)

My default_repo is a ChainedRepo, so accessing repo.blobstore while computing path raises an AttributeError, because a ChainedRepo doesn't have a blobstore. Instead it has stores, which is a list of the chained repos, each with its own blobstore. Here's my debug session to show some of that:

ipdb> repo
<provenance.repos.ChainedRepo object at 0x111759898>
ipdb> repo.stores
[<provenance.repos.PostgresRepo object at 0x118c53ac8>, <provenance.repos.PostgresRepo object at 0x1119502e8>]
ipdb> repo.stores[1]
<provenance.repos.PostgresRepo object at 0x1119502e8>
ipdb> repo.stores[1].blobstore
<provenance.sftp.SFTPStore object at 0x1189230b8>
ipdb> repo.stores[0].blobstore
<provenance.blobstores.DiskStore object at 0x118923080>
ipdb> repo.stores[0].blobstore._filename(self.blob_id)
'/Users/.../blobstore/e86d496122b230f2d4ebaa3e9bdb9371cf9486c4'
ipdb> repo.stores[1].blobstore._filename(self.blob_id)
'/Users/.../blobstore/e86d496122b230f2d4ebaa3e9bdb9371cf9486c4'

Thoughts? Is it a bug or an all too common user error?

burbma avatar Mar 24 '17 21:03 burbma

Looks like a problem with the implementation/design. I'll have to look more closely at how abspath is being used, but I think the end solution will probably be adding a _filename method onto a repo. Basically, it will have to see if it has a disk blobstore and then delegate to its _filename. Or maybe an S3 blobstore would work as well; these are the things that need to be considered. But the chained repo would then have to iterate over its stores and find the DiskStore. Kinda messy, but I think it would be best to put that logic in the repos rather than have the artifact file try to figure everything out.
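A minimal sketch of the delegation described above, not the fix that was actually pushed. The classes below are hypothetical stand-ins rather than provenance's real implementations, and raising LookupError when no store can supply a local file is an assumption made for illustration.

```python
import os


class DiskStore:
    """Hypothetical stand-in for a disk-backed blobstore."""

    def __init__(self, root):
        self.root = root

    def _filename(self, blob_id):
        return os.path.join(self.root, blob_id)


class RemoteStore:
    """Hypothetical stand-in for a blobstore with no local files (e.g. SFTP)."""


class Repo:
    """Hypothetical repo that wraps a single blobstore."""

    def __init__(self, blobstore):
        self.blobstore = blobstore

    def _filename(self, blob_id):
        # Delegate only if the blobstore can map an id to a local path.
        filename = getattr(self.blobstore, "_filename", None)
        return filename(blob_id) if filename is not None else None


class ChainedRepo:
    """Hypothetical chained repo: asks each store in order."""

    def __init__(self, stores):
        self.stores = stores

    def _filename(self, blob_id):
        for store in self.stores:
            path = store._filename(blob_id)
            if path is not None:
                return path
        # Assumption: fail loudly instead of returning None.
        raise LookupError("no chained store has a local file for %r" % blob_id)


chained = ChainedRepo([Repo(RemoteStore()), Repo(DiskStore("/tmp/blobstore"))])
local_path = chained._filename("abc123")
```

Keeping the lookup inside the repos means ArchivedFile.abspath() only ever calls repo._filename(...) and never needs to know whether the default repo is chained.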

bmabey avatar Mar 25 '17 03:03 bmabey

I pushed a fix for this.. I think. I didn't write an automated test or even test it manually but it should work. :) Let me know if it solves your problem.

bmabey avatar Mar 25 '17 22:03 bmabey

I haven't been able to figure out why yet, but repo._filename(self.blob_id) is returning None.

=== EIN IPython Debugger ===
ipdb> > /Users/.../python3.5/posixpath.py(64)isabs()
     62     """Test whether a path is absolute"""
     63     sep = _get_sep(s)
---> 64     return s.startswith(sep)
     65 
     66 

ipdb> up
> /Users/.../python3.5/posixpath.py(358)abspath()
    356 def abspath(path):
    357     """Return an absolute path."""
--> 358     if not isabs(path):
    359         if isinstance(path, bytes):
    360             cwd = os.getcwdb()

ipdb> up
> /Users/.../python3.5/site-packages/provenance/core.py(506)abspath()
    504         repo = repos.get_default_repo()
    505         path = repo._filename(self.blob_id)
--> 506         return os.path.abspath(path)
    507 
    508     def __fspath__(self):

ipdb> print(repo)
<provenance.repos.ChainedRepo object at 0x10d8143c8>
ipdb> print(self.blob_id)
/Users/.../blobstore/e86d496122b230f2d4ebaa3e9bdb9371cf9486c4
ipdb> print(path)
None
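
The session above shows os.path.abspath failing deep inside posixpath because path is None. One way to surface the real problem earlier is to check the value before handing it to os.path. This is only a sketch; checked_abspath is a hypothetical helper, not part of provenance.

```python
import os


def checked_abspath(path):
    """Return an absolute path, failing with a clear message if path is None."""
    if path is None:
        # Hypothetical guard: report the missing path here rather than
        # letting os.path.abspath fail on None.
        raise ValueError("repo returned no local filename for this blob")
    return os.path.abspath(path)


resolved = checked_abspath("blobstore/some-blob-id")
```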

burbma avatar Mar 27 '17 15:03 burbma

Did you step into the repo._filename(self.blob_id) call to see why?

bmabey avatar Mar 27 '17 21:03 bmabey

Alright, after much pain I figured out how to use the debugger to do it.

> /Users/.../python3.5/site-packages/provenance/core.py(505)abspath()
-> path = repo._filename(self.blob_id)
(Pdb) step
--Call--
> /Users/.../python3.5/site-packages/provenance/repos.py(875)_filename()
-> def _filename(self, id):
(Pdb) next
> /Users/.../python3.5/site-packages/provenance/repos.py(876)_filename()
-> return cs.chained_filename(self, id)
(Pdb) step
--Call--
> /Users/.../python3.5/site-packages/provenance/_commonstore.py(144)chained_filename()
-> def chained_filename(chained, id):
(Pdb) id
'/Users/.../blobstore/e86d496122b230f2d4ebaa3e9bdb9371cf9486c4'
(Pdb) chained
<provenance.repos.ChainedRepo object at 0x10b7576d8>
(Pdb) next
> /Users/.../python3.5/site-packages/provenance/_commonstore.py(145)chained_filename()
-> if id in chained.stores:
(Pdb) chained.stores
[<provenance.repos.PostgresRepo object at 0x10b2d7ba8>, <provenance.repos.PostgresRepo object at 0x10b74e2e8>]

So in chained_filename we have if id in chained.stores. id is the path to the blob on my disk while chained.stores is a list of repos. I don't think id will ever be in chained.stores.
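
To illustrate why that check can never succeed: `in` on a list compares the left-hand value against each element, and a string never equals a repo object. A minimal sketch, with FakeRepo as a hypothetical stand-in for PostgresRepo:

```python
class FakeRepo:
    """Hypothetical stand-in for a repo object such as PostgresRepo."""


stores = [FakeRepo(), FakeRepo()]
blob_id = "e86d496122b230f2d4ebaa3e9bdb9371cf9486c4"

# List membership checks identity and then == against each element;
# a str is never equal to a FakeRepo instance, so this is always False.
found = blob_id in stores
print(found)  # False
```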

burbma avatar Mar 28 '17 17:03 burbma

Ah, okay, you are right. The .stores is the problem. I pushed a fix to master.

bmabey avatar Mar 28 '17 17:03 bmabey

BTW, you can use 's' and 'n' as shortcuts for 'step' and 'next'. #ProTip

bmabey avatar Mar 28 '17 17:03 bmabey

Same problem.

(Pdb) break /Users/.../python3.5/site-packages/provenance/core.py:505
Breakpoint 1 at /Users/.../python3.5/site-packages/provenance/core.py:505
(Pdb) c
> /Users/.../python3.5/site-packages/provenance/core.py(505)abspath()
-> path = repo._filename(self.blob_id)
(Pdb) s
--Call--
> /Users/.../python3.5/site-packages/provenance/repos.py(875)_filename()
-> def _filename(self, id):
(Pdb) n
> /Users/.../python3.5/site-packages/provenance/repos.py(876)_filename()
-> return cs.chained_filename(self, id)
(Pdb) s
--Call--
> /Users/.../python3.5/site-packages/provenance/_commonstore.py(144)chained_filename()
-> def chained_filename(chained, id):
(Pdb) n
> /Users/.../python3.5/site-packages/provenance/_commonstore.py(145)chained_filename()
-> if id in chained:
(Pdb) id
'/Users/.../blobstore/e86d496122b230f2d4ebaa3e9bdb9371cf9486c4'
(Pdb) chained
<provenance.repos.ChainedRepo object at 0x111851f28>

Similar to last time: it checks if id in chained, where id is a string (the path to the file) and chained is a repo object. If I'm not mistaken, it's asking, "is this string in this object?" and the answer is no. So it still returns None.
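
For `id in chained` to evaluate at all rather than raise a TypeError, the repo object has to support membership tests, presumably through `__contains__`. The sketch below assumes such a method looks up artifact ids, in which case a file path never matches. KnownIdsRepo, its set of ids, and the example path are hypothetical; this is not provenance's ChainedRepo.

```python
class KnownIdsRepo:
    """Hypothetical repo whose membership test looks up artifact ids."""

    def __init__(self, ids):
        self._ids = set(ids)

    def __contains__(self, artifact_id):
        return artifact_id in self._ids


content_hash = "e86d496122b230f2d4ebaa3e9bdb9371cf9486c4"
repo = KnownIdsRepo([content_hash])

# A content hash that the repo knows about is found...
hash_found = content_hash in repo

# ...but a filesystem path that merely ends in that hash is not.
path_found = ("/tmp/blobstore/" + content_hash) in repo
```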

burbma avatar Mar 28 '17 18:03 burbma

I'm a bit confused about why id is what it is. This all starts when I call proxy.abspath(). But id seems to be the very thing that I want returned. Before I started using a chained repo I noticed that abspath() and blob_id were the same. Is that supposed to be the case?

burbma avatar Mar 28 '17 18:03 burbma

> Before I started using a chained repo I noticed that abspath() and blob_id were the same. Is that supposed to be the case?

No, id should be the artifact id, so a hash of the contents. If it is the path of the file then something else must be wrong.
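
A quick way to tell the two apart while debugging is sketched below. The 40-character hex file name in the sessions above is consistent with a SHA-1 digest, but that the library hashes with SHA-1 is an assumption here, and both helper functions are hypothetical.

```python
import hashlib
import re

# Assumption: ids are 40-character lowercase hex digests (SHA-1 sized).
_HEX40 = re.compile(r"[0-9a-f]{40}")


def content_id(data):
    """Hash raw bytes into a hex digest (SHA-1 is assumed for illustration)."""
    return hashlib.sha1(data).hexdigest()


def looks_like_content_id(value):
    """True if value is a bare 40-character hex digest, not a file path."""
    return _HEX40.fullmatch(value) is not None


digest = "e86d496122b230f2d4ebaa3e9bdb9371cf9486c4"
bare_ok = looks_like_content_id(digest)
path_ok = looks_like_content_id("/tmp/blobstore/" + digest)
```

If blob_id fails a check like this, the value was probably set to a path somewhere upstream of abspath().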

bmabey avatar Mar 28 '17 18:03 bmabey