
load: Issues loading models from DVC managed remote storage

Open igordertigor opened this issue 3 years ago • 13 comments

I'm storing models as part of a DVC pipeline in a research repository (let's call it "research"), and I'm trying to load one of them on a VM inside the production system. In the research repository, I have the following:

.mlem.yaml:

core:
  storage:
    type: dvc

.dvc/config:

[core]
    remote = gcloud
['remote "gcloud"']
    url = gs://********************

The model is stored at models/my-model, with its metafile at models/my-model.mlem, as part of a DVC pipeline. The .mlem file is visible in the git repository, and the bucket contains a file named after the hash of the binary file.

I'm trying to load the model in the production system with

import mlem

model = mlem.api.load('models/my-model', project='https://github.com/my-company/research')

That gives me the following error:

Traceback (most recent call last):
  File "test.py", line 11, in <module>
    mlem.api.load(f'models/{model}', project=repo)
  File "/home/******/.venv/lib/python3.8/site-packages/mlem/core/metadata.py", line 99, in load
    meta = load_meta(
  File "/home/******/.venv/lib/python3.8/site-packages/mlem/core/metadata.py", line 167, in load_meta
    location = Location.resolve(
  File "/home/******/.venv/lib/python3.8/site-packages/mlem/core/meta_io.py", line 103, in resolve
    return UriResolver.resolve(
  File "/home/******/.venv/lib/python3.8/site-packages/mlem/core/meta_io.py", line 142, in resolve
    return cls.find_resolver(path, project, rev, fs).process(
  File "/home/******/.venv/lib/python3.8/site-packages/mlem/core/meta_io.py", line 194, in process
    fs, project = cls.get_fs(project, rev)
  File "/home/******/.venv/lib/python3.8/site-packages/mlem/core/meta_io.py", line 290, in get_fs
    fs, _, (path,) = get_fs_token_paths(
  File "/home/******/.venv/lib/python3.8/site-packages/fsspec/core.py", line 639, in get_fs_token_paths
    fs = cls(**options)
  File "/home/******/.venv/lib/python3.8/site-packages/fsspec/spec.py", line 76, in __call__
    obj = super().__call__(*args, **kwargs)
  File "/home/******/.venv/lib/python3.8/site-packages/fsspec/implementations/github.py", line 52, in __init__
    r.raise_for_status()
  File "/home/******/.venv/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.github.com/repos/my-company/research

This is working perfectly fine without dvc (i.e. the model binary is directly committed to git). Hence, this might be related to #47.
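The URL in the last line of the traceback is the GitHub REST API endpoint for the repository, and GitHub answers 404 (not 403) for a private repository when the request is unauthenticated. A minimal sketch of how the project URL maps to that endpoint, so it can be probed separately from MLEM; the helper name github_api_url is hypothetical and is not part of MLEM or fsspec:

```python
from urllib.parse import urlparse


def github_api_url(project_url: str) -> str:
    """Map https://github.com/<owner>/<repo> to the REST API URL from the traceback.

    Hypothetical debugging helper; not part of MLEM or fsspec.
    """
    parts = urlparse(project_url)
    if parts.netloc != "github.com":
        raise ValueError(f"not an https github.com URL: {project_url}")
    # Drop empty segments produced by leading or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"expected <owner>/<repo> in: {project_url}")
    owner, repo = segments[0], segments[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"https://api.github.com/repos/{owner}/{repo}"


print(github_api_url("https://github.com/my-company/research"))
```

Requesting that URL with and without credentials should show whether the 404 is an authentication problem rather than a missing repository.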

igordertigor avatar Nov 21 '22 14:11 igordertigor

Hi @igordertigor! Thanks for reporting! Could you please check the first lines in the .mlem file? Does it have a type: dvc line like here https://github.com/iterative/example-mlem-get-started/blob/dvc/models/rf.mlem#L5 ? This is the instruction that lets MLEM know it needs to work with DVC.

The other question: do you have DVC installed in the env where you run mlem.api.load?

aguschin avatar Nov 23 '22 11:11 aguschin

Hi @aguschin, here is my .mlem.yaml:

$ cat .mlem.yaml
core:
  storage:
    type: dvc

Regarding your second question: I tried with and without DVC installed and with and without the respective remote configured where I run mlem.api.load. Does that answer your question?

igordertigor avatar Nov 24 '22 09:11 igordertigor

@igordertigor, thanks!

With the first question I meant checking models/my-model.mlem. Could you check that type: dvc exists in models/my-model.mlem? Like in this MLEM metafile?

Re 2nd question: thanks, I got it 🙏🏻

cc @mike0sv

aguschin avatar Nov 24 '22 10:11 aguschin

Does this answer your question?

$ head models/my-model.mlem
artifacts:
  data.pkl:
    hash: f47d01924e75f480b416338e313b3812
    size: 5759
    type: dvc
    uri: my-model
model_type:
  io:
    type: pickle
  methods:

Similar for the other one. So yes, there is type: dvc there.

igordertigor avatar Nov 24 '22 13:11 igordertigor

Hey @igordertigor ! Is your github repo private? If that is the case, did you provide GITHUB_TOKEN env for your production environment?

mike0sv avatar Nov 25 '22 10:11 mike0sv

I don't think that's it. The repository is private, but I'm getting the same issue locally if I just create a new virtual env/repository and everything. However, locally I am still logged in to GitHub and I am able to load models that are stored in git. It is only an issue with dvc. I'm happy to try again if there is anything specific that you want me to look out for.

igordertigor avatar Nov 28 '22 15:11 igordertigor

Thanks @igordertigor. Let's try to debug this 🙌🏻 Does dvc get work and download the binary? Something like

$ dvc get https://github.com/my-company/research models/my-model

(should download the model locally)

aguschin avatar Nov 29 '22 10:11 aguschin

@igordertigor, did you have a chance to check it?

aguschin avatar Dec 05 '22 10:12 aguschin

Hi @aguschin , apologies for the late reply. I've been away from my computer for a couple of days and will likely be here sporadically in the next couple of days as well. I'll try to at least check once per working day.

I did run dvc get and it works if I specify an ssh url:

  $ dvc get git@github.com:my-company/my-repo models/my-model

downloads models/my-model to my-model, but doesn't fetch the associated .mlem file. This doesn't work with the https url.

This made me try @mike0sv's suggestion of providing a GITHUB_TOKEN. But that doesn't seem to help. I can however use the github command line app gh to authenticate. If I select https as my preferred protocol for Git operations (which it isn't, I prefer ssh), then I can use

$ dvc get https://github.com/my-company/my-repo models/my-model

to download the model binary.
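The comparison between the two URL forms can be scripted so it is easy to rerun in a fresh environment. A minimal sketch, assuming the dvc executable is on PATH; the helpers dvc_get_command and try_dvc_get are hypothetical wrappers around the dvc get command shown above, and the repository URLs are the placeholders used in this thread:

```python
import subprocess
from typing import List


def dvc_get_command(repo_url: str, path: str) -> List[str]:
    """Build the argument list for `dvc get <repo_url> <path>`."""
    return ["dvc", "get", repo_url, path]


def try_dvc_get(repo_url: str, path: str) -> bool:
    """Run `dvc get` and report whether it exited successfully.

    Returns False if dvc is not installed or the command fails.
    """
    try:
        result = subprocess.run(
            dvc_get_command(repo_url, path),
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # The dvc executable was not found on PATH.
        return False
    return result.returncode == 0


# Example usage (not executed here, since it needs network access and credentials):
#   try_dvc_get("https://github.com/my-company/my-repo", "models/my-model")
```

Running it once per URL form shows which transport the current environment is actually authenticated for.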

However, with those insights, mlem still doesn't work:

$ cat test_load_https.py
import mlem

mlem.api.load(
    'models/my-model',
    project='https://github.com/my-company/my-repo',
)
$ python test_load_https.py
Traceback (most recent call last):
  File "load_mlem_model_through_dvc.py", line 3, in <module>
    mlem.api.load(
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/mlem/core/metadata.py", line 99, in load
    meta = load_meta(
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/mlem/core/metadata.py", line 167, in load_meta
    location = Location.resolve(
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/mlem/core/meta_io.py", line 103, in resolve
    return UriResolver.resolve(
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/mlem/core/meta_io.py", line 142, in resolve
    return cls.find_resolver(path, project, rev, fs).process(
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/mlem/core/meta_io.py", line 194, in process
    fs, project = cls.get_fs(project, rev)
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/mlem/core/meta_io.py", line 290, in get_fs
    fs, _, (path,) = get_fs_token_paths(
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/fsspec/core.py", line 639, in get_fs_token_paths
    fs = cls(**options)
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/fsspec/spec.py", line 76, in __call__
    obj = super().__call__(*args, **kwargs)
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/fsspec/implementations/github.py", line 52, in __init__
    r.raise_for_status()
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.github.com/repos/my-company/my-repo

With ssh, it seems to get a little further:

$ cat test_load_ssh.py
import mlem

mlem.api.load(
    'models/my-model',
    project='git@github.com:my-company/my-repo',
)
$ python test_load_ssh.py
Traceback (most recent call last):
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/mlem/core/metadata.py", line 206, in find_meta_location
    _, path = find_object(
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/mlem/core/objects.py", line 1168, in find_object
    raise ValueError(
ValueError: Object models/my-model not found, search of fs <fsspec.implementations.local.LocalFileSystem object at 0x7f8a5b5be4c0> at models/feature_engine

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "load_mlem_model_through_dvc_ssh.py", line 3, in <module>
    mlem.api.load(
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/mlem/core/metadata.py", line 99, in load
    meta = load_meta(
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/mlem/core/metadata.py", line 176, in load_meta
    location=find_meta_location(location),
  File "/home/ingo/tmp/dvc-mlem-issue/.venv/lib/python3.8/site-packages/mlem/core/metadata.py", line 210, in find_meta_location
    raise MlemObjectNotFound(
mlem.core.errors.MlemObjectNotFound: MLEM object was not found at `git@github.com:my-company/my-repo/models/my-model`

In fact, this error message is correct: There is no models/my-model in the repository. It's in the dvc-controlled bucket.

I hope this helps.

igordertigor avatar Dec 08 '22 19:12 igordertigor

Thanks @igordertigor! The only idea we have so far is that you somehow didn't populate GITHUB_TOKEN to be consumed by your script 😅 We'll try to reproduce this and investigate.

aguschin avatar Dec 12 '22 10:12 aguschin

Tried to reproduce. What I can say:

  1. if GITHUB_TOKEN and GITHUB_USERNAME are set, authentication with gh doesn't work. Got a different issue and created a bug https://github.com/iterative/mlem/issues/528
  2. Without gh auth login and without GITHUB_TOKEN and GITHUB_USERNAME set, running python test_load_https.py fails for me with requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.github.com/repos/aguschin/example-mlem-private (expected)
  3. With GITHUB_TOKEN set (but without GITHUB_USERNAME set) mlem clone ... fails with ValueError: Auth required both username and token (expected)
  4. After gh auth login (using HTTPS) and without GITHUB_TOKEN / GITHUB_USERNAME, I'm getting requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.github.com/repos/aguschin/example-mlem-private (looks like a bug to me, maybe @mike0sv can clarify).
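The cases above suggest checking the environment before calling mlem.api.load on a private repository. A minimal sketch, assuming (from case 3 above) that token-based auth needs both GITHUB_USERNAME and GITHUB_TOKEN; missing_github_credentials is a hypothetical helper, not an MLEM API:

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the cases listed above.
REQUIRED_VARS = ("GITHUB_USERNAME", "GITHUB_TOKEN")


def missing_github_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_github_credentials()
    if missing:
        print("Set before loading from a private repo:", ", ".join(missing))
    else:
        print("Both GitHub credential variables are set.")
```

This only verifies that the variables are visible to the Python process; it does not validate the token itself.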

aguschin avatar Dec 13 '22 14:12 aguschin

Thanks @aguschin , I think those are pretty much the issues I'm facing. Any idea why it doesn't work for ssh connections? After all, I can dvc get with the ssh url. And I believe the cloud storage stuff is independent of that anyway.

igordertigor avatar Dec 14 '22 08:12 igordertigor

Hi @igordertigor! Sorry for the long wait 😞 . We did some investigation for #528, and figured out one should be authorized with both SSH and HTTP. I hope this will fix your issue. I wrote it down here https://mlem-ai-dvc-private-rep-yxphuh.herokuapp.com/doc/user-guide/dvc/#working-with-private-repositories

I understand that requiring authorization over both SSH and HTTP is not ideal, so we'll fix it, but for now this is the only solution we have in mind.

aguschin avatar Feb 15 '23 10:02 aguschin