api-inference-community
Insert into and remove from sys.path in generic pipelines
Currently, the generic pipeline simply calls sys.path.append with the path to the snapshot repo. This is fine when running once inside a Docker container, but during development it can be a nightmare, especially if you're experimenting with several different repos that implement generic pipelines. Because we append, an earlier entry on sys.path (or an already-imported module with the same name) takes precedence, so you get a previously loaded pipeline instead of the one you expect.
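A minimal, self-contained sketch of that failure mode, assuming two local repos that each ship a pipeline.py (the repo_a/repo_b directories, the REPO attribute and the load_with_append helper are hypothetical, created in a temp dir). The module is imported by name, so the second import is answered from sys.modules, and even without that cache the first appended directory would win the sys.path search:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_with_append(repo_dir):
    """Hypothetical stand-in for the current behaviour: append, then import by name."""
    sys.path.append(str(repo_dir))
    importlib.invalidate_caches()  # make sure freshly written files are noticed
    return importlib.import_module("pipeline")


def demo():
    sys.modules.pop("pipeline", None)
    added = []
    try:
        with tempfile.TemporaryDirectory() as tmp:
            loaded = []
            for name in ("repo_a", "repo_b"):
                repo = Path(tmp) / name
                repo.mkdir()
                # Each fake repo has a pipeline.py recording which repo it came from.
                (repo / "pipeline.py").write_text(f"REPO = {name!r}\n")
                added.append(str(repo))
                loaded.append(load_with_append(repo).REPO)
            return tuple(loaded)
    finally:
        # Undo the side effects so the demo can be re-run.
        for entry in added:
            sys.path.remove(entry)
        sys.modules.pop("pipeline", None)
```

Calling demo() returns ("repo_a", "repo_a"): the second repo's pipeline is never loaded.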
I suggest we do what torch.hub does instead: sys.path.insert(0, repo_dir), import the module, and then sys.path.remove(repo_dir).
Something like:
```python
import sys
from pathlib import Path

from huggingface_hub import snapshot_download

PIPELINE_FILE = 'pipeline.py'


# Taken directly from torch.hub
def import_module(name, path):
    import importlib.util
    from importlib.abc import Loader

    spec = importlib.util.spec_from_file_location(name, path)
    assert spec is not None
    module = importlib.util.module_from_spec(spec)
    assert isinstance(spec.loader, Loader)
    spec.loader.exec_module(module)
    return module


def load_pipeline(repo_id, **kwargs):
    # Accept either a local directory or a Hub repo id to download.
    if Path(repo_id).is_dir():
        repo_dir = Path(repo_id)
    else:
        repo_dir = Path(snapshot_download(repo_id))
    pipeline_path = repo_dir / PIPELINE_FILE

    # sys.path entries must be str: the import system skips Path objects.
    repo_path = str(repo_dir)
    sys.path.insert(0, repo_path)
    try:
        module = import_module(PIPELINE_FILE, str(pipeline_path))
    finally:
        # Restore sys.path even if pipeline.py fails to import.
        sys.path.remove(repo_path)
    return module.Pipeline(repo_dir)
```
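To check the insert/import/remove pattern end to end without downloading anything, here is a hedged, self-contained sketch. The helpers prepended_to_sys_path, import_from_file and load_local_pipeline, the fake repos and their REPO attribute are all hypothetical; the context manager is just one way to guarantee the sys.path cleanup. Because the file is loaded directly from its location and is not registered in sys.modules, two repos loaded in a row each yield their own pipeline:

```python
import importlib.util
import sys
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def prepended_to_sys_path(directory):
    """Temporarily put `directory` at the front of sys.path (hypothetical helper)."""
    entry = str(directory)
    sys.path.insert(0, entry)
    try:
        yield
    finally:
        sys.path.remove(entry)


def import_from_file(name, path):
    # Load the file by location; the module is not added to sys.modules.
    spec = importlib.util.spec_from_file_location(name, str(path))
    assert spec is not None and spec.loader is not None
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def load_local_pipeline(repo_dir):
    repo_dir = Path(repo_dir)
    with prepended_to_sys_path(repo_dir):
        return import_from_file("pipeline", repo_dir / "pipeline.py")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        loaded = []
        for name in ("repo_a", "repo_b"):
            repo = Path(tmp) / name
            repo.mkdir()
            (repo / "pipeline.py").write_text(f"REPO = {name!r}\n")
            loaded.append(load_local_pipeline(repo).REPO)
        return loaded
```

Here demo() returns ["repo_a", "repo_b"], and sys.path is left exactly as it was.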
CC @osanseviero
cc @Narsil
IMO generic is not meant to be used very much. If it's tedious to use it's OK.
AFAIK, generic is only meant for demo purposes, to showcase some external libraries without them having to fully implement a pipeline. It is not meant to be a real production path, so I don't think we should do anything to optimize for it.
If anything, I would go the other way: make implementing a new pipeline trivial enough that generic can be removed.
Messing with the path in general is asking for trouble.
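One concrete illustration of that fragility, as a hedged sketch: removing the directory from sys.path afterwards does not undo the imports made while it was there. Any module that pipeline.py imports by name (here a hypothetical sibling helpers.py; the fake repos and the SOURCE attribute are likewise illustrative) stays cached in sys.modules, so a second repo's pipeline.py silently reuses the first repo's helpers even with the insert/remove approach:

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_pipeline_file(repo_dir):
    """Insert repo_dir into sys.path, exec pipeline.py, then remove repo_dir again."""
    entry = str(repo_dir)
    sys.path.insert(0, entry)
    try:
        spec = importlib.util.spec_from_file_location(
            "pipeline", str(Path(repo_dir) / "pipeline.py"))
        assert spec is not None and spec.loader is not None
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    finally:
        sys.path.remove(entry)


def demo():
    sys.modules.pop("helpers", None)  # "helpers" is a hypothetical sibling module
    try:
        with tempfile.TemporaryDirectory() as tmp:
            results = []
            for name in ("repo_a", "repo_b"):
                repo = Path(tmp) / name
                repo.mkdir()
                # Each repo ships its own helpers.py next to pipeline.py.
                (repo / "helpers.py").write_text(f"SOURCE = {name!r}\n")
                (repo / "pipeline.py").write_text(
                    "import helpers\nSOURCE = helpers.SOURCE\n")
                results.append(load_pipeline_file(repo).SOURCE)
            return results
    finally:
        sys.modules.pop("helpers", None)
```

Here demo() returns ["repo_a", "repo_a"]: the second pipeline.py gets the first repo's helpers from the sys.modules cache.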