
Optimum `from_pretrained` does not seem to honour TRANSFORMERS_CACHE env var

Open axsaucedo opened this issue 3 years ago • 4 comments

We currently use the TRANSFORMERS_CACHE env var to ensure that downloads land in a folder with the relevant write permissions, as we are deploying inside a container in Kubernetes. With Optimum it seems the only way of setting this is through an explicit parameter to the model (as per the code below), which results in quite a fiddly implementation to cover both the optimum-pipeline and non-optimum-pipeline cases. This issue would encompass supporting the TRANSFORMERS_CACHE env var that is already honoured by the non-optimum transformers classes.
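For context, the behaviour being requested can be sketched roughly as follows. This is an illustrative simplification, not the library's actual code, and the cache path is an example:

```python
import os

# Rough sketch of how the non-optimum transformers classes resolve their
# cache directory: TRANSFORMERS_CACHE wins if set, otherwise a default
# under the user cache home (XDG_CACHE_HOME or ~/.cache) is used.
def resolve_cache_dir() -> str:
    default = os.path.join(
        os.environ.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache")),
        "huggingface",
        "transformers",
    )
    return os.environ.get("TRANSFORMERS_CACHE", default)

# Point the cache at a writable mount (path is an example):
os.environ["TRANSFORMERS_CACHE"] = "/mnt/models/.cache"
print(resolve_cache_dir())  # -> /mnt/models/.cache
```

The ask is for Optimum's `from_pretrained` to pick up the same env var instead of requiring an explicit `cache_dir` argument.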

axsaucedo avatar May 13 '22 07:05 axsaucedo

Following up on this, setting the cache_dir as follows does not seem to do the full job:

        model = optimum_class.from_pretrained(
            hf_settings.pretrained_model,
            from_transformers=True,
            cache_dir=TRANSFORMER_CACHE_DIR,
        )
        tokenizer = AutoTokenizer.from_pretrained(tokenizer, cache_dir=TRANSFORMER_CACHE_DIR)

This still results in some components ignoring the cache_dir and trying to write to the default ~/.cache dir - these are the errors:

...etc
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/.cache/huggingface'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/mlserver", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/mlserver/cli/main.py", line 76, in main
    root()
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/mlserver/cli/main.py", line 19, in wrapper
    return asyncio.run(f(*args, **kwargs))
  File "/usr/local/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/site-packages/mlserver/cli/main.py", line 43, in start
    await server.start(models_settings)
  File "/usr/local/lib/python3.8/site-packages/mlserver/server.py", line 77, in start
    await asyncio.gather(
  File "/usr/local/lib/python3.8/site-packages/mlserver/registry.py", line 269, in load
    return await self._models[model_settings.name].load(model_settings)
  File "/usr/local/lib/python3.8/site-packages/mlserver/registry.py", line 143, in load
    await self._load_model(new_model)
  File "/usr/local/lib/python3.8/site-packages/mlserver/registry.py", line 151, in _load_model
    await model.load()
  File "/usr/local/lib/python3.8/site-packages/mlserver_huggingface/runtime.py", line 64, in load
    await asyncio.get_running_loop().run_in_executor(
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.8/site-packages/mlserver_huggingface/common.py", line 107, in load_pipeline_from_settings
    model = optimum_class.from_pretrained(
  File "/usr/local/lib/python3.8/site-packages/optimum/modeling_base.py", line 201, in from_pretrained
    return cls._from_transformers(
  File "/usr/local/lib/python3.8/site-packages/optimum/onnxruntime/modeling_ort.py", line 222, in _from_transformers
    save_dir.mkdir(parents=True, exist_ok=True)
  File "/usr/local/lib/python3.8/pathlib.py", line 1292, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/usr/local/lib/python3.8/pathlib.py", line 1292, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/usr/local/lib/python3.8/pathlib.py", line 1292, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/usr/local/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/.cache'

axsaucedo avatar May 13 '22 12:05 axsaucedo

After looking deeper into the various projects' source code, I found that some resources also use that .cache folder for storage, and that this can be overridden with the XDG_CACHE_HOME variable. This seems to do the job, so we'll be able to unblock this by setting that env variable instead.
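The workaround above can be sketched as follows. The cache directory here is created with `tempfile` purely for illustration (in a Kubernetes deployment it would be a writable mount such as an emptyDir volume), and the key detail is that the variable must be set before importing the libraries that compute their cache defaults:

```python
import os
import tempfile

# Workaround sketch: point XDG_CACHE_HOME at a directory the container
# user can actually write to, *before* importing transformers/optimum,
# so any code that falls back to ~/.cache uses this location instead.
cache_root = tempfile.mkdtemp(prefix="hf-cache-")  # stand-in for a writable mount
os.environ["XDG_CACHE_HOME"] = cache_root

# Imports must happen after the variable is set, e.g.:
# from optimum.onnxruntime import ORTModelForSequenceClassification
```

Equivalently, the variable can be set at the container level (e.g. via `env` in the pod spec) so it is in place before the Python process starts at all.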

axsaucedo avatar May 13 '22 14:05 axsaucedo

Hey @axsaucedo,

Yes, we are aware that there is currently no cache enabled. [REF] Before enabling it we need to make a few more changes regarding seq2seq models and models > 2GB (multi-file).

philschmid avatar May 13 '22 14:05 philschmid

Ok interesting @philschmid - the main issue we were facing was that it runs in a container with restricted access, so it seems it was ignoring the TRANSFORMERS_CACHE env var. However, as mentioned above, we have been able to circumvent this with the XDG_CACHE_HOME env var for now, which seems to do the job - thanks for the reply and context

axsaucedo avatar May 13 '22 14:05 axsaucedo