metaflow-service
`metaflow.exception.MetaflowNotFound` while the Flow exists on the S3 server (on-prem)
Problem
I have deployed the Metaflow service with the dev docker-compose and extended the environment variables with the ones needed to configure Metaflow, since, as far as I can tell, those are used to retrieve artifacts for the UI for the different runs.
However, it seems that Metaflow can't access the files on the S3 storage (metaflow.exception.MetaflowNotFound error).
Details
(If I don't include the METAFLOW_... envs, I receive an "AWS credential error ..." instead, which seems valid, as I have my own S3 endpoint.)
These variables are new compared to the existing setup:
- AWS_ACCESS_KEY_ID=<ID>
- AWS_SECRET_ACCESS_KEY=<SECRET>
- METAFLOW_DEFAULT_METADATA="service"
- METAFLOW_DEFAULT_DATASTORE="s3"
- METAFLOW_DATASTORE_SYSROOT_S3="s3://testbucket/metaflow-testbucket"
- METAFLOW_S3_ENDPOINT_URL="http://192.168.99.99"
- METAFLOW_S3_VERIFY_CERTIFICATE=false
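For reference, this is roughly how I wired the variables in; a sketch of a dev docker-compose override, assuming the UI backend service is named `ui_backend` (the service name and file layout are assumptions, adapt them to your compose file):

```yaml
# docker-compose.override.yml (sketch; service name is an assumption)
services:
  ui_backend:
    environment:
      - AWS_ACCESS_KEY_ID=<ID>
      - AWS_SECRET_ACCESS_KEY=<SECRET>
      - METAFLOW_DEFAULT_METADATA=service
      - METAFLOW_DEFAULT_DATASTORE=s3
      - METAFLOW_DATASTORE_SYSROOT_S3=s3://testbucket/metaflow-testbucket
      - METAFLOW_S3_ENDPOINT_URL=http://192.168.99.99
      - METAFLOW_S3_VERIFY_CERTIFICATE=false
```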
Now, when I run with these variables, the error is the following:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 302, in <module>
    cli(auto_envvar_prefix='MFCACHE')
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 298, in cli
    Scheduler(store, max_actions).loop()
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 199, in __init__
    maxtasksperchild=512, # Recycle each worker once 512 tasks have been completed
  File "/usr/local/lib/python3.7/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 176, in __init__
    self._repopulate_pool()
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool
    w.start()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/usr/local/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/local/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/local/lib/python3.7/multiprocessing/popen_fork.py", line 74, in _launch
    code = process_obj._bootstrap()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/root/services/ui_backend_service/data/cache/client/cache_worker.py", line 29, in execute_action
    execute(tempdir, action_cls, request)
  File "/root/services/ui_backend_service/data/cache/client/cache_worker.py", line 56, in execute
    invalidate_cache=req.get('invalidate_cache', False))
  File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 140, in execute
    results = {**existing_keys}
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/root/services/ui_backend_service/data/cache/utils.py", line 130, in streamed_errors
    get_traceback_str()
  File "/root/services/ui_backend_service/data/cache/utils.py", line 124, in streamed_errors
    yield
  File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 131, in execute
    task = Task(pathspec, attempt=attempt)
  File "/usr/local/lib/python3.7/site-packages/metaflow/client/core.py", line 947, in __init__
    super(Task, self).__init__(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/metaflow/client/core.py", line 361, in __init__
    self._object = self._get_object(*ids)
  File "/usr/local/lib/python3.7/site-packages/metaflow/client/core.py", line 391, in _get_object
    raise MetaflowNotFound("%s does not exist" % self)
metaflow.exception.MetaflowNotFound: Task('HelloFlow/5/start/12', attempt=0) does not exist
```
If I check my S3 with s3cmd, I can see that a directory exists at this path.
When I run the flow, the files are stored without any problem that I can see, and the run also shows up in the UI.
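One detail I noticed in the traceback: the failing call, `Task(pathspec, attempt=attempt)`, resolves the task through the metadata service, not directly through S3, so it may be worth querying the metadata service REST API for the same pathspec to see whether it knows about the task at all. A minimal sketch with the standard library (the base URL and the `/flows/.../tasks/...` route layout are assumptions based on my reading of the service; adjust to your deployment):

```python
import json
from urllib.request import urlopen
from urllib.error import URLError


def task_url(base_url, pathspec):
    """Map a Metaflow pathspec like 'HelloFlow/5/start/12' to the
    metadata service route (assumed layout; verify against your service)."""
    flow, run, step, task = pathspec.split("/")
    return f"{base_url}/flows/{flow}/runs/{run}/steps/{step}/tasks/{task}"


def check_task(base_url, pathspec):
    """Return the task record if the metadata service knows it, else None."""
    try:
        with urlopen(task_url(base_url, pathspec)) as resp:
            return json.load(resp)
    except URLError as err:
        # Covers both HTTP errors (404 etc.) and connection failures.
        print(f"could not fetch {pathspec}: {err}")
        return None
```

If this returns the task record, the metadata side is fine and the problem is more likely in the datastore configuration seen by the UI backend; if it 404s, the UI backend and your flow may be pointed at different metadata services.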
(I understand that Metaflow was not primarily created for on-prem usage, but it would be a blast to use it without AWS. I would be grateful for an on-prem setup guide.)