
[ERROR] docker image 1.6.0 : "pkg_resources.DistributionNotFound: The 'pycrypto>=2.6.1' distribution was not found and is required by loudml"

Open • toni-moreno opened this issue 5 years ago • 8 comments

Hello @regel

After creating a model and running it, this error appeared in the Docker container's output log, and no data has been written to the output database. Any idea what to do?

loudml_1   | 172.20.0.3 - - [2020-07-28 07:13:32] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.002119
loudml_1   | INFO:schedule:Running job Every 60.0 seconds do daemon_exec_scheduled_job('_eval(linux_metrics_cpu_mean_usage_system_host_myhost_time_5m)') (last run: 2020-07-28 07:12:37, next run: 2020-07-28 07:13:37)
loudml_1   | 
loudml_1   | Traceback (most recent call last):
loudml_1   |   File "/opt/vendor/lib/python3.5/site-packages/loudml/server.py", line 105, in wrapper
loudml_1   |     return job_func(*args, **kwargs)
loudml_1   |   File "/opt/vendor/lib/python3.5/site-packages/loudml/server.py", line 178, in daemon_exec_scheduled_job
loudml_1   |     'loudmld {}'.format(pkg_resources.require("loudml")[0].version)
loudml_1   |   File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 963, in require
loudml_1   |     needed = self.resolve(parse_requirements(requirements))
loudml_1   |   File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 849, in resolve
loudml_1   |     raise DistributionNotFound(req, requirers)
loudml_1   | pkg_resources.DistributionNotFound: The 'pycrypto>=2.6.1' distribution was not found and is required by loudml
loudml_1   | 172.20.0.3 - - [2020-07-28 07:13:49] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.002227
loudml_1   | INFO:schedule:Running job Every 1 minute do daemon_clear_jobs() (last run: 2020-07-28 07:13:04, next run: 2020-07-28 07:14:04)
loudml_1   | 172.20.0.3 - - [2020-07-28 07:14:05] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.002722
loudml_1   | 172.20.0.3 - - [2020-07-28 07:14:19] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.002560
loudml_1   | 172.20.0.3 - - [2020-07-28 07:14:33] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.001884
loudml_1   | INFO:schedule:Running job Every 60.0 seconds do daemon_exec_scheduled_job('_eval(linux_metrics_cpu_mean_usage_system_host_myhost_time_5m)') (last run: 2020-07-28 07:13:38, next run: 2020-07-28 07:14:38)
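The traceback shows the scheduled job failing before any evaluation runs: server.py line 178 only needs the installed loudml version to build a 'loudmld <version>' string, but pkg_resources.require("loudml") resolves loudml's entire dependency tree and raises DistributionNotFound as soon as pycrypto is missing. As a minimal sketch (server_banner is a hypothetical helper, not loudml's code and not the Dockerfile fix that was eventually applied), a version lookup that reads only the package's own metadata could look like this. It assumes Python 3.8+ for importlib.metadata, which the 1.6.0 image (Python 3.5) does not ship.

```python
from importlib.metadata import PackageNotFoundError, version


def server_banner(dist_name="loudml"):
    """Return a 'loudmld <version>' string for the installed distribution.

    Hypothetical helper: unlike pkg_resources.require(), importlib.metadata.version()
    reads only the distribution's own metadata and does not resolve its
    install_requires, so a missing dependency such as pycrypto cannot make it fail.
    """
    try:
        return "loudmld {}".format(version(dist_name))
    except PackageNotFoundError:
        # The distribution itself is not installed at all.
        return "loudmld (unknown version)"


print(server_banner())
```

This only avoids the failure in the version string; pycrypto would still be missing for any code path that actually imports it, which is why the image itself needed fixing.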

This is the model info.

> version
1.6.0
> list-models
linux_metrics_cpu_mean_usage_system_host_myhost_time_5m
> show-model linux_metrics_cpu_mean_usage_system_host_myhost_time_5m
- settings:
    bucket_interval: 5m
    default_bucket: myhost_linux
    features:
    - default: 0
      field: usage_system
      io: io
      match_all:
      - tag: host
        value: myhost
      measurement: cpu
      metric: mean
      name: mean_usage_system
    grace_period: 0
    interval: 60s
    max_evals: 10
    max_threshold: 0
    min_threshold: 0
    name: linux_metrics_cpu_mean_usage_system_host_myhost_time_5m
    offset: 10s
    run:
      flag_abnormal_data: true
      output_bucket: myhost_loudml
      save_output_data: true
    seasonality:
      daytime: false
      weekday: false
    span: 100
    type: donut
  training:
    job_id: fdb8d872-865d-4cdf-912a-1625a214fc54
    progress:
      eval: 10
      max_evals: 10
    state: done
> list-buckets 
myhost_linux
myhost_loudml
> show-bucket myhost_loudml
- addr: X.X.X.X:8086
  annotation_db: loudml_annotations
  create_database: false
  database: loudml_metrics
  dbuser: loudml_user
  measurement: loudml
  name: myhost_loudml
  retention_policy: autogen
  type: influxdb
  use_ssl: true
  verify_ssl: false

toni-moreno, Jul 28 '20

Hello @regel, I've tested again on a new server with the loudml:1.6.0 image and also with today's loudml:nightly image; the error persists in both.

As a help, I've found a workaround that doesn't require changing the image: install some basic Python packages as root directly inside the running container.

$ docker exec -it -u 0 7e011d7c0881  bash
root@7e011d7c0881:/#  apt-get update && apt-get install -y python3-pip python3-setuptools python3-dev && apt-get install -y --no-install-recommends build-essential gcc git && apt-get purge -y

No restart needed! The error immediately disappeared from the log and loudml began writing to the output database.

Right now:

[screenshot]
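To see which of loudml's declared dependencies are absent before and after applying a workaround like the one above, a small diagnostic can compare the requirement list against what is installed. This is a simplified sketch under stated assumptions: it assumes Python 3.8+ (importlib.metadata), checks only whether each distribution is present while ignoring version specifiers, and skips requirements that carry an environment marker. missing_requirements is a hypothetical name.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def missing_requirements(requirements):
    """Return the requirement strings whose distribution is not installed.

    Simplified sketch: version specifiers are ignored (presence only) and
    requirements with an environment marker (';') are skipped.
    """
    missing = []
    for req in requirements:
        if ";" in req:  # conditional/extra requirement, out of scope here
            continue
        # The distribution name is everything before the first specifier,
        # extras bracket or whitespace, e.g. "pycrypto>=2.6.1" -> "pycrypto".
        name = re.split(r"[\s<>=!~\[(]", req.strip(), maxsplit=1)[0]
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(req)
    return missing
```

Inside a container with Python 3.8+, calling missing_requirements(requires("loudml") or []), with requires imported from importlib.metadata, would list an entry such as 'pycrypto>=2.6.1' if that distribution is not installed.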

toni-moreno, Jul 30 '20

Oops. Very good catch. Thanks, Toni. Something is odd in the build; I'm patching the Dockerfile.

regel, Aug 04 '20

Solved. Toni, see the patches above and the new Dockerfile in the develop branch if you need to build a local image.

I will tag a new 1.6 release by the end of the month.

regel, Aug 04 '20

Hello @regel, thanks a lot for this fix. I've built a new image and pushed it as tonimoreno/loudml:1.6.0 if you want to test it.

However, when I restarted the service with the new image, this error appeared. Can you help me understand what I did wrong?

Attaching to loudml-poc_loudml_1
loudml_1    | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1    |   _np_qint8 = np.dtype([("qint8", np.int8, 1)])
loudml_1    | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1    |   _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
loudml_1    | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1    |   _np_qint16 = np.dtype([("qint16", np.int16, 1)])
loudml_1    | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1    |   _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
loudml_1    | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1    |   _np_qint32 = np.dtype([("qint32", np.int32, 1)])
loudml_1    | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1    |   np_resource = np.dtype([("resource", np.ubyte, 1)])
loudml_1    | INFO:root:restarting job for model 'swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m'
loudml_1    | INFO:root:restarting job for model 'swarm@cpu@90percentile@usage_active@host_worker2_cpu_cpu-total@time@5m'
loudml_1    | INFO:root:restarting job for model 'swarm@cpu@95percentile@usage_active@host_worker2_cpu_cpu-total@time@5m'
loudml_1    | INFO:root:restarting job for model 'swarm@cpu@mean@usage_active@host_worker2_cpu_cpu-total@time@10m'
loudml_1    | INFO:root:restarting job for model 'swarm@cpu@mean@usage_active@host_worker2_cpu_cpu-total@time@1m'
loudml_1    | INFO:root:restarting job for model 'swarm@cpu@mean@usage_active@host_worker2_cpu_cpu-total@time@30m'
loudml_1    | INFO:root:restarting job for model 'swarm@cpu@mean@usage_active@host_worker2_cpu_cpu-total@time@5m'
loudml_1    | INFO:root:starting Loud ML server on 0.0.0.0:8077
loudml_1    | 192.168.48.3 - - [2020-08-05 05:17:55] "GET /models/linux_metrics_cpu_mean_usage_system_host_telegraf_time_5m HTTP/1.1" 404 193 0.001249
loudml_1    | 192.168.48.3 - - [2020-08-05 05:18:10] "GET /models/linux_metrics_cpu_mean_usage_system_host_telegraf_time_5m HTTP/1.1" 404 193 0.000694
loudml_1    | 192.168.48.3 - - [2020-08-05 05:18:25] "GET /models/linux_metrics_cpu_mean_usage_system_host_telegraf_time_5m HTTP/1.1" 404 193 0.000983
loudml_1    | 192.168.48.3 - - [2020-08-05 05:18:40] "GET /models/linux_metrics_cpu_mean_usage_system_host_telegraf_time_5m HTTP/1.1" 404 193 0.000804
loudml_1    | INFO:schedule:Running job Every 1 minute do daemon_clear_jobs() (last run: [never], next run: 2020-08-05 05:18:53)
loudml_1    | INFO:schedule:Running job Every 60.0 seconds do daemon_exec_scheduled_job('_eval(swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m)') (last run: [never], next run: 2020-08-05 05:18:53)
loudml_1    | INFO:root:job[0be19343-c409-4db5-af7f-540f4475efee] starting, nice=0
loudml_1    | INFO:root:predict(swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m) range=2020-08-05T05:15:00.000Z-2020-08-05T05:20:00.000Z
loudml_1    | XXX lineno: 115, opcode: 0
loudml_1    | ERROR:root:unknown opcode
loudml_1    | Traceback (most recent call last):
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 53, in run
loudml_1    |     res = getattr(self, func_name)(*args, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 243, in predict
loudml_1    |     **kwargs
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1594, in predict2
loudml_1    |     num_gpus=num_gpus,
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1208, in predict
loudml_1    |     self.load(num_cpus, num_gpus)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1147, in load
loudml_1    |     self._keras_model = _load_keras_model(self._state.get('h5py'))
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 247, in _load_keras_model
loudml_1    |     keras_model = load_model(path, compile=False)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/saving.py", line 234, in load_model
loudml_1    |     model = model_from_config(model_config, custom_objects=custom_objects)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/saving.py", line 324, in model_from_config
loudml_1    |     return deserialize(config, custom_objects=custom_objects)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/layers/serialization.py", line 74, in deserialize
loudml_1    |     printable_module_name='layer')
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 192, in deserialize_keras_object
loudml_1    |     list(custom_objects.items())))
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1273, in from_config
loudml_1    |     process_node(layer, node_data)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1233, in process_node
loudml_1    |     layer(input_tensors, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
loudml_1    |     outputs = self.call(inputs, *args, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 743, in call
loudml_1    |     return self.function(inputs, **arguments)
loudml_1    |   File "/opt/vendor/lib/python3.5/site-packages/loudml/donut.py", line 115, in sampling
loudml_1    |     z_mean, z_log_var = args
loudml_1    | SystemError: unknown opcode
loudml_1    | ERROR:root:job[0be19343-c409-4db5-af7f-540f4475efee] failed: unknown opcode
loudml_1    | [2020-08-05 05:18:54,323] ERROR in app: Exception on /models/swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m/_eval [POST]
loudml_1    | pebble.common.RemoteTraceback: Traceback (most recent call last):
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/pebble/common.py", line 174, in process_execute
loudml_1    |     return function(*args, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 351, in run
loudml_1    |     return g_worker.run(job_id, nice, func_name, *args, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 58, in run
loudml_1    |     raise exn
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 53, in run
loudml_1    |     res = getattr(self, func_name)(*args, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 243, in predict
loudml_1    |     **kwargs
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1594, in predict2
loudml_1    |     num_gpus=num_gpus,
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1208, in predict
loudml_1    |     self.load(num_cpus, num_gpus)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1147, in load
loudml_1    |     self._keras_model = _load_keras_model(self._state.get('h5py'))
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 247, in _load_keras_model
loudml_1    |     keras_model = load_model(path, compile=False)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/saving.py", line 234, in load_model
loudml_1    |     model = model_from_config(model_config, custom_objects=custom_objects)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/saving.py", line 324, in model_from_config
loudml_1    |     return deserialize(config, custom_objects=custom_objects)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/layers/serialization.py", line 74, in deserialize
loudml_1    |     printable_module_name='layer')
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 192, in deserialize_keras_object
loudml_1    |     list(custom_objects.items())))
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1273, in from_config
loudml_1    |     process_node(layer, node_data)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1233, in process_node
loudml_1    |     layer(input_tensors, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
loudml_1    |     outputs = self.call(inputs, *args, **kwargs)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 743, in call
loudml_1    |     return self.function(inputs, **arguments)
loudml_1    |   File "/opt/vendor/lib/python3.5/site-packages/loudml/donut.py", line 115, in sampling
loudml_1    |     z_mean, z_log_var = args
loudml_1    | SystemError: unknown opcode
loudml_1    | 
loudml_1    | 
loudml_1    | The above exception was the direct cause of the following exception:
loudml_1    | 
loudml_1    | Traceback (most recent call last):
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
loudml_1    |     response = self.full_dispatch_request()
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
loudml_1    |     rv = self.handle_user_exception(e)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/flask_restful/__init__.py", line 269, in error_router
loudml_1    |     return original_handler(e)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
loudml_1    |     reraise(exc_type, exc_value, tb)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
loudml_1    |     raise value
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
loudml_1    |     rv = self.dispatch_request()
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
loudml_1    |     return self.view_functions[rule.endpoint](**req.view_args)
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/server.py", line 1602, in model_eval
loudml_1    |     return jsonify(job.result())
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/server.py", line 393, in result
loudml_1    |     return self._future.result()
loudml_1    |   File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
loudml_1    |     return self.__get_result()
loudml_1    |   File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
loudml_1    |     raise self._exception
loudml_1    |   File "/opt/venv/lib/python3.7/site-packages/loudml/server.py", line 372, in _done_cb
loudml_1    |     self._result = self._future.result()
loudml_1    |   File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
loudml_1    |     return self.__get_result()
loudml_1    |   File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
loudml_1    |     raise self._exception
loudml_1    | SystemError: unknown opcode
loudml_1    | 127.0.0.1 - - [2020-08-05 05:18:54] "POST /models/swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m/_eval?output_bucket=test-loudml&flag_abnormal_data=True&save_output_data=True&from=1596604664&to=1596604724 HTTP/1.1" 500 156 0.260177
loudml_1    | ERROR:root:error executing scheduled job '_eval(swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m)':INTERNAL SERVER ERROR
loudml_1    | INFO:schedule:Running job Every 60.0 seconds do daemon_exec_scheduled_job('_eval(swarm@cpu@90percentile@usage_active@host_worker2_cpu_cpu-total@time@5m)') (last run: [never], next run: 2020-08-05 05:18:53)
loudml_1    | INFO:root:job[4cb5ec63-e3f0-475d-8075-bbdc3bb38264] starting, nice=0
loudml_1    | INFO:root:predict(swarm@cpu@90percentile@usage_active@host_worker2_cpu_cpu-total@time@5m) range=2020-08-05T05:15:00.000Z-2020-08-05T05:20:00.000Z
loudml_1    | XXX lineno: 115, opcode: 0
loudml_1    | ERROR:root:unknown opcode

toni-moreno, Aug 05 '20

Hi Toni. Interesting finding. I upgraded the Python version to 3.7. The Python serialisation format is probably different in this version, causing ‘load_model’ to fail.

What if you delete the model state and retrain the model? Does that solve the issue?
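The traceback is consistent with this reading: the Lambda layer call in keras/layers/core.py ends up in loudml/donut.py under /opt/vendor/lib/python3.5, although the server now runs from /opt/venv/lib/python3.7. tf.keras saves a Lambda layer's Python function by marshalling its code object (func_dump in keras/utils/generic_utils.py), so bytecode compiled by Python 3.5 at training time, including its original filename, is stored in the model. The marshal format and bytecode are not guaranteed to be compatible across Python versions (3.6 switched to 2-byte wordcode), which would explain the "unknown opcode" error and why retraining under 3.7 should help. The sketch below mimics that round trip with marshal and base64; dump_func and load_func are illustrative names, not the Keras API.

```python
import base64
import marshal
import types


def dump_func(func):
    """Serialize a function's code object to base64 text, in the spirit of Keras' func_dump."""
    return base64.b64encode(marshal.dumps(func.__code__)).decode("ascii")


def load_func(blob, globs=None):
    """Rebuild a function from dump_func() output.

    marshal data is only reliable in the Python version that produced it;
    running 3.5 bytecode under 3.7 is what can surface as 'unknown opcode'.
    """
    code = marshal.loads(base64.b64decode(blob))
    return types.FunctionType(code, globs if globs is not None else globals())


def double(x):
    return x * 2


restored = load_func(dump_func(double))
print(restored(21))  # 42
# The original co_filename travels with the bytecode, consistent with the
# traceback still showing the python3.5 path for donut.py.
print(restored.__code__.co_filename == double.__code__.co_filename)  # True
```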

regel, Aug 05 '20

Same here!

Using @Toni's fix, it seems to work 👍

$ docker exec -it -u 0 7e011d7c0881 bash
root@7e011d7c0881:/# apt-get update && apt-get install -y python3-pip python3-setuptools python3-dev && apt-get install -y --no-install-recommends build-essential gcc git && apt-get purge -y

ezar, Mar 08 '21

Hi, I was wondering whether this was ever resolved and included in the final release. If I use FROM loudml/loudml:1.6.0 in my Dockerfile, I still get this error.

wjlove, Apr 21 '22

I had to create my own docker image like this:

Dockerfile

FROM loudml/loudml:latest

# SHELL ["/bin/bash", "-o", "pipefail", "-c"]
USER 0

# https://github.com/regel/loudml/issues/370
RUN apt-get update && \
    apt-get install -y \
        python3-pip python3-setuptools python3-dev && \
    apt-get install -y --no-install-recommends \
        build-essential gcc git && \
    apt-get purge -y


ENTRYPOINT ["loudmld"]

Note: USER 0 is needed because, for some reason, the base image runs as UID 1001, which doesn't have permission to install dependencies.

Then in docker-compose:

    # image: loudml/loudml:1.6.0
    build: .
    container_name: loudml

robertsLando, Jun 16 '22