docker-vernemq
Internal database corrupted after first start of Docker cluster
Environment
- VerneMQ Version: 1.19.1
- OS: Docker Alpine
- Erlang/OTP version (if building from source):
- VerneMQ configuration (vernemq.conf) or the changes from the default: see the docker-compose.yaml excerpt below
- Cluster size/standalone: 3
The following directives were used in docker-compose.yaml for clustering:
mqtt-adapter-1:
  image: mainflux/mqtt-verne:latest
  container_name: mainflux-mqtt-1
  depends_on:
    - things
    - nats
    - es-redis
  restart: on-failure
  environment:
    MF_MQTT_ADAPTER_LOG_LEVEL: ${MF_MQTT_ADAPTER_LOG_LEVEL}
    MF_MQTT_INSTANCE_ID: mqtt-adapter-1
    MF_MQTT_ADAPTER_WS_PORT: ${MF_MQTT_ADAPTER_WS_PORT}
    MF_MQTT_ADAPTER_ES_URL: tcp://es-redis:${MF_REDIS_TCP_PORT}
    MF_NATS_URL: ${MF_NATS_URL}
    MF_THINGS_AUTH_GRPC_URL: http://things:${MF_THINGS_AUTH_GRPC_PORT}
    DOCKER_VERNEMQ_PLUGINS__VMQ_PASSWD: "off"
    DOCKER_VERNEMQ_PLUGINS__VMQ_ACL: "off"
    DOCKER_VERNEMQ_PLUGINS__MFX_AUTH: "on"
    DOCKER_VERNEMQ_PLUGINS__MFX_AUTH__PATH: /mainflux/_build/default
    DOCKER_VERNEMQ_LOG__CONSOLE__LEVEL: debug
    MF_MQTT_VERNEMQ_GRPC_POOL_SIZE: 1000
  ports:
    - 18831:1883
    - 8881:8080
    - 7777:8888 # VerneMQ dashboard
  networks:
    - mainflux-base-net

mqtt-adapter-2:
  image: mainflux/mqtt-verne:latest
  container_name: mainflux-mqtt-2
  depends_on:
    - things
    - nats
    - es-redis
    - mqtt-adapter-1
  restart: on-failure
  environment:
    MF_MQTT_ADAPTER_LOG_LEVEL: ${MF_MQTT_ADAPTER_LOG_LEVEL}
    MF_MQTT_INSTANCE_ID: mqtt-adapter-2
    MF_MQTT_ADAPTER_WS_PORT: 8080
    MF_MQTT_ADAPTER_ES_URL: tcp://es-redis:${MF_REDIS_TCP_PORT}
    MF_NATS_URL: ${MF_NATS_URL}
    MF_THINGS_AUTH_GRPC_URL: http://things:${MF_THINGS_AUTH_GRPC_PORT}
    DOCKER_VERNEMQ_PLUGINS__VMQ_PASSWD: "off"
    DOCKER_VERNEMQ_PLUGINS__VMQ_ACL: "off"
    DOCKER_VERNEMQ_PLUGINS__MFX_AUTH: "on"
    DOCKER_VERNEMQ_PLUGINS__MFX_AUTH__PATH: /mainflux/_build/default
    DOCKER_VERNEMQ_LOG__CONSOLE__LEVEL: debug
    MF_MQTT_VERNEMQ_GRPC_POOL_SIZE: 1000
    DOCKER_VERNEMQ_COMPOSE: 1
    DOCKER_VERNEMQ_DISCOVERY_NODE: mqtt-adapter-1
  ports:
    - 18832:1883
    - 8882:8080
    - 7778:8888 # VerneMQ dashboard
  networks:
    - mainflux-base-net

mqtt-adapter-3:
  image: mainflux/mqtt-verne:latest
  container_name: mainflux-mqtt-3
  depends_on:
    - things
    - nats
    - es-redis
    - mqtt-adapter-1
  restart: on-failure
  environment:
    MF_MQTT_ADAPTER_LOG_LEVEL: ${MF_MQTT_ADAPTER_LOG_LEVEL}
    MF_MQTT_INSTANCE_ID: mqtt-adapter-3
    MF_MQTT_ADAPTER_PORT: 18833
    MF_MQTT_ADAPTER_WS_PORT: 8882
    MF_MQTT_ADAPTER_ES_URL: tcp://es-redis:${MF_REDIS_TCP_PORT}
    MF_NATS_URL: ${MF_NATS_URL}
    MF_THINGS_AUTH_GRPC_URL: http://things:${MF_THINGS_AUTH_GRPC_PORT}
    DOCKER_VERNEMQ_PLUGINS__VMQ_PASSWD: "off"
    DOCKER_VERNEMQ_PLUGINS__VMQ_ACL: "off"
    DOCKER_VERNEMQ_PLUGINS__MFX_AUTH: "on"
    DOCKER_VERNEMQ_PLUGINS__MFX_AUTH__PATH: /mainflux/_build/default
    DOCKER_VERNEMQ_LOG__CONSOLE__LEVEL: debug
    MF_MQTT_VERNEMQ_GRPC_POOL_SIZE: 1000
    DOCKER_VERNEMQ_COMPOSE: 1
    DOCKER_VERNEMQ_DISCOVERY_NODE: mqtt-adapter-1
  ports:
    - 18833:1883
    - 8883:8080
    - 7779:8888 # VerneMQ dashboard
  networks:
    - mainflux-base-net
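For reference, the cluster membership can be inspected on any of the nodes with vmq-admin once the composition is up (container name mainflux-mqtt-1 taken from the excerpt above):

# start the three adapter services and check the cluster from the first node
docker-compose up -d mqtt-adapter-1 mqtt-adapter-2 mqtt-adapter-3
docker exec mainflux-mqtt-1 vmq-admin cluster show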
Expected behavior
The cluster starts normally.
Actual behavior
On restarting the composition with docker-compose, the master node of the cluster fails:
mainflux-mqtt-1 | 18:47:50.568 [info] Datadir ./data/meta/meta/10 options for LevelDB: [{open,[{block_cache_threshold,33554432},{block_restart_interval,16},{block_size_steps,16},{compression,true},{create_if_missing,true},{delete_threshold,1000},{eleveldb_threads,71},{fadvise_willneed,false},{limited_developer_mem,false},{sst_block_size,4096},{tiered_slow_level,0},{total_leveldb_mem_percent,6},{use_bloomfilter,true},{write_buffer_size,47182363}]},{read,[{verify_checksums,true}]},{write,[{sync,false}]},{fold,[{verify_checksums,true},{fill_cache,false}]}]
mainflux-mqtt-1 | 18:47:50.592 [info] Datadir ./data/meta/meta/11 options for LevelDB: [{open,[{block_cache_threshold,33554432},{block_restart_interval,16},{block_size_steps,16},{compression,true},{create_if_missing,true},{delete_threshold,1000},{eleveldb_threads,71},{fadvise_willneed,false},{limited_developer_mem,false},{sst_block_size,4096},{tiered_slow_level,0},{total_leveldb_mem_percent,6},{use_bloomfilter,true},{write_buffer_size,48522751}]},{read,[{verify_checksums,true}]},{write,[{sync,false}]},{fold,[{verify_checksums,true},{fill_cache,false}]}]
mainflux-mqtt-1 | 18:47:50.619 [error] Supervisor plumtree_sup had child plumtree_broadcast started with plumtree_broadcast:start_link() at undefined exit with reason {'EXIT',{function_clause,[{orddict,fetch,['[email protected]',[{'[email protected]',['[email protected]']},{'[email protected]',['[email protected]']},{'[email protected]',['[email protected]']}]],[{file,"orddict.erl"},{line,80}]},{plumtree_broadcast,init_peers,1,[{file,"/vernemq-build/_build/default/lib/plumtree/src/plumtree_broadcast.erl"},{line,754}]},{plumtree_broadcast,start_link,0,[{file,"/vernemq-build/_build/default/lib/plumtree/src/plumtree_broadcast.erl"},{line,150}]},...]}} in context start_error
mainflux-mqtt-1 | 18:47:50.621 [error] CRASH REPORT Process <0.193.0> with 0 neighbours exited with reason: {{error,{shutdown,{failed_to_start_child,plumtree_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['[email protected]',[{'[email protected]',['[email protected]']},{'[email protected]',['[email protected]']},{'[email protected]',['[email protected]']}]],[{file,"orddict.erl"},{line,80}]},{plumtree_broadcast,init_peers,1,[{file,"/vernemq-build/_build/default/lib/plumtree/src/plumtree_broadcast.erl"},{line,754}]},{plumtree_broadcast,start_link,0,[{file,"/vernemq-build/_build/..."},...]},...]}}}}},...} in application_master:init/4 line 138
mainflux-mqtt-1 | 18:47:50.621 [info] Application plumtree exited with reason: {{error,{shutdown,{failed_to_start_child,plumtree_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['[email protected]',[{'[email protected]',['[email protected]']},{'[email protected]',['[email protected]']},{'[email protected]',['[email protected]']}]],[{file,"orddict.erl"},{line,80}]},{plumtree_broadcast,init_peers,1,[{file,"/vernemq-build/_build/default/lib/plumtree/src/plumtree_broadcast.erl"},{line,754}]},{plumtree_broadcast,start_link,0,[{file,"/vernemq-build/_build/..."},...]},...]}}}}},...}
mainflux-mqtt-1 | 18:47:50.621 [debug] loading modules: [vmq_plumtree,vmq_plumtree_app,vmq_plumtree_sup]
mainflux-mqtt-1 | 18:47:50.621 [info] Application sext exited with reason: stopped
mainflux-mqtt-1 | 18:47:50.621 [info] Application riak_dt exited with reason: stopped
mainflux-mqtt-1 | 18:47:50.627 [debug] Lager installed handler lager_backend_throttle into lager_event
mainflux-mqtt-1 | 18:47:50.633 [info] Try to start vmq_plumtree: ok
mainflux-mqtt-1 | [os_mon] memory supervisor port (memsup): Erlang has closed
mainflux-mqtt-1 | 18:47:50.637 [error] CRASH REPORT Process <0.215.0> with 0 neighbours crashed with reason: bad argument in call to ets:lookup(cluster_state, cluster_state) in plumtree_peer_service_manager:get_local_state/0 line 43
mainflux-mqtt-1 | [os_mon] cpu supervisor port (cpu_sup): Erlang has closed
mainflux-mqtt-1 | 18:47:50.637 [error] CRASH REPORT Process <0.188.0> with 0 neighbours exited with reason: bad argument in call to ets:lookup(cluster_state, cluster_state) in plumtree_peer_service_manager:get_local_state/0 line 43 in application_master:init/4 line 138
mainflux-mqtt-1 | 18:47:50.637 [info] Application vmq_server exited with reason: bad argument in call to ets:lookup(cluster_state, cluster_state) in plumtree_peer_service_manager:get_local_state/0 line 43
mainflux-mqtt-1 | 18:47:50.640 [info] alarm_handler: {clear,system_memory_high_watermark}
The master node's container must be deleted for the composition to come up again:
docker rm mainflux-mqtt-1
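A minimal sketch of the full workaround, assuming the stale plumtree metadata lives in the container's anonymous data volume, so removing the container together with its volumes lets the node start from an empty ./data/meta on the next run:

# remove the failed master node container and its anonymous volumes (add -f if it is still running)
docker rm -v mainflux-mqtt-1
# recreate only that service from the compose file; the other nodes keep running
docker-compose up -d mqtt-adapter-1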
Transferred this to the docker-vernemq repo. Looking at the stack trace, my guess is that the node comes up with a new node name but with the old metadata, so perhaps this is an issue with docker-compose 'statefulness'? Note that I have never worked with docker-compose, so I have no idea how it works.
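If a changing node name really is the trigger, one thing worth trying (untested on my side; the DOCKER_VERNEMQ_NODENAME variable and the service names are assumptions based on the compose excerpt above) is pinning each node's name to its compose service name so a recreated container rejoins under the same identity:

environment:
  # assumption: DOCKER_VERNEMQ_NODENAME sets the host part of the Erlang node name (VerneMQ@<value>)
  DOCKER_VERNEMQ_NODENAME: mqtt-adapter-1   # use mqtt-adapter-2 / mqtt-adapter-3 on the other services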
Though likely not a solution to the root cause of this, perhaps you should try the swc metadata backend; the plumtree one will be deprecated and removed for VerneMQ 2.0.
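A sketch of what that could look like in the compose file, assuming the usual DOCKER_VERNEMQ_ prefix mapping to vernemq.conf keys also covers the metadata_plugin setting:

environment:
  # switch the cluster metadata backend from plumtree (default) to swc on every node
  DOCKER_VERNEMQ_METADATA_PLUGIN: vmq_swc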