ECS Storage Pool Nodes Never Initialize
Expected Behavior
The storage pool is created as expected, and I can proceed to building virtual data centers (VDCs), etc.
Actual Behavior
The "create storage pool" process never gets past the "Initializing" phase.
I had read that it may take a very long time to complete, so after multiple unsuccessful attempts at debugging, I started the process before a two-week break. When I returned, the storage pool nodes were still in the initializing state. This suggests that the process is not simply slow but will never proceed.
I had also read that a bug in the ECS Community Edition GUI can misreport the state, and that the storage pool may actually have been created. However, attempting to create a VDC failed, which rules out this explanation. I also allowed ample time after Step 1 for ECS to come fully online before proceeding with Step 2.
To summarize: whether I create a storage pool with the Step 2 script or manually in the GUI, ECS never completes the process, which blocks me from using the appliance.
Additional information: the ECS service appears to be online. I can log in to the GUI and perform some operations, such as changing passwords or accepting licenses. Any advice on what could cause this would be much appreciated. I have attached output containing relevant keywords from several logs below. I'm unsure which specific ViPR logs are relevant, but I will attach them on request.
Steps to Reproduce Behavior
- Follow the instructions and run the Step 1 script to install ECS
- Either run Step 2 or manually create a storage pool
- Observe whether the storage pool nodes become stuck in the "Initializing" state
Relevant Output and Logs
messages:2019-12-19T15:48:52.110016+00:00 luna kernel: [ 1.233396] audit: initializing netlink socket (disabled)
messages:2019-12-19T16:01:45.375359+00:00 luna rm[13482]: [ws_native] BPlusTreeCache.cpp:249 initializing cache with size: 128
ecsportalsvc.out:2019-12-19 15:51:05,462 main TRACE TypeConverterRegistry initializing.
coordinatorsvc.log:2019-12-19T16:23:42,798 [pool-4-thread-5] INFO ActivityLog.java (line 128) Cleanup Task: 2019-12-19 16:23:42,797 main TRACE TypeConverterRegistry initializing.
ecsportalsvc.log:<?xml version="1.0" encoding="UTF-8" standalone="yes"?><commodity_data_stores><commodity_data_store><creation_time>1576771255000</creation_time><link rel="self" href="/vdc/object-pools/5e397e48-2275-11ea-ba86-005056bc73b7"/><name>10.10.2.62</name><free_gb>0</free_gb><description>My First SP</description><device_info></device_info><device_state>initializing</device_state><usable_gb>0</usable_gb><used_gb>0</used_gb><id>5e397e48-2275-11ea-ba86-005056bc73b7</id><varray>urn:storageos:VirtualArray:a0b5f083-4439-4c37-b224-260873e02cfe</varray></commodity_data_store><commodity_data_store><creation_time>1576771298000</creation_time><link rel="self" href="/vdc/object-pools/5e9e0584-2275-11ea-9f99-005056bc26a2"/><name>10.10.2.64</name><free_gb>0</free_gb><description>My First SP</description><device_info></device_info><device_state>initializing</device_state><usable_gb>0</usable_gb><used_gb>0</used_gb><id>5e9e0584-2275-11ea-9f99-005056bc26a2</id><varray>urn:storageos:VirtualArray:a0b5f083-4439-4c37-b224-260873e02cfe</varray></commodity_data_store><commodity_data_store><creation_time>1576771297000</creation_time><link rel="self" href="/vdc/object-pools/5e5cdef6-2275-11ea-947a-005056bc736b"/><name>10.10.2.63</name><free_gb>0</free_gb><description>My First SP</description><device_info></device_info><device_state>initializing</device_state><usable_gb>0</usable_gb><used_gb>0</used_gb><id>5e5cdef6-2275-11ea-947a-005056bc736b</id><varray>urn:storageos:VirtualArray:a0b5f083-4439-4c37-b224-260873e02cfe</varray></commodity_data_store></commodity_data_stores>
stat.log:<?xml version="1.0" encoding="UTF-8" standalone="yes"?><commodity_data_stores><commodity_data_store><creation_time>1576771255000</creation_time><link rel="self" href="/vdc/object-pools/5e397e48-2275-11ea-ba86-005056bc73b7"/><name>10.10.2.62</name><free_gb>0</free_gb><description>My First SP</description><device_info></device_info><device_state>initializing</device_state><usable_gb>0</usable_gb><used_gb>0</used_gb><id>5e397e48-2275-11ea-ba86-005056bc73b7</id><varray>urn:storageos:VirtualArray:a0b5f083-4439-4c37-b224-260873e02cfe</varray></commodity_data_store><commodity_data_store><creation_time>1576771298000</creation_time><link rel="self" href="/vdc/object-pools/5e9e0584-2275-11ea-9f99-005056bc26a2"/><name>10.10.2.64</name><free_gb>0</free_gb><description>My First SP</description><device_info></device_info><device_state>initializing</device_state><usable_gb>0</usable_gb><used_gb>0</used_gb><id>5e9e0584-2275-11ea-9f99-005056bc26a2</id><varray>urn:storageos:VirtualArray:a0b5f083-4439-4c37-b224-260873e02cfe</varray></commodity_data_store><commodity_data_store><creation_time>1576771297000</creation_time><link rel="self" href="/vdc/object-pools/5e5cdef6-2275-11ea-947a-005056bc736b"/><name>10.10.2.63</name><free_gb>0</free_gb><description>My First SP</description><device_info></device_info><device_state>initializing</device_state><usable_gb>0</usable_gb><used_gb>0</used_gb><id>5e5cdef6-2275-11ea-947a-005056bc736b</id><varray>urn:storageos:VirtualArray:a0b5f083-4439-4c37-b224-260873e02cfe</varray></commodity_data_store></commodity_data_stores>
transformsvc.log:2019-12-19T16:00:33,232 [main] INFO SSConnMgr initializing - poolConnections:true numConnectionsPerAddress:10 maxWaitMillis:5000 enable connMonitor:true connectTimeoutMillis:1000 highWaterWriteKb:128 lowWaterWriteKb:32 tcpNoDelay:true so_RcvBuf:0 so_SndBuf:0 eventLoopGroupThreads:50 idleTimeoutSeconds:19
transformsvc.log:2019-12-19T16:00:33,336 [main] INFO TCPServer TCPServer initializing, includeCiphers =[TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384, TLS_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_256_CBC_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA256, ]
transformsvc.log:2019-12-19T16:00:33,336 [main] INFO TCPServer TCPServer initializing, excludeCiphers =[]
transformsvc.log:2019-12-19T16:00:33,336 [main] INFO TCPServer TCPServer initializing, protocolsToExclude =[TLSv1.1, TLSv1, SSLv3, SSLv2, SSLv1, ]
transformsvc.log:2019-12-20T16:00:41,068 [main] INFO SSConnMgr initializing - poolConnections:true numConnectionsPerAddress:10 maxWaitMillis:5000 enable connMonitor:true connectTimeoutMillis:1000 highWaterWriteKb:128 lowWaterWriteKb:32 tcpNoDelay:true so_RcvBuf:0 so_SndBuf:0 eventLoopGroupThreads:50 idleTimeoutSeconds:19
transformsvc.log:2019-12-20T16:00:41,196 [main] INFO TCPServer TCPServer initializing, includeCiphers =[TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384, TLS_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_256_CBC_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA256, ]
ssm-btree-dump.log:2020-01-10T00:25:39,278 [DTInit-urn:storageos:OwnershipInfo:a0b5f083-4439-4c37-b224-260873e02cfe__SS_31_32_1:-000] INFO BPlusTree.java (line 1069) Skip initializing page table for dtId urn:storageos:OwnershipInfo:a0b5f083-4439-4c37-b224-260873e02cfe__SS_31_32_1: zone urn:storageos:VirtualDataCenterData:7e4a0fc9-4831-4c45-8fad-54a8a2fe8ae6, featureEnabled false tree Record null
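The ecsportalsvc.log and stat.log entries above each contain a `commodity_data_stores` XML document listing every storage pool node with its `device_state`. When re-checking after a long wait, it can help to pull those states out programmatically instead of eyeballing the long lines. Below is a minimal Python sketch, assuming the grep-style `logfile:<xml>` line format shown above; `extract_xml`, `node_states`, and `stuck_nodes` are hypothetical helper names, not part of ECS.

```python
import xml.etree.ElementTree as ET


def extract_xml(log_line):
    """Drop the 'logfile:' prefix added by grep, keeping everything from the first '<'."""
    return log_line[log_line.index("<"):]


def node_states(xml_text):
    """Map each commodity_data_store <name> (the node IP) to its <device_state>."""
    root = ET.fromstring(xml_text)
    return {
        store.findtext("name", default="?"): store.findtext("device_state", default="")
        for store in root.findall("commodity_data_store")
    }


def stuck_nodes(xml_text, stuck_state="initializing"):
    """Return the sorted names of nodes whose device_state equals stuck_state."""
    return sorted(name for name, state in node_states(xml_text).items() if state == stuck_state)


if __name__ == "__main__":
    # Trimmed sample in the same shape as the stat.log line above (two of the three nodes,
    # with the unrelated child elements removed).
    line = ('stat.log:<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
            '<commodity_data_stores>'
            '<commodity_data_store><name>10.10.2.62</name>'
            '<device_state>initializing</device_state></commodity_data_store>'
            '<commodity_data_store><name>10.10.2.63</name>'
            '<device_state>initializing</device_state></commodity_data_store>'
            '</commodity_data_stores>')
    xml_text = extract_xml(line)
    print(node_states(xml_text))
    print(stuck_nodes(xml_text))
```

Only the standard library is used, so the sketch can be run directly on an ECS node against a copied log line.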
Notifies: @captntuttle
Linking related issue: https://github.com/EMCECS/ECS-CommunityEdition/issues/187
Update: After several failed attempts at debugging, I reverted to the old (version 1) installer, which uses Python scripts. I ran this V1 installer on the exact same machines on which the current bug occurs, and, strangely, the install succeeded and I was able to create storage pools, VDCs, buckets, etc. This suggests that my machines are configured correctly but for some reason cannot work properly with the new ECS version.
Hi, AB442!
I have the same error. What is the "old Python script version 1"? Can you tell me more about it?
Hi, ledbsd,
What I'm referring to is the legacy installer. I used it on a project about two years ago, and it has since been deprecated in favor of the new Ansible installer. Here's a link to the final release of the legacy installer:
https://github.com/EMCECS/ECS-CommunityEdition/releases/tag/3.0.0.1-legacy-final
It sounds like your environment is configured similarly to mine, so you should only need to run the relevant Step 1 and Step 2 scripts. The ECS-MultiNode-Instructions.md and ECS-SingleNode-Instructions.md files in the Documentation folder have instructions for this. FYI: make sure the required ports are open when you run it.
To reiterate, this legacy installer installs an old version of ECS, so you would need to check the change logs to see how its features differ.
Let me know if you debug the Storage Pool issue or discover anything about it.
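Before running the installer, a quick TCP reachability check between nodes can catch closed ports early. This is a minimal Python sketch, not part of either installer; `port_open` and `closed_ports` are hypothetical helper names, and it only tests whether a TCP connection succeeds, not whether the right ECS service is listening.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


def closed_ports(hosts, ports, timeout=2.0):
    """Return every (host, port) pair that refused the connection or timed out."""
    return [(h, p) for h in hosts for p in ports if not port_open(h, p, timeout)]
```

For example, `closed_ports(["10.10.2.62", "10.10.2.63", "10.10.2.64"], [443, 4443])` checks the node IPs from the logs above. The port list here is only a placeholder (443 and 4443 are assumed to be the portal and management API ports); use the full list from the installer's documentation.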