salt-ext-modules-vmware

esxi salt-proxy fails with NameError: name '__salt__' is not defined

Open dfidler opened this issue 3 years ago • 3 comments

I've set up an esxi proxy using the saltext modules and it's getting constant errors like the following:

# salt-proxy --proxyid esxi_cluster1_esx01
[CRITICAL] Failed to load grains defined in grain file esxi.esxi in function <LoadedFunc name='esxi.esxi'>, error:
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/salt/loader/__init__.py", line 943, in grains
    ret = funcs[key](**kwargs)
  File "/usr/lib/python3.7/site-packages/salt/loader/lazy.py", line 149, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/lib/python3.7/site-packages/salt/loader/lazy.py", line 1201, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/lib/python3.7/site-packages/salt/loader/lazy.py", line 1216, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/saltext/vmware/grains/esxi.py", line 37, in esxi
    return _grains()
  File "/usr/lib/python3.7/site-packages/saltext/vmware/grains/esxi.py", line 95, in _grains
    username, password = _find_credentials(host)
  File "/usr/lib/python3.7/site-packages/saltext/vmware/grains/esxi.py", line 75, in _find_credentials
    ret = __salt__["vmware_info.system_info"](
NameError: name '__salt__' is not defined

Looking at other grains modules, I can see them using __salt__, so I'm guessing (just a guess) that the salt-proxy isn't lazy-loading modules before the grains module executes. If I remove the whole grains module directory

mv /usr/lib/python3.7/site-packages/saltext/vmware/grains /root

I then get verify_ssl errors instead. That's not surprising, as the hosts are using self-signed certs; I think I just need to set the credential_store param in the esxi proxy config or set verify_ssl somewhere.

Either way, once I remove the grains module I get further. That makes a degree of sense because the esxi proxy minion offers a grains method natively, which this grains module seems to be replicating.
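
For anyone following along, the NameError itself is ordinary Python behaviour: __salt__ is not a builtin, it is a module-level global that the Salt loader injects into a module, and a function that references it without that injection fails at call time. A minimal sketch of that mechanism, not Salt code; fake_system_info and both find_credentials_* helpers are hypothetical names used only for illustration:

```python
def find_credentials_unguarded(host):
    # Mirrors the failing line in the traceback: the name is looked up
    # in the module globals at call time and is not there.
    return __salt__["vmware_info.system_info"](host=host)


def find_credentials_guarded(host):
    # Look the dunder up explicitly, so a missing injection becomes a
    # normal return value instead of an unhandled NameError.
    salt_funcs = globals().get("__salt__")
    if salt_funcs is None:
        return None
    return salt_funcs["vmware_info.system_info"](host=host)


def fake_system_info(host):
    # Hypothetical stand-in for the real execution function.
    return {"host": host, "apiType": "HostAgent"}


try:
    find_credentials_unguarded("esx01")
    unguarded_error = None
except NameError as exc:
    unguarded_error = str(exc)

before_injection = find_credentials_guarded("esx01")

# Simulate the loader injecting the dunder into this module's globals.
globals()["__salt__"] = {"vmware_info.system_info": fake_system_info}
after_injection = find_credentials_guarded("esx01")
```

This only shows why the name can be missing; it says nothing about why the loader did not provide it in this case.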

dfidler avatar Apr 27 '22 16:04 dfidler

I'm hitting this as well. I tried @dfidler's grains workaround, but no matter what I do with credstore, verify_ssl, or even loading a cert, I still receive the SSL errors.

ggiesen avatar Apr 28 '22 17:04 ggiesen

Okay, I had a couple of hours to have a look at this and it looks like there are a few things happening here, both with the sddc modules and the lazy loader.

Long Version

You can get the proxy going with the following:

Change to your site-packages directory (mine is /usr/lib/python3.7/site-packages);

BASE=/usr/lib/python3.7/site-packages

Change the esxi proxy

x=saltext/vmware/proxy/esxi.py && diff -u --strip $x-orig $x
--- saltext/vmware/proxy/esxi.py-orig   2022-04-29 12:29:54.857270014 +0000
+++ saltext/vmware/proxy/esxi.py        2022-04-29 12:30:21.169169678 +0000
@@ -451,7 +451,7 @@
         if DETAILS["mechanism"] == "userpass":
             find_credentials(DETAILS["host"])
             try:
-                __salt__["vmware_info.system_info"](
+                __salt__["vsphere.system_info"](
                     host=DETAILS["host"],
                     username=DETAILS["username"],
                     password=DETAILS["password"],
@@ -520,7 +520,7 @@
         for password in passwords:
             try:
                 # Try to authenticate with the given user/password combination
-                ret = __salt__["vmware_info.system_info"](
+                ret = __salt__["vsphere.system_info"](
                     host=host, username=user, password=password, verify_ssl=verify_ssl
                 )
             except SaltSystemExit:
@@ -542,7 +542,7 @@
     username, password = find_credentials(DETAILS["host"])
     verify_ssl = DETAILS["verify_ssl"]

-    ret = __salt__["vmware_info.system_info"](
+    ret = __salt__["vsphere.system_info"](
         host=host,
         username=username,
         password=password,

Change the esxi grains module

# x=saltext/vmware/grains/esxi.py && diff -u --strip $x-orig $x
--- saltext/vmware/grains/esxi.py-orig  2022-04-29 12:27:07.317907565 +0000
+++ saltext/vmware/grains/esxi.py       2022-04-29 12:28:32.657583133 +0000
@@ -9,6 +9,7 @@

 import salt.utils.proxy
 from salt.exceptions import SaltSystemExit
+import salt.modules.vsphere

 __proxyenabled__ = ["esxi"]
 __virtualname__ = "esxi"
@@ -72,7 +73,7 @@
         for password in passwords:
             try:
                 # Try to authenticate with the given user/password combination
-                ret = __salt__["vmware_info.system_info"](
+                ret = salt.modules.vsphere.system_info(
                     host=host, username=user, password=password, verify_ssl=verify_ssl
                 )
             except SaltSystemExit:
@@ -96,7 +97,7 @@
             protocol = __pillar__["proxy"].get("protocol")
             port = __pillar__["proxy"].get("port")
             verify_ssl = __pillar__["proxy"].get("verify_ssl")
-            ret = __salt__["vmware_info.system_info"](
+            ret = salt.modules.vsphere.system_info(
                 host=host,
                 username=username,
                 password=password,

Once you've done that you can get your proxy running.

# salt esxi_cluster1_esx01 esxi.cmd system_info
esxi_cluster1_esx01:
    'esxi.cmd' is not available.
ERROR: Minions returned with non-zero exit code

From what I can tell, this is because saltext/vmware/modules/esxi.py directly conflicts with the legacy module (salt/modules/esxi.py): same file name, same module type. I think the loader loads modules by name, and because LazyLoader is basically just a dict, it only tries to load the new saltext module (because that's what takes precedence).

The legacy module declares itself as __proxyenabled__ for the esxi proxy type, and returns __virtualname__ if the minion is a proxy.

# grep proxyenabled salt/modules/esxi.py
__proxyenabled__ = ["esxi"]

The new sddc module doesn't declare itself as proxy-enabled, so when you run it on a proxy minion it fails to load, and the legacy module never gets the chance to load.
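
To make that guess concrete, here is a toy model of the behaviour I think I'm seeing. It is not Salt's loader (the real one is more involved); ToyModule and resolve are hypothetical names, and the model keys modules on the file name only:

```python
class ToyModule:
    """A module file plus the proxy types it declares support for."""

    def __init__(self, source, proxyenabled=()):
        self.source = source
        self.proxyenabled = tuple(proxyenabled)

    def loads_on(self, proxytype):
        # Assumption of this model: on a regular minion (proxytype is
        # None) every module loads; on a proxy minion only modules that
        # list the proxy type do.
        return proxytype is None or proxytype in self.proxyenabled


def resolve(candidates, name, proxytype):
    """Return the source file that serves `name`, or None.

    `candidates` maps a module name to a list ordered by precedence.
    Only the first entry is ever tried, which is the behaviour guessed
    at above: a winner that refuses to load hides everything behind it.
    """
    winner = candidates[name][0]
    return winner.source if winner.loads_on(proxytype) else None


legacy = ToyModule("salt/modules/esxi.py", proxyenabled=["esxi"])
extension = ToyModule("saltext/vmware/modules/esxi.py")

# Both files are named esxi.py and the extension takes precedence.
conflicting = {"esxi": [extension, legacy]}
# After renaming the extension file, each name has a single provider.
renamed = {"esxi": [legacy], "esxi_sddc": [extension]}

on_proxy_conflicting = resolve(conflicting, "esxi", proxytype="esxi")
on_proxy_renamed = resolve(renamed, "esxi", proxytype="esxi")
on_minion_renamed = resolve(renamed, "esxi_sddc", proxytype=None)
```

In this model the conflicting layout leaves the esxi name unavailable on the proxy minion, which matches the "'esxi.cmd' is not available" output, and removing the name clash brings the legacy module back.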

The simple fix is to just rename the file:

mv $BASE/saltext/vmware/modules/esxi.py $BASE/saltext/vmware/modules/esxi_sddc.py

So you can still use the new esxi module with a non-proxy minion...

# salt saltmaster vmware_esxi.power_state
saltmaster:
    True

And you can at least call the legacy esxi module on the proxy minion... but you get an exception...

# salt esxi_cluster1_esx01 esxi.cmd system_info
esxi_cluster1_esx01:
    The minion function caused an exception: Traceback (most recent call last):
      File "/usr/lib/python3.7/site-packages/salt/metaproxy/proxy.py", line 483, in thread_return
        opts, data, func, args, kwargs
      File "/usr/lib/python3.7/site-packages/salt/loader/lazy.py", line 149, in __call__
        return self.loader.run(run_func, *args, **kwargs)
      File "/usr/lib/python3.7/site-packages/salt/loader/lazy.py", line 1201, in run
        return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
      File "/usr/lib/python3.7/site-packages/salt/loader/lazy.py", line 1216, in _run_as
        return _func_or_method(*args, **kwargs)
      File "/usr/lib/python3.7/site-packages/salt/executors/direct_call.py", line 10, in execute
        return func(*args, **kwargs)
      File "/usr/lib/python3.7/site-packages/salt/loader/lazy.py", line 149, in __call__
        return self.loader.run(run_func, *args, **kwargs)
      File "/usr/lib/python3.7/site-packages/salt/loader/lazy.py", line 1201, in run
        return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
      File "/usr/lib/python3.7/site-packages/salt/loader/lazy.py", line 1216, in _run_as
        return _func_or_method(*args, **kwargs)
      File "/usr/lib/python3.7/site-packages/salt/modules/esxi.py", line 56, in cmd
        return __proxy__[proxy_cmd](command, *args, **kwargs)
      File "/usr/lib/python3.7/site-packages/salt/loader/lazy.py", line 149, in __call__
        return self.loader.run(run_func, *args, **kwargs)
      File "/usr/lib/python3.7/site-packages/salt/loader/lazy.py", line 1201, in run
        return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
      File "/usr/lib/python3.7/site-packages/salt/loader/lazy.py", line 1216, in _run_as
        return _func_or_method(*args, **kwargs)
      File "/usr/lib/python3.7/site-packages/saltext/vmware/proxy/esxi.py", line 493, in ch_config
        for k in kwargs:
    RuntimeError: dictionary changed size during iteration
ERROR: Minions returned with non-zero exit code
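
The RuntimeError at the bottom of that traceback is the standard Python failure you get when keys are popped from a dict while a for loop is iterating over that same dict. A minimal reproduction, with one possible fix; the function names are hypothetical and this is a sketch, not the code from saltext/vmware/proxy/esxi.py:

```python
def strip_pub_keys_broken(kwargs):
    # Iterates over the live dict while shrinking it, so the iterator
    # raises RuntimeError the next time it is advanced after a pop.
    for k in kwargs:
        if k.startswith("__pub_"):
            kwargs.pop(k)
    return kwargs


def strip_pub_keys(kwargs):
    # list(kwargs) takes a snapshot of the keys first, so popping from
    # the original dict inside the loop is safe.
    for k in list(kwargs):
        if k.startswith("__pub_"):
            kwargs.pop(k)
    return kwargs


sample = {"__pub_user": "root", "__pub_jid": "1", "host": "esx01"}
cleaned = strip_pub_keys(dict(sample))
```

Iterating over a copy of the dict works the same way; the only requirement is that the object being iterated is not the one being modified.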

There's also another bug in the proxy module: it doesn't read your verify_ssl, credstore, etc. parameters from your proxy config if you're doing user/pass authentication (like I am).
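
Reduced to a toy, that second bug is an early return. The sketch below is not the real init(): the branch condition, the function names and the reduced set of keys are all assumptions made for illustration.

```python
def init_early_return(proxy_conf):
    details = {}
    if "username" in proxy_conf:  # hypothetical stand-in for the user/pass branch
        details["mechanism"] = "userpass"
        return details  # returns before the shared settings are read
    details["verify_ssl"] = proxy_conf.get("verify_ssl")
    details["credstore"] = proxy_conf.get("credstore")
    return details


def init_single_return(proxy_conf):
    details = {}
    if "username" in proxy_conf:
        details["mechanism"] = "userpass"
    # Shared settings are now read for every authentication mechanism.
    details["verify_ssl"] = proxy_conf.get("verify_ssl")
    details["credstore"] = proxy_conf.get("credstore")
    return details


conf = {"username": "root", "verify_ssl": False}
broken = init_early_return(conf)
fixed = init_single_return(conf)
```

With the early return, verify_ssl never makes it into the stored details for a user/pass config, no matter what the proxy config says.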

Anyway, here's the fixed diff...

# x=saltext/vmware/proxy/esxi.py && diff -u --strip $x-orig $x
--- saltext/vmware/proxy/esxi.py-orig   2022-04-29 12:29:54.857270014 +0000
+++ saltext/vmware/proxy/esxi.py        2022-04-29 13:14:47.247225128 +0000
@@ -349,7 +349,6 @@
         DETAILS["mechanism"] = "userpass"
         DETAILS["protocol"] = proxy_conf.get("protocol")
         DETAILS["port"] = proxy_conf.get("port")
-        return True

     if "vcenter" in proxy_conf:
         vcenter = proxy_conf["vcenter"]
@@ -412,6 +411,7 @@
     DETAILS["protocol"] = proxy_conf.get("protocol", "https")
     DETAILS["port"] = proxy_conf.get("port", "443")
     DETAILS["credstore"] = proxy_conf.get("credstore")
+    return True


 def grains():
@@ -451,7 +451,7 @@
         if DETAILS["mechanism"] == "userpass":
             find_credentials(DETAILS["host"])
             try:
-                __salt__["vmware_info.system_info"](
+                __salt__["vsphere.system_info"](
                     host=DETAILS["host"],
                     username=DETAILS["username"],
                     password=DETAILS["password"],
@@ -490,7 +490,8 @@

     """
     # Strip the __pub_ keys...is there a better way to do this?
-    for k in kwargs:
+    kwargs_filtered = kwargs.copy()
+    for k in kwargs_filtered:
         if k.startswith("__pub_"):
             kwargs.pop(k)

@@ -520,7 +521,7 @@
         for password in passwords:
             try:
                 # Try to authenticate with the given user/password combination
-                ret = __salt__["vmware_info.system_info"](
+                ret = __salt__["vsphere.system_info"](
                     host=host, username=user, password=password, verify_ssl=verify_ssl
                 )
             except SaltSystemExit:
@@ -542,7 +543,7 @@
     username, password = find_credentials(DETAILS["host"])
     verify_ssl = DETAILS["verify_ssl"]

-    ret = __salt__["vmware_info.system_info"](
+    ret = __salt__["vsphere.system_info"](
         host=host,
         username=username,
         password=password,

And that seems to work...

# salt esxi_cluster1_esx01 esxi.cmd system_info | head -n 4
esxi_cluster1_esx01:
    ----------
    apiType:
        HostAgent

@ggiesen - the reason your credstore config wasn't working is that it was never being read by the code. With the above patches it should work (or you can just set proxy::verify_ssl: False in the config, which is what I'm doing).

TL;DR

Download the attached zip (it contains the two patch files) and...

# Download the attached patch/zip into /root and...
unzip -d /tmp patch_salt-ext-modules-vmware_259_01.zip

# Set BASE to be your master's site-packages directory and change to that directory...
BASE=/usr/lib/python3.7/site-packages

# Remove the conflict between sddc and legacy esxi modules
mv $BASE/saltext/vmware/modules/esxi.py $BASE/saltext/vmware/modules/esxi_sddc.py

# patch the proxy 
patch -d $BASE saltext/vmware/proxy/esxi.py < /tmp/salt-ext-modules-vmware_259_proxy.patch

# patch the new grains module
patch -d $BASE saltext/vmware/grains/esxi.py < /tmp/salt-ext-modules-vmware_259_grains.patch

# Clean up 
rm -f /tmp/salt-ext-modules-vmware_259*

patch_salt-ext-modules-vmware_259_01.zip

dfidler avatar Apr 29 '22 13:04 dfidler

There's a wider discussion to be had here as well.

Firstly, I think that all of the vmware modules should be moved out of the core salt project and into an extension. It would cause some thrash for users who are already using them (they would have to install the extension on all of their proxy server hosts). That's not a massive change, but one worth broadcasting the intent about.

Secondly, we clearly need to be testing the esxi modules to support both modes of operation (against a minion and against a proxy-minion). [Although if I had my way, the "new-style" modules would go away per #257 ]

Thirdly, I wonder if it makes sense to make this project less monolithic by splitting it up into several. There are no interdependencies between the vsphere, vmc and nsx-t projects and they'll often be used by different teams.

And lastly [and this is out of scope for this project], I wonder if we should change the loader to handle these kinds of module conflicts more elegantly. It could tell the operator that an extension has a conflicting module and that something will not be loaded because there is either a duplicate filename or a duplicate __virtualname__; neither should be possible. What I don't know is what that would look like. An error in a master/minion log is too silent and will lead to long hours of debugging, while a total failure of the master/minion is too aggressive. Perhaps embed a warning in the job returns that an installed module isn't being loaded because an extension conflicts with it?
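
As a starting point for that discussion, the duplicate-filename half of the check is cheap to compute. A rough sketch, assuming we already have a listing of which files each module directory provides; find_module_conflicts is a hypothetical helper, not an existing loader API, and it does not look at __virtualname__ at all:

```python
from collections import defaultdict


def find_module_conflicts(listing):
    """Return {module_name: [directories]} for names provided more than once.

    `listing` maps a module directory to the file names found in it.
    Only .py files whose names do not start with an underscore are
    treated as modules in this sketch.
    """
    providers = defaultdict(list)
    for directory, filenames in listing.items():
        for filename in filenames:
            if filename.endswith(".py") and not filename.startswith("_"):
                providers[filename[:-3]].append(directory)
    return {name: dirs for name, dirs in providers.items() if len(dirs) > 1}


listing = {
    "salt/modules": ["esxi.py", "vsphere.py", "__init__.py"],
    "saltext/vmware/modules": ["esxi.py", "__init__.py"],
}
conflicts = find_module_conflicts(listing)
```

Where the result gets surfaced (log, job return, startup check) is the open question; the detection itself is the easy part.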

dfidler avatar Apr 29 '22 14:04 dfidler