Storage is read-only after reloading a custom plugin
Describe the bug
When upgrading to Vault 1.14.2 with custom secret plugins, the following issue occurs:
- Vault starts running with a new secret plugin version.
- Plugin mounting fails because the binary has a new plugin version that is not yet registered.
- The new plugin version is registered and the plugin is reloaded.
- An attempt to update some data in the plugin fails with the error "cannot write to storage during setup".
I guess that the issue occurs because the storage becomes read-only after plugin mounting fails. But even after the plugin is reloaded, the storage remains read-only.
We don't have this issue with Vault 1.11.6
To Reproduce
Steps to reproduce the behavior:
- Register the first custom plugin version
- Run Vault with a second plugin version, then register it and reload the plugin.
- Try to write something to the plugin path (we have plugin config)
- See the error "cannot write to storage during setup"
Expected behavior
After the plugin reloads I expect the storage to be active, not read-only.
Environment:
- Vault Server Version (retrieve with vault status): 1.14.2
- Vault CLI Version (retrieve with vault version):
- Server Operating System/Architecture:
Hi @hghaf099 I am facing the same issue using the latest vault server version: 1.15.4
OS Arch: ARM-64
error msg: There was an error disabling the MyPlugin Secrets Engine at MyPluginPath/: cannot write to storage during setup.
I am using S3 as the storage backend.
{
"@level": "error",
"@message": "unmount failed",
"@module": "secrets.system.system_d412a3f5",
"@timestamp": "2024-01-08T16:25:18.741073Z",
"error": "cannot write to storage during setup",
"path": "MyPlugin/"
}
{
"@level": "error",
"@message": "failed to clear view for path being unmounted",
"@module": "core",
"@timestamp": "2024-01-08T16:25:18.741062Z",
"error": "cannot write to storage during setup",
"path": "MyPlugin/"
}
The plugin was running fine until I migrated the pod from an AMD machine to an ARM machine. Vault initialized successfully and no data was lost, but the plugin is now stuck in a state that I can't revert. Right now, the strategy is a fresh install...
Workaround so far:
1. Download the files generated by the plugin.
2. Delete all folders related to the plugin in the backend storage.
3. Disable the plugin, which now works (I used the Vault UI).
4. Register the plugin again (arm64 version).
5. Enable it.
6. Create a new file via the plugin as a test, so that a new UUID logical folder is created for the plugin.
7. Copy back the files downloaded in the first step.
8. List the plugin content to confirm the files are there (this worked).
@karmops I have been trying to reproduce the issue, however, no luck yet. Would you please give us a clear set of reproducing steps? Also, it would be good to know how many nodes are deployed in the cluster. Just for context, this error is returned if a write to the same path in storage is attempted while the backend setup is running.
Hello :) We're experiencing the same since 1.11.9 all the way to 1.14.8. What are the chances for this issue to be addressed in the near future?
We upgraded Vault from 1.10.10 to 1.15.5 without being aware of this issue and have apparently painted ourselves into a corner. We would like to avoid this kind of downtime in our production system, so we wonder whether there is any other solution.
Currently experiencing the same issue due to trying to upgrade a plugin using the Vault terraform provider, which left it in a messed up state.
I am unable to disable the storage backend, even after restarting vault:
vault secrets disable my-backend
Error disabling secrets engine at my-backend/: Error making API request.
URL: DELETE https://myvault:8200/v1/sys/mounts/my-backend
Code: 400. Errors:
* cannot write to storage during setup
I am using raft storage, so cannot simply delete any files. Any solutions?
Edit: This is what I did:
- Deregister the plugin: vault plugin deregister -version=x.x.x secret my-plugin
- Install the latest plugin to the plugin directory.
- Register the plugin: vault plugin register -sha256=xxx secret my-plugin
- Tune the mount: vault secrets tune -plugin-version=vlatest my-mount
- Reload plugin: vault plugin reload -plugin my-plugin
- Restart Vault.
- Disable mount: vault secrets disable my-mount
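The sequence above could be collected into a small script, sketched here under some assumptions: bash, an already authenticated vault CLI, and the new binary already copied to the plugin directory. The DRY_RUN switch, the run helper, and the two function names are illustrative and not part of Vault; the plugin name, mount, versions, and checksum are the placeholders from the steps above. The Vault restart is left as a manual step between the two functions because it depends on how Vault is deployed.

```shell
#!/usr/bin/env bash
# Sketch of the recovery steps listed above. All values are placeholders.
set -e

PLUGIN_NAME="${PLUGIN_NAME:-my-plugin}"
MOUNT_PATH="${MOUNT_PATH:-my-mount}"
OLD_VERSION="${OLD_VERSION:-x.x.x}"
NEW_VERSION="${NEW_VERSION:-vlatest}"
PLUGIN_SHA256="${PLUGIN_SHA256:-xxx}"

# Hypothetical helper: with DRY_RUN=1 (the default) the command is only printed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Steps to run before restarting Vault.
# Assumption: the new plugin binary is already in the plugin directory.
prepare_plugin() {
  run vault plugin deregister -version="$OLD_VERSION" secret "$PLUGIN_NAME"
  run vault plugin register -sha256="$PLUGIN_SHA256" secret "$PLUGIN_NAME"
  run vault secrets tune -plugin-version="$NEW_VERSION" "$MOUNT_PATH"
  run vault plugin reload -plugin "$PLUGIN_NAME"
}

# Step to run after Vault has been restarted.
disable_mount() {
  run vault secrets disable "$MOUNT_PATH"
}
```

With the defaults, calling prepare_plugin prints the four commands without touching Vault; setting DRY_RUN=0 executes them. After restarting Vault, disable_mount removes the mount.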
I tried various things but once the plugin storage is stuck in the setup/read-only state, the only thing that seems to release it is a rolling restart of the cluster.
This happens every single time I upgrade a plugin. It doesn't seem to matter whether it's with or without a version specified.
@seanamos Are you using these steps to upgrade your plugins? https://developer.hashicorp.com/vault/docs/upgrading/plugins#upgrading-auth-and-secrets-plugins
@F21 Sorry about the delay in response, I was verifying the process and testing it again. We've automated the process and I wanted to go through it thoroughly and test it to make sure we weren't doing anything obviously wrong.
To answer your question, yes.
The old binary is kept in the plugins dir and the new binary is placed in the plugins dir on all nodes.
my-plugin-1.1.1
my-plugin-1.1.2
Basically the process from there:
vault plugin register \
-sha256=... \
-command=my-plugin-1.1.2 \
-version=v1.1.2 \
secret \
my-plugin
vault write sys/plugins/pins/secret/my-plugin version=v1.1.2
vault plugin reload -type=secret -plugin=my-plugin -scope=global
│ Error: error writing to Vault: Error making API request.
│
│ URL: PUT https://xxx/v1/my-plugin/config
│ Code: 500. Errors:
│
│ * 1 error occurred:
│ * cannot write to storage during setup
A rolling restart of the cluster (3 Vault nodes with raft storage) resolves the issue.
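For an automated upgrade like this one, one option is to probe the mount after the reload and flag this specific failure, so that a rolling restart can be scheduled instead of the run just failing. A minimal sketch, assuming bash; the probe_write helper and its exit codes are illustrative and not part of Vault, and the error string is the one reported in this thread:

```shell
#!/usr/bin/env bash
# Error message reported in this thread when the mount's storage is still read-only.
SETUP_ERR="cannot write to storage during setup"

# probe_write CMD [ARGS...]
# Runs the given write command and classifies the result:
#   0 = the write succeeded
#   2 = it failed with the setup/read-only error
#   1 = it failed for some other reason
probe_write() {
  local out
  if out="$("$@" 2>&1)"; then
    return 0
  fi
  # Pass the original error text through on stderr for the logs.
  printf '%s\n' "$out" >&2
  case "$out" in
    *"$SETUP_ERR"*) return 2 ;;
    *) return 1 ;;
  esac
}
```

It could be called as probe_write vault write my-plugin/config key=value, where key=value stands in for whatever the plugin's config endpoint accepts; an exit code of 2 would then signal that the mount is in the stuck state described here.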
I will continue to do some more testing/debugging to see if I can trace down the issue.
Any update on this? We are running into the same issue.
We are also experiencing this issue on Vault 1.18.4+ent.fips.
What solved the problem in our case:
UPDATE: PROBLEM CAME BACK TODAY, SO THIS DID NOT SOLVE THE ISSUE
The Vault config file contained api_addr: https://addr:8200 (that is, protocol HTTPS), but the listener address was set to use HTTP. Apparently, the (custom) plugin needs to talk to Vault and uses the api_addr. See: https://developer.hashicorp.com/vault/docs/configuration#api_addr
Setting the api_addr to the same protocol as the listener solved the problem.
Hey there, I am currently looking at this issue and attempting to reproduce. Would users encountering this issue be able to provide more details on which custom plugins and which Vault storage backends they are using? Thanks
Hi @hashiblaum
We have developed our own AUTH plugin and we use raft storage. We also hit the issue while we were learning how to develop custom plugins. As a complete example: we used a variation of https://developer.hashicorp.com/vault/tutorials/raft/raft-storage (without Docker; local macOS installs on three loopback interfaces, raft storage, and Azure Key Vault for auto-unseal) and https://github.com/hashicorp/vault-auth-plugin-example for the plugin. Since that was only demo stuff, it was no issue to restart every time we created a new version of the example plugin.
Of course we are running our own version of the custom plugin now, using versioning on a real cluster. Still experienced the issue.
Hope this helps
Our situation is the same as @nielsdt-rabobank: the custom plugin is an auth plugin and we use Raft storage.
Hey Folks, I've been trying to reproduce this issue, but unfortunately I'm not able to. Would anyone be able to share the logs and reproduction steps?
@solyim What I did:
- Take the plugin from https://github.com/hashicorp-education/learn-vault-plugins/tree/main/vault-plugin-auth-mock
- Modify it so that it actually writes to the backend store. See https://gist.github.com/nielsdt-rabobank/461034117827a1ce42c8d01586e027f6 (copy this to backend.go if you want).
- Run make build to build the plugin.
- Create a cluster on 3 loopback interfaces on my Mac (make sure the api_addr is set to https, as it is in the Vault docs: https://developer.hashicorp.com/vault/docs/configuration). My feeling is that this causes the error in the first place.
- cp the plugin to the plugins dir, sha256 the plugin, register the plugin, and enable the backend, all as per the "learn-vault-plugins" repo.
- Write a user and log in:
vault write auth/vault-plugin-auth-mock/user/john password=password
vault write auth/vault-plugin-auth-mock/login user=john password=password
- If you took my changes from above, you can also run bao list auth/vault-plugin-auth-mock/users, which will tell you if the person is logged in or not.
Then:
- Edit the plugin; the change doesn't matter as long as it gives you a different SHA256.
- cp the new plugin to the plugins dir.
- Run vault plugin reload -plugin vault-plugin-auth-mock
- Stop the current leader and wait for another node to become leader.
- (If you want) restart the node you just took down.
Then run vault auth disable vault-plugin-auth-mock. In my case, it gives the error:
Error disabling auth method at vault-plugin-auth-mock/: Error making API request.
URL: DELETE http://localhost:8200/v1/sys/auth/vault-plugin-auth-mock
Code: 400. Errors:
* cannot write to storage during setup
There are alternative paths to this error, but I hope this one is reproducible
Hope this helps.
@nielsdt-rabobank Thank you for this information. I think I have a good understanding as to where the issue is coming from.
I would like to confirm one more thing if you don't mind. During the below step when you are changing the plugin binary, did you register the new plugin to the catalog and pin it before doing a plugin reload?
Edit the plugin, change doesn't matter as long as it gives you a different SHA256 cp the new plugin to the plugins dir. vault plugin reload -plugin vault-plugin-auth-mock Then stop the current leader and wait for another node to become leader Then (if you want) you can restart the node you just took down
We did not register it properly. What we should have done is use versioning, register and "tune" the version. But all this happened when we learned how to build plugins. So we were just overwriting the existing binary. With our (now) production plugin, with proper versioning and the api_address set to http instead of https, we have not experienced the issue anymore.
UPDATE: We had the same problem again today in a properly managed environment (with versioning and correct configuration), so apparently the config/versioning was not enough to solve the issue.
I've run into the same problem. In my case, I unregistered a secrets plugin before disabling its associated paths. Now if I try to disable, I get the Storage is read-only error.
Re-registering the plugin didn't help.
Restarting the cluster didn't help.
I am able to enable and disable new paths just fine. It's only the paths from the incorrectly deregistered plugin that cannot be manipulated.