
Storage is read-only after reloading a custom plugin

Open TatyanaBol opened this issue 2 years ago • 23 comments

Describe the bug When upgrading Vault 1.14.2 with custom secret plugins, the following issue occurs:

  1. Vault starts running with a new secret plugin version.
  2. Plugin mounting fails because the binary is a new plugin version that has not been registered yet.
  3. The new plugin version is registered and the plugin is reloaded.
  4. An attempt to update some data in the plugin fails with the error "cannot write to storage during setup".

I guess that the issue occurs because the storage becomes read-only after plugin mounting fails. But even after the plugin is reloaded, the storage remains read-only.

We don't have this issue with Vault 1.11.6

To Reproduce Steps to reproduce the behavior:

  1. Register the first custom plugin version
  2. Run Vault with a second plugin version, register that version, and reload the plugin.
  3. Try to write something to the plugin path (we have plugin config)
  4. See the error "cannot write to storage during setup"

Expected behavior After the plugin reloads I expect the storage to be active, not read-only.

Environment:

  • Vault Server Version (retrieve with vault status): 1.14.2
  • Vault CLI Version (retrieve with vault version):
  • Server Operating System/Architecture:

Vault server configuration file(s):

# Paste your Vault config here.
# Be sure to scrub any sensitive values

Additional context Add any other context about the problem here.

TatyanaBol avatar Sep 19 '23 13:09 TatyanaBol

Hi @hghaf099, I am facing the same issue with the latest Vault server version, 1.15.4. OS arch: ARM-64.

error msg: There was an error disabling the MyPlugin Secrets Engine at MyPluginPath/: cannot write to storage during setup.

I am using S3 as the storage backend.

karmops avatar Jan 08 '24 15:01 karmops

{
  "@level": "error",
  "@message": "unmount failed",
  "@module": "secrets.system.system_d412a3f5",
  "@timestamp": "2024-01-08T16:25:18.741073Z",
  "error": "cannot write to storage during setup",
  "path": "MyPlugin/"
}
{
  "@level": "error",
  "@message": "failed to clear view for path being unmounted",
  "@module": "core",
  "@timestamp": "2024-01-08T16:25:18.741062Z",
  "error": "cannot write to storage during setup",
  "path": "MyPlugin/"
}

karmops avatar Jan 08 '24 16:01 karmops
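
For anyone gathering entries like the two above, a minimal sketch of a filter follows. It assumes the server writes JSON logs with one object per line (the entries above appear pretty-printed, so that is an assumption), and `filter_setup_errors` is a hypothetical helper name, not part of Vault.

```shell
# Hypothetical helper: print only the log lines that contain the storage error.
# Assumes one JSON log entry per line; the file path is supplied by the caller.
filter_setup_errors() {
  # -F treats the pattern as a fixed string rather than a regular expression.
  grep -F 'cannot write to storage during setup' "$1"
}
```

It would be called with the path to a log file, for example `filter_setup_errors ./vault.log`, where the file name is only an illustration.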

image

karmops avatar Jan 08 '24 21:01 karmops

Previously this plugin was running well. Then I migrated the pod from an AMD machine to an ARM machine... Vault initialized successfully and no data was lost, but the plugin is now in a state that I can't revert. Right now, the strategy is a fresh install...

karmops avatar Jan 09 '24 13:01 karmops

Workaround so far:

1. Downloaded the files generated by the plugin.
2. Deleted all folders related to the plugin in my backend storage.
3. After that I could disable the plugin (I did it using the Vault UI).
4. Registered the plugin again (arm64 version).
5. Enabled it.
6. Created a new file via the plugin as a test, so that a new UUID logical folder was created for the plugin.
7. Copied the files downloaded in step 1 into it.
8. Listed the plugin content with these files (worked).

karmops avatar Jan 09 '24 18:01 karmops

@karmops I have been trying to reproduce the issue, however, no luck yet. Would you please give us a clear set of reproducing steps? Also, it would be good to know how many nodes are deployed in the cluster. Just for context, this error is returned if a write to the same path in storage is attempted while the backend setup is running.

hghaf099 avatar Feb 06 '24 22:02 hghaf099

Hello :) We're experiencing the same since 1.11.9 all the way to 1.14.8. What are the chances for this issue to be addressed in the near future?

lootek avatar Feb 20 '24 15:02 lootek

We upgraded Vault from 1.10.10 to 1.15.5 without being aware of this issue and have apparently painted ourselves into a corner. We do not want this kind of downtime in a production system, so we wonder whether there is any other solution?

universam1 avatar Feb 26 '24 14:02 universam1

Currently experiencing the same issue due to trying to upgrade a plugin using the Vault terraform provider, which left it in a messed up state.

I am unable to disable the storage backend, even after restarting vault:

vault secrets disable my-backend
Error disabling secrets engine at my-backend/: Error making API request.

URL: DELETE https://myvault:8200/v1/sys/mounts/my-backend
Code: 400. Errors:

* cannot write to storage during setup

I am using raft storage, so cannot simply delete any files. Any solutions?

Edit: This is what I did:

  1. Deregister the plugin: vault plugin deregister -version=x.x.x secret my-plugin.
  2. Install the latest plugin to the plugin directory.
  3. Register the plugin: vault plugin register -sha256=xxx secret my-plugin.
  4. Tune the mount: vault secrets tune -plugin-version=vlatest my-mount.
  5. Reload plugin: vault plugin reload -plugin my-plugin.
  6. Restart Vault.
  7. Disable mount: vault secrets disable my-mount.

F21 avatar Mar 28 '24 02:03 F21
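
The seven steps above can be sketched as a script. This is only an illustration of the sequence F21 describes, not a verified fix: `my-plugin`, `my-mount`, the versions, and the checksum are placeholders, and `DRY_RUN`, `run`, and `recover_mount` are hypothetical names introduced here. By default the script only prints the commands; steps 2 and 6 depend on the deployment and are left manual.

```shell
#!/bin/sh
# DRY_RUN is a switch of this sketch, not a Vault feature.
# With the default value 1, commands are printed but not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command, and execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

recover_mount() {
  old_version="$1"   # version that is currently registered (placeholder)
  sha256="$2"        # SHA-256 of the new plugin binary (placeholder)
  new_version="$3"   # version to tune the mount to (placeholder)

  run vault plugin deregister -version="$old_version" secret my-plugin
  echo "# manual: install the new plugin binary in the plugin directory"
  run vault plugin register -sha256="$sha256" secret my-plugin
  run vault secrets tune -plugin-version="$new_version" my-mount
  run vault plugin reload -plugin my-plugin
  echo "# manual: restart Vault"
  run vault secrets disable my-mount
}
```

Calling `recover_mount v1.0.0 <sha256> v1.0.1` prints the five Vault commands in order; setting `DRY_RUN=0` would execute them, pausing for the manual steps being the operator's responsibility.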

I tried various things but once the plugin storage is stuck in the setup/read-only state, the only thing that seems to release it is a rolling restart of the cluster.

This happens every single time I upgrade a plugin. It doesn't seem to matter whether it's with or without a version specified.

seanamos avatar Apr 10 '24 01:04 seanamos

@seanamos Are you using these steps to upgrade your plugins? https://developer.hashicorp.com/vault/docs/upgrading/plugins#upgrading-auth-and-secrets-plugins

F21 avatar Apr 10 '24 04:04 F21

@F21 Sorry about the delay in response, I was verifying the process and testing it again. We've automated the process and I wanted to go through it thoroughly and test it to make sure we weren't doing anything obviously wrong.

To answer your question, yes.

The old binary is kept in the plugins dir and the new binary is placed in the plugins dir on all nodes.

my-plugin-1.1.1
my-plugin-1.1.2

Basically the process from there:

vault plugin register \
    -sha256=... \
    -command=my-plugin-1.1.2 \
    -version=v1.1.2 \
    secret \
    my-plugin

vault write sys/plugins/pins/secret/my-plugin version=v1.1.2
vault plugin reload -type=secret -plugin=my-plugin -scope=global
│ Error: error writing to Vault: Error making API request.
│ 
│ URL: PUT https://xxx/v1/my-plugin/config
│ Code: 500. Errors:
│ 
│ * 1 error occurred:
│       * cannot write to storage during setup

A rolling restart of the cluster (3 Vault nodes with raft storage) resolves the issue.

I will continue to do some more testing/debugging to see if I can trace down the issue.

seanamos avatar Apr 12 '24 10:04 seanamos
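
The register command above takes a `-sha256` value for the new binary. A minimal sketch of one way to compute it is shown here; `plugin_sha256` is a hypothetical helper name, and the choice between `sha256sum` and `shasum` is an assumption about which tool the host provides.

```shell
# Hypothetical helper: print the SHA-256 of a file as lowercase hex.
# Uses sha256sum when available (common on Linux), otherwise shasum (common on macOS).
plugin_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}
```

The result could then be passed as `-sha256="$(plugin_sha256 /path/to/plugins/my-plugin-1.1.2)"`, where the directory is a placeholder for the configured plugin directory.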

Any update on this? We have run into the same issue.

1337andre avatar Jan 03 '25 14:01 1337andre

We are also experiencing this issue on Vault 1.18.4+ent.fips.

adamrothman avatar Feb 27 '25 19:02 adamrothman

What solved the problem in our case:

UPDATE: PROBLEM CAME BACK TODAY, SO THIS DID NOT SOLVE THE ISSUE

The Vault config file contained api_addr: https://addr:8200 (that is, protocol HTTPS), but the listener address was set to use HTTP. Apparently, the (custom) plugin needs to talk to Vault and uses the api_addr. See: https://developer.hashicorp.com/vault/docs/configuration#api_addr

Setting the api_addr to the same protocol as the listener solved the problem.

nielsdt-rabobank avatar Jun 17 '25 06:06 nielsdt-rabobank

Hey there, I am currently looking at this issue and attempting to reproduce. Would users encountering this issue be able to provide more details on which custom plugins and which Vault storage backends they are using? Thanks

hashiblaum avatar Jun 26 '25 17:06 hashiblaum

Hi @hashiblaum

We have developed our own auth plugin and we use Raft storage. But we also had the issue while we were still finding out how to develop custom plugins. As a complete example: we used a variation of https://developer.hashicorp.com/vault/tutorials/raft/raft-storage (without Docker; local macOS installs on three loopback interfaces, Raft storage, and Azure KV for auto-unseal) and https://github.com/hashicorp/vault-auth-plugin-example for the plugin. Since that was only demo stuff, it was no issue to restart every time we created a new version of the example plugin.

Of course we are running our own version of the custom plugin now, using versioning on a real cluster. Still experienced the issue.

Hope this helps

nielsdt-rabobank avatar Jun 30 '25 06:06 nielsdt-rabobank

Our situation is the same as @nielsdt-rabobank: the custom plugin is an auth plugin and we use Raft storage.

adamrothman avatar Jun 30 '25 06:06 adamrothman

Hey folks, I've been trying to reproduce this issue, but unfortunately I'm not able to. Would anyone be able to share the logs and reproduction steps?

solyim avatar Aug 02 '25 20:08 solyim

@solyim What I did:

  1. Take the plugin from: https://github.com/hashicorp-education/learn-vault-plugins/tree/main/vault-plugin-auth-mock
  2. Modify it so it actually writes to the backend store. See: https://gist.github.com/nielsdt-rabobank/461034117827a1ce42c8d01586e027f6 (copy this to backend.go if you want).
  3. Run make build (to build the plugin).

Created a cluster on 3 loopback interfaces on my Mac (make sure the api_addr is set to https, as it is in the Vault docs: https://developer.hashicorp.com/vault/docs/configuration). My feeling is that this causes the error in the first place.

cp the plugin to the plugins dir, sha256 the plugin, register the plugin, and enable the backend, all as per the "learn-vault-plugins" repo. Then:

vault write auth/vault-plugin-auth-mock/user/john password=password
vault write auth/vault-plugin-auth-mock/login user=john password=password

If you took my changes from above, you can also do a:

bao list auth/vault-plugin-auth-mock/users

That will tell you if the person is logged in or not.

Then:

  1. Edit the plugin; the change doesn't matter as long as it gives you a different SHA256.
  2. cp the new plugin to the plugins dir.
  3. vault plugin reload -plugin vault-plugin-auth-mock
  4. Stop the current leader and wait for another node to become leader.
  5. (If you want) restart the node you just took down.

Then do:

vault auth disable vault-plugin-auth-mock

And in my case, it gives the error:

Error disabling auth method at vault-plugin-auth-mock/: Error making API request.

URL: DELETE http://localhost:8200/v1/sys/auth/vault-plugin-auth-mock
Code: 400. Errors:

* cannot write to storage during setup

There are alternative paths to this error, but I hope this one is reproducible

Hope this helps.

nielsdt-rabobank avatar Aug 04 '25 15:08 nielsdt-rabobank

@nielsdt-rabobank Thank you for this information. I think I have a good understanding as to where the issue is coming from.

I would like to confirm one more thing if you don't mind. During the below step when you are changing the plugin binary, did you register the new plugin to the catalog and pin it before doing a plugin reload?

  1. Edit the plugin; the change doesn't matter as long as it gives you a different SHA256.
  2. cp the new plugin to the plugins dir.
  3. vault plugin reload -plugin vault-plugin-auth-mock
  4. Stop the current leader and wait for another node to become leader.
  5. (If you want) restart the node you just took down.

solyim avatar Aug 04 '25 16:08 solyim

We did not register it properly. What we should have done is use versioning, register and "tune" the version. But all this happened when we learned how to build plugins. So we were just overwriting the existing binary. With our (now) production plugin, with proper versioning and the api_address set to http instead of https, we have not experienced the issue anymore.

UPDATE: we had the same problem again today in a properly managed environment (with versioning and correct configuration). So apparently the config/versioning was not enough to solve the issue.

nielsdt-rabobank avatar Aug 05 '25 06:08 nielsdt-rabobank

I've run into the same problem. In my case, I unregistered a secrets plugin before disabling its associated paths. Now if I try to disable them, I get the "Storage is read-only" error. Re-registering the plugin didn't help. Restarting the cluster didn't help. I am able to enable and disable new paths just fine. It's only the paths from the incorrectly deregistered plugin that cannot be manipulated.

iloving avatar Oct 31 '25 18:10 iloving