
[Bug]: Topic creation fails if the directory with the spu-logs has a directory with logs for that topic

Open morenol opened this issue 4 years ago • 8 comments

If we delete a topic and, for some reason, the directory with the logs for the partitions of that topic is not erased, then we cannot create a topic with the same name.

For instance, this happens if we create the topic abc while this directory already exists on disk: `/var/lib/fluvio/data/spu-logs-0/abc-0`.

The spu is logging this error:

fluvio_spu::control_plane::dispatcher: error storage Log validation error

We need to handle this error and not fail. Maybe by deleting these files.

We also need to investigate why the SPU gets into that state.

morenol avatar Feb 09 '22 17:02 morenol

When the SPU performs one of the following operations:

  • Replication Creation,
  • Full Replica Sync.

If the SPU encounters an existing replica directory, it should rename it (to something like 000000.old-log), assuming there is enough disk space. The renamed directories can be garbage collected by the cleaner. Also change the existing delete-replica logic to use the same rename trick.
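A minimal sketch of the rename idea, using only the Rust standard library. The function name `set_aside_replica_dir`, the `.old-<n>` suffix scheme, and the way the log base directory is passed in are all assumptions for illustration; none of this is taken from the Fluvio code base, and the real SPU storage layer may need to handle this differently.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not a Fluvio API): if `<log_base>/<replica>` already
/// exists, rename it to `<replica>.old-<n>` and return the new path.
/// Returns `Ok(None)` when there is nothing to move out of the way.
pub fn set_aside_replica_dir(log_base: &Path, replica: &str) -> io::Result<Option<PathBuf>> {
    let current = log_base.join(replica);
    if !current.exists() {
        return Ok(None);
    }

    // Pick the first unused suffix so that repeated collisions
    // never overwrite a directory that was set aside earlier.
    let mut n: u32 = 0;
    let target = loop {
        let candidate = log_base.join(format!("{}.old-{}", replica, n));
        if !candidate.exists() {
            break candidate;
        }
        n += 1;
    };

    // Source and target share the same parent directory, so this is a
    // plain rename on one filesystem and no log data is copied.
    fs::rename(&current, &target)?;
    Ok(Some(target))
}
```

With something like this, replica creation could call the helper before creating a fresh directory, and a separate cleaner could later remove the `.old-*` directories.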

sehz avatar Feb 09 '22 18:02 sehz

 ls -la /var/lib/fluvio/data/spu-logs-0/<topic-name>-0
total 38860
drwxr-xr-x    2 root     root            95 Jan 29 22:34 .
drwxr-xr-x    4 root     root            49 Feb  8 21:23 ..
-rw-r--r--    1 root     root      10485760 Feb  8 20:32 00000000000000000000.index
-rw-r--r--    1 root     root      39776256 Jan 30 14:34 00000000000000000000.log
-rw-r--r--    1 root     root             8 Jan 30 13:35 replication.chk

morenol avatar Feb 09 '22 20:02 morenol

This can be reproduced by going to the SPU volume, creating a folder with a name like `/var/lib/fluvio/data/spu-logs-0/test-topic-0`, and then running `fluvio topic create test-topic`.

morenol avatar Mar 07 '22 20:03 morenol

Moving this out of the milestone until there is a way to reproduce this bug.

sehz avatar Apr 14 '22 00:04 sehz

I added a way to reproduce this bug. It also happens when the topic is offline for some reason and you run `fluvio topic delete` (which does not delete the files on the SPU), and then try to create the topic again on the same SPU where it was before.

morenol avatar Apr 14 '22 01:04 morenol

Something like this?

  • Create cluster
  • Create topic
  • Write some data to the topic
  • Disable the SPU; this should make the topic go offline
  • Delete topic
  • Create topic with the same name

sehz avatar Apr 14 '22 01:04 sehz

To disable the SPU, you could set replicas to 0 on the StatefulSet.

nacardin avatar May 06 '22 17:05 nacardin

Stale issue message

github-actions[bot] avatar Jul 06 '22 11:07 github-actions[bot]