[Bug]: Topic creation fails if the spu-logs directory already contains a log directory for that topic
If we delete a topic and, for some reason, the directory with the logs for that topic's partitions is not erased, then we cannot create a topic with the same name.
For instance, creating the topic `abc` fails if this directory is still on disk: `/var/lib/fluvio/data/spu-logs-0/abc-0`.
The SPU logs this error:
fluvio_spu::control_plane::dispatcher: error storage Log validation error
We need to handle this error and not fail. Maybe by deleting these files.
We also need to investigate why the SPU gets into that state.
When the SPU performs either of the following operations:
- Replication creation
- Full replica sync

If the SPU encounters an existing replica directory, it should rename it (something like 000000.old-log), assuming there is enough disk space. The renamed directories can be garbage collected by the cleaner. The existing delete-replica logic should also be changed to use the same rename trick.
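A minimal sketch of the rename idea, using only the Rust standard library. This is not the SPU's actual code: the function names `set_aside_replica_dir` and `prepare_replica_dir` are hypothetical, and so is the `<name>.<n>.old-log` naming scheme, which is only one reading of "something like 000000.old-log". It assumes the whole replica directory is renamed, not the individual files inside it.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: move an existing replica directory out of the way
/// so a fresh replica can be created at the same path.
/// Returns the new location of the old directory, or `None` if nothing was there.
fn set_aside_replica_dir(replica_dir: &Path) -> io::Result<Option<PathBuf>> {
    if !replica_dir.exists() {
        return Ok(None);
    }
    let name = replica_dir
        .file_name()
        .ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "replica path has no final component")
        })?
        .to_string_lossy()
        .into_owned();

    // Pick the first unused sibling name: <name>.0.old-log, <name>.1.old-log, ...
    // (assumed naming scheme, chosen only for illustration).
    let mut n: u32 = 0;
    let target = loop {
        let candidate = replica_dir.with_file_name(format!("{name}.{n}.old-log"));
        if !candidate.exists() {
            break candidate;
        }
        n += 1;
    };

    // The rename keeps the old data on disk; it still occupies space
    // until something (for example a cleaner) removes it.
    fs::rename(replica_dir, &target)?;
    Ok(Some(target))
}

/// Set aside any stale directory, then create an empty one for the new replica.
fn prepare_replica_dir(replica_dir: &Path) -> io::Result<Option<PathBuf>> {
    let old = set_aside_replica_dir(replica_dir)?;
    fs::create_dir_all(replica_dir)?;
    Ok(old)
}
```

Calling `prepare_replica_dir` on a path such as `/var/lib/fluvio/data/spu-logs-0/test-topic-0` would leave an empty directory at that path and return where the stale one was moved. A delete-replica path could call `set_aside_replica_dir` alone and leave the actual removal to the cleaner.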
ls -la /var/lib/fluvio/data/spu-logs-0/<topic-name>-0
total 38860
drwxr-xr-x 2 root root 95 Jan 29 22:34 .
drwxr-xr-x 4 root root 49 Feb 8 21:23 ..
-rw-r--r-- 1 root root 10485760 Feb 8 20:32 00000000000000000000.index
-rw-r--r-- 1 root root 39776256 Jan 30 14:34 00000000000000000000.log
-rw-r--r-- 1 root root 8 Jan 30 13:35 replication.chk
This can be reproduced by going to the SPU volume, creating a folder with a name like `/var/lib/fluvio/data/spu-logs-0/test-topic-0`, and then running `fluvio topic create test-topic`.
Moving this out of the milestone until there is a way to reproduce the bug.
I added a way to reproduce this bug. It also happens when the topic is offline for some reason and you run `fluvio topic delete` (which does not delete the files on the SPU), and then try to create the topic again on the same SPU where it was before.
Something like this?
- Create cluster
- Create topic
- Write some data to the topic
- Disable the SPU; this should take the topic offline
- Delete topic
- Create topic with the same name
To disable the SPU, you could set replicas to 0 on the StatefulSet.