Possible Bug in DataplaneAPI: Dangling and Duplicated Transaction Files with Concurrent Jobs (HAProxy Terraform Provider)
Description: I have encountered a potential bug in the DataplaneAPI while working on an HAProxy provider. The issue arises when running concurrent jobs that interact with the API across multiple workspaces.
The problem does not occur when using a single workspace. Specifically:
- Dangling Transactions: Some transactions remain uncommitted and incomplete.
- Duplicate Files: The same transaction files also appear in the outdated/ directory.
This behavior suggests there might be an issue with transaction isolation or handling in high-concurrency, multi-workspace scenarios.
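For context, here is a minimal sketch (not the provider's actual code) of the transaction lifecycle a client drives against the Data Plane API v2. The base address and credentials are placeholders, and response parsing and error handling are elided:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

const base = "http://127.0.0.1:5555/v2/services/haproxy" // placeholder address

// call issues an authenticated request and returns the status code and body.
func call(method, url string) (int, string, error) {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return 0, "", err
	}
	req.SetBasicAuth("admin", "adminpwd") // placeholder credentials
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body), nil
}

func main() {
	// 1. Read the current configuration version.
	_, version, err := call("GET", base+"/configuration/version")
	if err != nil {
		panic(err)
	}
	version = strings.TrimSpace(version)

	// 2. Open a transaction pinned to that version; the JSON response
	//    carries the transaction id (parsing elided).
	status, body, _ := call("POST", base+"/transactions?version="+version)
	fmt.Println(status, body)

	// 3. Configuration changes would pass ?transaction_id=<id> on each
	//    request; the transaction is then committed with
	//    PUT /transactions/<id> or aborted with DELETE /transactions/<id>,
	//    which should remove its file under /tmp/dataplaneapi/transactions/.
}
```

A transaction that is neither committed nor deleted is what shows up as a dangling file in the listing below.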
Steps to Reproduce:
- Use my HAProxy provider available on the Terraform Registry: https://registry.terraform.io/providers/cepitacio/haproxy/latest
- Set up the DataplaneAPI and configure multiple workspaces.
- Run multiple concurrent terraform apply jobs using the provider to interact with the API.
- Check the transaction directories:
- Observe uncommitted transaction files in the main directory.
- Note that duplicates of these files appear in the outdated/ directory.
Expected Behavior:
- Transactions should either be committed or cleaned up properly.
- No duplicate transaction files should exist, even under concurrent usage across multiple workspaces.
Actual Behavior:
- Some transactions remain uncommitted and are left dangling.
- Duplicate transaction files are created in the outdated/ directory.
Environment:
DataplaneAPI version: reproduced with v2.9.2 and v2.9.8
HAProxy version: 2.9.0
Terraform provider: https://registry.terraform.io/providers/cepitacio/haproxy/latest
Additional Notes: I am relatively new to Go and Terraform provider development, so I might have missed something in my implementation. However, the issue seems to be directly related to the API's behavior under concurrent requests: it does not occur when jobs are executed sequentially in a single workspace, which suggests it is tied to the handling of concurrent API transactions across workspaces.
Here is an example of what we are seeing with concurrent jobs:
[root@haproxy1 haproxy]# ll /tmp/dataplaneapi/transactions/
total 39
drwxr-xr-x. 2 root root 4096 Jan 24 22:20 failed
-rw-r--r--. 1 root root 6509 Jan 24 22:23 haproxy.cfg.29ab53bf-436a-421a-a39b-6f86f7fe81f1
-rw-r--r--. 1 root root 16186 Jan 24 22:23 haproxy.cfg.db6e7ac4-fc93-4e75-ae80-667bac1f5f9f
drwxr-xr-x. 2 root root 12288 Jan 24 22:23 outdated
The two files above are also present (duplicated) in the outdated/ directory:
[root@haproxy1 haproxy]# ll /tmp/dataplaneapi/transactions/outdated/ | grep haproxy.cfg.29ab53bf-436a-421a-a39b-6f86f7fe81f1
-rw-r--r--. 1 root root 6708 Jan 24 22:23 haproxy.cfg.29ab53bf-436a-421a-a39b-6f86f7fe81f1
[root@haproxy1 haproxy]# ll /tmp/dataplaneapi/transactions/outdated/ | grep haproxy.cfg.db6e7ac4-fc93-4e75-ae80-667bac1f5f9
-rw-r--r--. 1 root root 16387 Jan 24 22:23 haproxy.cfg.db6e7ac4-fc93-4e75-ae80-667bac1f5f9f
So the Data Plane API operates on optimistic locking using transactions, meaning high concurrency isn't available in this case. If multiple transactions are started on one version of the configuration file, only the first one that is committed will succeed; all the rest become outdated and cannot be committed. Hope this helps with your issue.
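To make that race concrete, here is a hedged sketch of two transactions opened on the same configuration version. The address and credentials are placeholders, and "t1"/"t2" stand in for the ids the API actually returns when the transactions are opened:

```go
package main

import (
	"fmt"
	"net/http"
)

// do issues an authenticated request and returns the HTTP status code.
func do(method, url string) int {
	req, _ := http.NewRequest(method, url, nil)
	req.SetBasicAuth("admin", "adminpwd") // placeholder credentials
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0
	}
	defer resp.Body.Close()
	return resp.StatusCode
}

func main() {
	tx := "http://127.0.0.1:5555/v2/services/haproxy/transactions"

	// Open two transactions pinned to the same configuration version
	// (id parsing elided; "t1" and "t2" stand in for the returned ids).
	do("POST", tx+"?version=1")
	do("POST", tx+"?version=1")

	// Only the first commit can succeed and bump the version.
	fmt.Println(do("PUT", tx+"/t1")) // succeeds
	// The second is rejected and its transaction becomes outdated; per the
	// directory listing above, its file is what ends up under outdated/.
	fmt.Println(do("PUT", tx+"/t2")) // fails: transaction is outdated
}
```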
@mjuraga thanks for the quick response! While I understand the optimistic locking mechanism, the issue I’m encountering is with the cleanup process in multi-workspace scenarios.
When running multiple workspaces concurrently (around 13), I’m seeing the following:
- Dangling files remain in the /tmp/dataplaneapi/transactions/ directory (uncommitted transactions) and are not cleaned up as they should be.
- The same files also appear in the /tmp/dataplaneapi/transactions/outdated/ directory, even though they should have been removed from transactions/ once they became outdated. A transaction file should never exist in both directories (in progress in transactions/ and outdated in outdated/).
The provider handles concurrency by retrying whenever a transaction version or commit becomes outdated. However, the cleanup process seems to fail in high-concurrency scenarios, leaving outdated files behind.
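For reference, a minimal sketch of such a retry loop (not the provider's actual code; the TxClient names are hypothetical wrappers around the version/start/commit/delete endpoints):

```go
package provider

import "fmt"

// TxClient abstracts the few Data Plane API calls the retry loop needs.
// All names here are hypothetical, not the provider's real code.
type TxClient interface {
	ConfigurationVersion() (int64, error)           // GET    .../configuration/version
	StartTransaction(version int64) (string, error) // POST   .../transactions?version=N
	CommitTransaction(id string) error              // PUT    .../transactions/{id}
	DeleteTransaction(id string) error              // DELETE .../transactions/{id}
}

// applyWithRetry replays the changes in a fresh transaction whenever the
// commit loses the optimistic-locking race, deleting the stale transaction
// each time so no file should be left dangling.
func applyWithRetry(c TxClient, retries int, apply func(txID string) error) error {
	for attempt := 0; attempt < retries; attempt++ {
		version, err := c.ConfigurationVersion()
		if err != nil {
			return err
		}
		txID, err := c.StartTransaction(version)
		if err != nil {
			return err
		}
		if err := apply(txID); err != nil {
			c.DeleteTransaction(txID) // abort: don't leave the file behind
			return err
		}
		if err := c.CommitTransaction(txID); err == nil {
			return nil
		}
		// Commit lost the race and the transaction is now outdated;
		// delete it and retry against the new configuration version.
		c.DeleteTransaction(txID)
	}
	return fmt.Errorf("commit still outdated after %d retries", retries)
}
```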
Oh, I understand now, thank you. We can treat this as a bug and fix it. I'll get back to you with a fix.