blobber
flaky behaviour: allocation_root_mismatch error when trying to create a directory
On Windows, creating a new directory a second time fails with an allocation root mismatch error. The first creation works, but every attempt after that hits this error.
We also see flaky behaviour in the system tests:
https://0chain.slack.com/archives/C02AV6MKT36/p1669058193454869
The same set of tests fails with the allocation root mismatch error, but the failure goes away on retry.
Steps to reproduce:
At the moment, we don't have a consistent way to reproduce this issue.
It also fails on the rename operation: https://github.com/0chain/zboxcli/actions/runs/3533560710/jobs/5930800557
utils.go:66: Command failed on attempt [2/3] due to error [exit status 1]. Output: [consensus_not_met: Required consensus 3 got 2. Error: commit_error: {"error":"write_marker_verification_failed: Verification of write marker failed: invalid_write_marker: Invalid write marker. Prev Allocation root does not match the allocation root on record"}]
utils.go:66: Command failed on attempt [2/3] due to error [exit status 1]. Output: [consensus_not_met: Required consensus 3 got 2. Error: commit_error: {"error":"allocation_root_mismatch: Allocation root in the write marker does not match the calculated allocation root. Expected hash: 3ed23206277bd827822f9536671729dbcfabdedb3be3a68852347c0ff0d34d2e"}]
utils.go:66: Command failed on attempt [2/3] due to error [exit status 1]. Output: [Delete failed. consensus_not_met: Consensus on commit not met. Required 3, got 1]
utils.go:69: Command failed on final attempt [3/3] due to error [exit status 1]. Command String: [./zbox rename --allocation a51da28dd872c9e18c64f875739a7a675be6f52c6b5b1268c16f214daebfb685 --remotepath /A7ZQUQnrwi_test.txt --destname xwIq7e8GyB_test.txt --silent --wallet TestFileRename-Rename_and_delete_file_concurrently,_should_work_wallet.json --configDir ./config --config ./zbox_config.yaml] Output: [consensus_not_met: Rename failed. Required consensus 3 got 2]
@boddumanohar can you share steps or any insights for reproducing this? Does it happen only on Windows, or can we reproduce it in a Linux environment too? The Slack link isn't working and the test run link has expired, so I am unable to get more context on this :(
On Linux, we used to see this in the system_tests repo, but we haven't been seeing it lately. If it occurs again, @Kishan-Dhakan please comment with the build link here. Thanks.
Had a chat with @boddumanohar and we concluded that these issues were frequent 3 months ago but are rare these days. As he suggested, I am deferring work on this one until we see another instance (showing up as a system test failure). cc: @dabasov @Kishan-Dhakan
@aniketsingh03 please write a test to prove this
Was having a look at this today. This might be happening because of a mismatch between two rootRef hashes: the one the client uses to generate the allocationRoot (which is written into the write_marker form variable and sent along with the commit request to a blobber), and the one the blobber extracts before finalizing a commit.
func (req *CommitRequest) commitBlobber(rootRef *fileref.Ref, latestWM *marker.WriteMarker, size int64) error {
	wm := &marker.WriteMarker{}
	timestamp := int64(common.Now())
	wm.AllocationRoot = encryption.Hash(rootRef.Hash + ":" + strconv.FormatInt(timestamp, 10))
	if latestWM != nil {
AND
https://github.com/0chain/blobber/blob/3d7e1101ce3c4beba1f98cc026638d9c17220dbd/code/go/0chain.net/blobbercore/handler/object_operation_handler.go#L438-L446
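To make the suspected mismatch concrete, here is a minimal sketch of the two sides. It assumes the blobber recomputes the root with the same `Hash(rootRef.Hash + ":" + timestamp)` expression and compares it against the write marker. `hashHex` is only a stand-in for `encryption.Hash` (the real digest may differ), and `verifyAllocationRoot` is a hypothetical name, not the blobber's actual function.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
)

// hashHex is a stand-in for encryption.Hash (assumption: the real
// function may use a different digest algorithm).
func hashHex(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// allocationRoot mirrors the expression used in commitBlobber:
// Hash(rootRef.Hash + ":" + timestamp).
func allocationRoot(rootHash string, timestamp int64) string {
	return hashHex(rootHash + ":" + strconv.FormatInt(timestamp, 10))
}

// verifyAllocationRoot is a hypothetical blobber-side check: recompute the
// root from the blobber's own root hash and compare it with the marker.
func verifyAllocationRoot(wmRoot, blobberRootHash string, timestamp int64) error {
	expected := allocationRoot(blobberRootHash, timestamp)
	if wmRoot != expected {
		return fmt.Errorf("allocation_root_mismatch: expected hash: %s", expected)
	}
	return nil
}

func main() {
	ts := int64(1669058193)
	clientRoot := hashHex("client view of the directory tree")
	blobberRoot := hashHex("blobber view of the directory tree")

	// The client signs a root derived from its own view of the tree.
	wmRoot := allocationRoot(clientRoot, ts)

	fmt.Println("same root hash, error:", verifyAllocationRoot(wmRoot, clientRoot, ts))
	fmt.Println("different root hash, error:", verifyAllocationRoot(wmRoot, blobberRoot, ts))
}
```

If the two sides disagree on the root hash for any reason (stale state, ordering of operations), the check fails even though the timestamp is identical.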
And we already have 2 system tests which test creation of 2 directories with the same and with different names (as mentioned in the issue description):
t.Run("create dir with existing dirname should work", func(t *test.SystemTest) {
	allocID := setupAllocation(t, configPath)
	dirname := "/existingdir"
	output, err := createDir(t, configPath, allocID, dirname, true)
and
t.Run("create with existing dir but different case", func(t *test.SystemTest) {
	allocID := setupAllocation(t, configPath)
	dirname := "/existingdir"
	output, err := createDir(t, configPath, allocID, dirname, true)
	require.Nil(t, err, "Unexpected create dir failure %s", strings.Join(output, "\n"))
	require.Len(t, output, 1)
	require.Equal(t, dirname+" directory created", output[0])
	dirname = "/existingDir"
	output, err = createDir(t, configPath, allocID, dirname, true)
	require.Nil(t, err, "Unexpected create dir failure %s", strings.Join(output, "\n"))
	require.Len(t, output, 1)
	require.Equal(t, dirname+" directory created", output[0])
So I am not sure what kind of system test would help catch this case (maybe creating both directories in quick succession, so that the creation of the 2nd dir happens BEFORE the commit for the 1st one is finalized?) cc: @dabasov @boddumanohar
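A toy model of that hypothesis, for illustration only: if two operations both read the same previous allocation root before either one commits, the second commit would carry a stale previous root. The `blobber` and `writeMarker` types below are simplified stand-ins invented for this sketch, not the real blobber structures; only the error text is taken from the log above.

```go
package main

import (
	"errors"
	"fmt"
)

// writeMarker carries only the two fields this sketch needs (hypothetical type).
type writeMarker struct {
	PrevAllocationRoot string
	AllocationRoot     string
}

// blobber is a toy model that remembers only the latest committed root.
type blobber struct {
	latestRoot string
}

var errPrevRootMismatch = errors.New(
	"invalid_write_marker: Prev Allocation root does not match the allocation root on record")

// commit accepts a marker only if it builds on the root currently on record.
func (b *blobber) commit(wm writeMarker) error {
	if wm.PrevAllocationRoot != b.latestRoot {
		return errPrevRootMismatch
	}
	b.latestRoot = wm.AllocationRoot
	return nil
}

func main() {
	b := &blobber{latestRoot: "root0"}

	// Both operations read the same previous root before either commits.
	first := writeMarker{PrevAllocationRoot: b.latestRoot, AllocationRoot: "root1"}
	second := writeMarker{PrevAllocationRoot: b.latestRoot, AllocationRoot: "root2"}

	fmt.Println("first commit error:", b.commit(first))
	fmt.Println("second commit error:", b.commit(second))
}
```

In this model a retry that re-reads the latest root would succeed, which would be consistent with the failures clearing on retry, but that remains a guess until we can reproduce the issue.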
Maybe add a test that creates 100 directories concurrently, fails if any goroutine hits an allocation mismatch error, and treats a context deadline or timeout error in some goroutines as acceptable.
Closed this, as we could not reproduce it. @Kishan-Dhakan please create an issue for the system_test repo.