ocis icon indicating copy to clipboard operation
ocis copied to clipboard

Flaky changing quota requests - fails with 500 error

Open saw-jan opened this issue 1 year ago • 20 comments

Builds:

  • https://drone.owncloud.com/owncloud/ocis/41199/38/6
  • https://drone.owncloud.com/owncloud/ocis/41198/22/5
  • https://drone.owncloud.com/owncloud/ocis/41180/54/5
  • https://drone.owncloud.com/owncloud/ocis/41185/38/6
    And user "Admin" has changed the quota of the personal space of "Alice Hansen" space to "1"                                          # SpacesContext::userHasChangedTheQuotaOfTheSpaceTo()
      Expected response status code should be 200
      Failed asserting that 500 matches expected 200.
==> REQUEST
PATCH /graph/v1.0/drives/28087b3e-b3a7-44a4-850f-082544e6f8b1$ca3ed4f6-c7ae-4ada-a482-95c459958ac3
X-Request-ID: coreApiShareOperationsToShares2/uploadToShare.feature:238-231
==> REQ BODY
{"quota":{"total":1}}

<== RESPONSE
500 Internal Server Error
X-Xss-Protection: 1; mode=block
<== RES BODY
{"error":{"code":"generalException","innererror":{"date":"2024-11-19T05:15:58Z","request-id":"coreApiShareOperationsToShares2/uploadToShare.feature:238-231"},"message":"grpc error"}}

Scenarios:

apiSpacesShares/shareOperations.feature:340
coreApiShareOperationsToShares2/uploadToShare.feature:218
coreApiShareOperationsToShares2/uploadToShare.feature:249

ocis log:

{"level":"error","service":"storage-users","pkg":"rgrpc","traceid":"2ce5a123ee3e21770ab2587d496aee71","error":"open /root/.ocis/storage/users/spaces/5a/3f0b56-cd9e-4575-ab9d-ec57268b6e6c/nodes/5a/3f/0b/56/-cd9e-4575-ab9d-ec57268b6e6c.mlock: no such file or directory","req":{"storage_space":{"id":{"opaque_id":"1e0ee64a-e36c-4a09-9138-0004fb359bce$5a3f0b56-cd9e-4575-ab9d-ec57268b6e6c"},"root":{"storage_id":"1e0ee64a-e36c-4a09-9138-0004fb359bce","opaque_id":"5a3f0b56-cd9e-4575-ab9d-ec57268b6e6c","space_id":"5a3f0b56-cd9e-4575-ab9d-ec57268b6e6c"},"quota":{"quota_max_bytes":1}}},"time":"2024-11-19T11:03:10Z","message":"failed to update storage space"}
{"level":"error","service":"storage-users","pkg":"rgrpc","traceid":"2ce5a123ee3e21770ab2587d496aee71","user-agent":"grpc-go/1.68.0","from":"tcp://127.0.0.1:46852","uri":"/cs3.storage.provider.v1beta1.SpacesAPI/UpdateStorageSpace","start":"19/Nov/2024:11:03:10 +0000","end":"19/Nov/2024:11:03:10 +0000","time_ns":1141739,"code":"Unknown","time":"2024-11-19T11:03:10Z","message":"open /root/.ocis/storage/users/spaces/5a/3f0b56-cd9e-4575-ab9d-ec57268b6e6c/nodes/5a/3f/0b/56/-cd9e-4575-ab9d-ec57268b6e6c.mlock: no such file or directory"}

saw-jan avatar Nov 19 '24 11:11 saw-jan

^ CC @ScharfViktor @micbar @kobergj

saw-jan avatar Nov 19 '24 11:11 saw-jan

This list will be updated frequently. Failed CI builds:

  • https://drone.owncloud.com/owncloud/ocis/42271/62/5 (2024-12-31)
  • https://drone.owncloud.com/owncloud/ocis/41220/38/6
  • https://drone.owncloud.com/owncloud/ocis/41225/22/5
  • https://drone.owncloud.com/owncloud/ocis/41263/39/5
  • https://drone.owncloud.com/owncloud/ocis/41272/35/5
  • https://drone.owncloud.com/owncloud/ocis/41288/38/5
  • https://drone.owncloud.com/owncloud/ocis/41303/38/5
  • https://drone.owncloud.com/owncloud/ocis/41417/53/5
  • https://drone.owncloud.com/owncloud/ocis/41443/54/5
  • https://drone.owncloud.com/owncloud/ocis/41445/54/5
  • https://drone.owncloud.com/owncloud/ocis/41446/54/5
  • https://drone.owncloud.com/owncloud/ocis/41450/54/5
  • https://drone.owncloud.com/owncloud/ocis/41492/35/5
  • https://drone.owncloud.com/owncloud/ocis/41494/38/6
  • https://drone.owncloud.com/owncloud/ocis/41515/35/5
  • https://drone.owncloud.com/owncloud/ocis/41594/35/5
  • https://drone.owncloud.com/owncloud/ocis/41651/50/5
  • https://drone.owncloud.com/owncloud/ocis/41651/51/5
  • https://drone.owncloud.com/owncloud/ocis/41644/54/5
  • https://drone.owncloud.com/owncloud/ocis/41658/53/5
  • https://drone.owncloud.com/owncloud/ocis/41662/54/5
  • https://drone.owncloud.com/owncloud/ocis/41839/50/5
  • https://drone.owncloud.com/owncloud/ocis/41965/54/5
  • https://drone.owncloud.com/owncloud/ocis/42080/54/5
  • https://drone.owncloud.com/owncloud/ocis/42494/54/5 (2025-01-09)
  • https://drone.owncloud.com/owncloud/ocis/42486/54/5
  • https://drone.owncloud.com/owncloud/ocis/42479/55/5
  • https://drone.owncloud.com/owncloud/ocis/42953/53/5
  • https://drone.owncloud.com/owncloud/ocis/42933/55/5
  • https://drone.owncloud.com/owncloud/ocis/44189/51/5

saw-jan avatar Nov 20 '24 05:11 saw-jan

We should investigate why we get an 500 error in the DEV team.

micbar avatar Nov 21 '24 08:11 micbar

This is in the logs of the latest failed run:

"level":"error","service":"storage-users","pkg":"rgrpc","traceid":"c92e9c1513f71fd3755bfe9419e7db15","error":"open /root/.ocis/storage/users/spaces/b4/68c2d3-0bc1-4b41-bfe7-c85db387057d/nodes/b4/68/c2/d3/-0bc1-4b41-bfe7-c85db387057d.mlock: no such file or 
directory","req":{"storage_space":{"id":{"opaque_id":"96b0a8ee-ab8d-49e8-87dd-e1c3331c77f7$b468c2d3-0bc1-4b41-bfe7-c85db387057d"},"root":{"storage_id":"96b0a8ee-ab8d-49e8-87dd-e1c3331c77f7","opaque_id":"b468c2d3-0bc1-4b41-bfe7-c85db387057d","space_id":"b468
c2d3-0bc1-4b41-bfe7-c85db387057d"},"quota":{"quota_max_bytes":1}}},"time":"2024-11-27T09:23:36Z","message":"failed to update storage space"}
{"level":"error","service":"storage-users","pkg":"rgrpc","traceid":"c92e9c1513f71fd3755bfe9419e7db15","user-agent":"grpc-go/1.68.0","from":"tcp://127.0.0.1:34124","uri":"/cs3.storage.provider.v1beta1.SpacesAPI/UpdateStorageSpace","start":"27/Nov/2024:09:23:
36 +0000","end":"27/Nov/2024:09:23:36 +0000","time_ns":3131418,"code":"Unknown","time":"2024-11-27T09:23:36Z","message":"open /root/.ocis/storage/users/spaces/b4/68c2d3-0bc1-4b41-bfe7-c85db387057d/nodes/b4/68/c2/d3/-0bc1-4b41-bfe7-c85db387057d.mlock: no suc
h file or directory"}

The lock file disappeared?

rhafer avatar Nov 27 '24 10:11 rhafer

The lock file disappeared?

More likely: the parent folder doesn't exist. AFAICS the lock file is always opened with O_CREAT

rhafer avatar Nov 27 '24 11:11 rhafer

I created a stripped down ci config for reproducing this issue (https://github.com/owncloud/ocis/pull/10678). Of about 10 runs in that configuration only a single one (https://drone.owncloud.com/owncloud/ocis/41458) failed. After increasing the loglevel to debug, I could not reproduce this at all anymore :sob:

Running stuff locally I wasn't able to reproduce this either.

rhafer avatar Nov 27 '24 15:11 rhafer

I wasn't able to get any useful log out of drone (see https://github.com/owncloud/ocis/pull/10678) up to now. And the issue is also not reproducible outside the CI.

I am unassigning myself for now. Since there are higher prio issues to work on :man_shrugging:

rhafer avatar Dec 03 '24 15:12 rhafer

[Nightly-7-0] Nightly failure on changing quota tests

Describe

Build - https://drone.owncloud.com/owncloud/ocis/42165/59/6 https://drone.owncloud.com/owncloud/ocis/42164 Starting from 2024-12-24, getquota tests have been failing during nightly runs. The following scenarios are affected:

  Scenario Outline: retrieving folder quota when no quota is set                                      # /drone/src/tests/acceptance/features/coreApiWebdavProperties/getQuota.feature:12
    Given using <dav-path-version> DAV path                                                           # FeatureContext::usingOldOrNewOrSpacesDavPath()
    When user "Admin" changes the quota of the "Alice Hansen" space to "0"                            # SpacesContext::updateSpaceQuota()
    Then the HTTP status code should be "200"                                                         # FeatureContext::thenTheHTTPStatusCodeShouldBe()
    And as user "Alice" folder "/" should contain a property "d:quota-available-bytes" with value "0" # WebDavPropertiesContext::asUserFolderShouldContainAPropertyWithValueOrWithValue()

    Examples:
      | dav-path-version |
      | old              |
        Failed step: Then the HTTP status code should be "200"
        HTTP status code 500 is not the expected value 200
        Failed asserting that 500 matches expected '200'.
      | new              |
        Failed step: Then the HTTP status code should be "200"
        HTTP status code 500 is not the expected value 200
        Failed asserting that 500 matches expected '200'.
      | spaces           |

  @smokeTest
  Scenario Outline: retrieving folder quota when quota is set                                             # /drone/src/tests/acceptance/features/coreApiWebdavProperties/getQuota.feature:24
    Given using <dav-path-version> DAV path                                                               # FeatureContext::usingOldOrNewOrSpacesDavPath()
    When user "Admin" changes the quota of the "Alice Hansen" space to "10000"                            # SpacesContext::updateSpaceQuota()
    Then the HTTP status code should be "200"                                                             # FeatureContext::thenTheHTTPStatusCodeShouldBe()
    And as user "Alice" folder "/" should contain a property "d:quota-available-bytes" with value "10000" # WebDavPropertiesContext::asUserFolderShouldContainAPropertyWithValueOrWithValue()

    Examples:
      | dav-path-version |
      | old              |
        Failed step: Then the HTTP status code should be "200"
        HTTP status code 500 is not the expected value 200
        Failed asserting that 500 matches expected '200'.
      | new              |
      | spaces           |

Logs

## Scenario Outline: retrieving folder quota when no quota is set (coreApiWebdavProperties/getQuota.feature:19)
		_______________________________________________________________________

		==> REQUEST
		PATCH /graph/v1.0/drives/d086f596-306d-4b6b-9a15-614f8da06e5b$f0e38ca9-efbd-41da-aea7-52494bbbaae7
		X-Request-ID: coreApiWebdavProperties/getQuota.feature:19-14
		==> REQ BODY
		{"quota":{"total":0}}

		<== RESPONSE
		500 Internal Server Error
		X-Xss-Protection: 1; mode=block
		<== RES BODY
		{"error":{"code":"generalException","innererror":{"date":"2024-12-24T03:48:12Z","request-id":"coreApiWebdavProperties/getQuota.feature:19-14"},"message":"grpc error"}}

	### Then the HTTP status code should be "200"

oCIS log:

{"level":"error","service":"storage-users","pkg":"rgrpc","traceid":"7394e6afa4f706b829a842009cd1a2cd","error":"open /root/.ocis/storage/users/spaces/f2/a62d38-7f41-427e-8c2e-32b119fa1ae4/nodes/f2/a6/2d/38/-7f41-427e-8c2e-32b119fa1ae4.mlock: no such file or directory","req":{"storage_space":{"id":{"opaque_id":"684b7b1e-4b27-4cd0-a1da-7529d23957b2$f2a62d38-7f41-427e-8c2e-32b119fa1ae4"},"root":{"storage_id":"684b7b1e-4b27-4cd0-a1da-7529d23957b2","opaque_id":"f2a62d38-7f41-427e-8c2e-32b119fa1ae4","space_id":"f2a62d38-7f41-427e-8c2e-32b119fa1ae4"},"quota":{"quota_max_bytes":10000}}},"time":"2024-12-24T00:51:19Z","message":"failed to update storage space"}
{"level":"error","service":"storage-users","pkg":"rgrpc","traceid":"7394e6afa4f706b829a842009cd1a2cd","user-agent":"grpc-go/1.68.0","from":"tcp://127.0.0.1:36716","uri":"/cs3.storage.provider.v1beta1.SpacesAPI/UpdateStorageSpace","start":"24/Dec/2024:00:51:19 +0000","end":"24/Dec/2024:00:51:19 +0000","time_ns":1327822,"code":"Unknown","time":"2024-12-24T00:51:19Z","message":"open /root/.ocis/storage/users/spaces/f2/a62d38-7f41-427e-8c2e-32b119fa1ae4/nodes/f2/a6/2d/38/-7f41-427e-8c2e-32b119fa1ae4.mlock: no such file or directory"}

Setup

Please describe how you started the server and provide a list of relevant environment variables or configuration files.

OCIS_XXX=7.0.0

amrita-shrestha avatar Dec 24 '24 04:12 amrita-shrestha

Couldn't reproduce the flakiness locally. It only occurs in CI and also doesn't generate any useful logs in CI. Currently, I'm unassigning myself form this issue.

PrajwolAmatya avatar Jan 23 '25 09:01 PrajwolAmatya

List of recent build fail:

  • https://drone.owncloud.com/owncloud/ocis/43448/51/5
  • https://drone.owncloud.com/owncloud/ocis/43507/56/5

PrajwolAmatya avatar Feb 14 '25 07:02 PrajwolAmatya

Interesting find: The server sends 500 because there are two personal drives with the same name (alice). And there are some files missing for the old user in the storage/users/spaces folder.

I believe, the old one is from the previous test scenario.

Ednpoint: /graph/v1.0/drives

{
  "value": [
    {
      "driveAlias": "personal/alice",
      "id": "fe49f560-62f3-41fd-8364-fbcfee25e7af$c3326e64-7aa8-4d59-88ed-0da631963402",
      "lastModifiedDateTime": "2025-02-14T06:39:05.440059135Z",
      "name": "Alice Hansen",
      "owner": {
        "user": {
          "displayName": "",
          "id": "c3326e64-7aa8-4d59-88ed-0da631963402"
        }
      },
      "root": {
        "deleted": {
          "state": "trashed"
        },
        "eTag": "\"ee10f95ca197e98906aad303ac00d5b9\"",
        "id": "fe49f560-62f3-41fd-8364-fbcfee25e7af$c3326e64-7aa8-4d59-88ed-0da631963402",
        "webDavUrl": "https://ocis-server:9200/dav/spaces/fe49f560-62f3-41fd-8364-fbcfee25e7af$c3326e64-7aa8-4d59-88ed-0da631963402"
      },
    },
    {
      "driveAlias": "personal/alice",
      "id": "fe49f560-62f3-41fd-8364-fbcfee25e7af$3af10eab-f43d-42f3-92f0-d6b2d30262c5",
      "lastModifiedDateTime": "2025-02-14T06:40:04.180433623Z",
      "name": "Alice Hansen",
      "owner": {
        "user": {
          "displayName": "",
          "id": "3af10eab-f43d-42f3-92f0-d6b2d30262c5"
        }
      },
      "root": {
        "eTag": "\"75106885b3161a9c779c714f97d60bc4\"",
        "id": "fe49f560-62f3-41fd-8364-fbcfee25e7af$3af10eab-f43d-42f3-92f0-d6b2d30262c5",
        "webDavUrl": "https://ocis-server:9200/dav/spaces/fe49f560-62f3-41fd-8364-fbcfee25e7af$3af10eab-f43d-42f3-92f0-d6b2d30262c5"
      },
    },
    {
      "driveAlias": "personal/admin",
      "id": "fe49f560-62f3-41fd-8364-fbcfee25e7af$447d3d5b-0919-4496-834a-53427cd8890a",
      "lastModifiedDateTime": "2025-02-14T06:33:46.891064395Z",
      "name": "Admin",
      "owner": {
        "user": {
          "displayName": "",
          "id": "447d3d5b-0919-4496-834a-53427cd8890a"
        }
      }
    }
  ]
}

storage/users/spaces/**:

# admin
-0919-4496-834a-53427cd8890a
-0919-4496-834a-53427cd8890a.mlock
-0919-4496-834a-53427cd8890a.mpk

# new alice
-f43d-42f3-92f0-d6b2d30262c5
-f43d-42f3-92f0-d6b2d30262c5.mlock
-f43d-42f3-92f0-d6b2d30262c5.mpk

# old alice
-7aa8-4d59-88ed-0da631963402
# missing mlock file
-7aa8-4d59-88ed-0da631963402.mpk

saw-jan avatar Feb 14 '25 10:02 saw-jan

This is fixable from the test code. We are using the wrong spaceId in the quota update request. But I don't know if this (showing broken/supposed to be deleted space in the drives list) needs fix from the server side

saw-jan avatar Feb 14 '25 11:02 saw-jan

But I don't know if this (showing broken/supposed to be deleted space in the drives list) needs fix from the server side

During the test execution, sometimes the deleted user space is also shown in the /graph/v1.0/drives (See https://github.com/owncloud/ocis/issues/10603#issuecomment-2658945047) list and the supposed to be deleted space doesn't have .mlock file in the storage/users/spaces/... path. If this is a fixable issue in the server

CC @2403905

saw-jan avatar Feb 14 '25 11:02 saw-jan

@saw-jan do we know why the deletion of the user fails in the first place? I think the only chance we have is to avoid that original issue. I'm not sure how we can recover from a failed user deletion automatically.

kobergj avatar Feb 18 '25 09:02 kobergj

@saw-jan do we know why the deletion of the user fails in the first place? I think the only chance we have is to avoid that original issue. I'm not sure how we can recover from a failed user deletion automatically.

From the logs, I can say that the user deletion was successful - at least the server sent successful response

saw-jan avatar Feb 18 '25 09:02 saw-jan

{
  "value": [
    {
      "driveAlias": "personal/alice",
      "id": "fe49f560-62f3-41fd-8364-fbcfee25e7af$c3326e64-7aa8-4d59-88ed-0da631963402",
      "lastModifiedDateTime": "2025-02-14T06:39:05.440059135Z",
      "name": "Alice Hansen",
      "root": {
        "deleted": {
          "state": "trashed" 👈
        },
        "eTag": "\"ee10f95ca197e98906aad303ac00d5b9\"",
        "id": "fe49f560-62f3-41fd-8364-fbcfee25e7af$c3326e64-7aa8-4d59-88ed-0da631963402",
        "webDavUrl": "https://ocis-server:9200/dav/spaces/fe49f560-62f3-41fd-8364-fbcfee25e7af$c3326e64-7aa8-4d59-88ed-0da631963402"
      },
    },
    {
      "driveAlias": "personal/alice",
      "id": "fe49f560-62f3-41fd-8364-fbcfee25e7af$3af10eab-f43d-42f3-92f0-d6b2d30262c5",
      "lastModifiedDateTime": "2025-02-14T06:40:04.180433623Z",
      "name": "Alice Hansen",
      "root": {
        "eTag": "\"75106885b3161a9c779c714f97d60bc4\"",
        "id": "fe49f560-62f3-41fd-8364-fbcfee25e7af$3af10eab-f43d-42f3-92f0-d6b2d30262c5",
        "webDavUrl": "https://ocis-server:9200/dav/spaces/fe49f560-62f3-41fd-8364-fbcfee25e7af$3af10eab-f43d-42f3-92f0-d6b2d30262c5"
      },
    }
...
  ]
}

The old (deleted) account has this property:

"deleted": {
  "state": "trashed"
},

saw-jan avatar Feb 19 '25 11:02 saw-jan

^ @kobergj let us know if this is fixable or correct by design in the server side. We are hardening the API tests in https://github.com/owncloud/ocis/pull/11000

saw-jan avatar Feb 19 '25 11:02 saw-jan

We need to find out why the delete of the personal space fails. But the logging in this part of the code is insufficient. I created a PR that adds some logging. We should be able to tackle the error once this is merged and it reoccurs. https://github.com/owncloud/ocis/pull/11037

kobergj avatar Feb 19 '25 15:02 kobergj

We need to find out why the delete of the personal space fails. But the logging in this part of the code is insufficient. I created a PR that adds some logging. We should be able to tackle the error once this is merged and it reoccurs. #11037

OKay, thanks. We will wait for https://github.com/owncloud/ocis/pull/11037 to be merged and for test failure to reoccur before merging our test fix PR https://github.com/owncloud/ocis/pull/11000

saw-jan avatar Feb 20 '25 03:02 saw-jan

@kobergj I made a PR from your PR and ran only the problematic test scenarios. here are the logs when this issue occurs: https://drone.owncloud.com/owncloud/ocis/43669/5/3, let's hope there're some useful info

PS: I have attached the following logs in case the drone logs disappear

ocis.log request.log scenario.log

saw-jan avatar Feb 20 '25 06:02 saw-jan