
called `Option::unwrap()` on a `None` value since 1.5.0

Open ihiverlet opened this issue 1 year ago • 14 comments

Hi, since upgrading to v1.5.0, we get the following error message:

parseable logs:

thread 'actix-rt|system:0|arbiter:2' panicked at server/src/metadata.rs:368:52:
called `Option::unwrap()` on a `None` value
thread 'actix-rt|system:0|arbiter:3' panicked at server/src/metadata.rs:368:52:
called `Option::unwrap()` on a `None` value

It seems to be related to https://github.com/parseablehq/parseable/pull/892. It was working fine in v1.4.0. Here is an extract of the fluent-bit config we're using:

    [OUTPUT]
         Name http
         Match ingress.*
         host parseable.parseable.svc.cluster.local
         http_User user
         http_Passwd password
         format json
         compress gzip
         port 80
         header Content-Type application/json
         header X-P-Stream ingress
         uri /api/v1/ingest
         json_date_key timestamp
         json_date_format iso8601

ihiverlet avatar Sep 17 '24 12:09 ihiverlet
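
For context, `Option::unwrap()` in Rust panics whenever the value is `None`, so the trace above means the metadata lookup at server/src/metadata.rs:368 found nothing to unwrap. A minimal sketch of that failure mode, using a hypothetical in-memory metadata map rather than Parseable's actual code:

    use std::collections::HashMap;

    fn main() {
        // Hypothetical stream metadata map; "ingress" is the stream name from
        // the fluent-bit config above. The map is left empty to simulate an
        // entry that is missing after a partial upgrade/migration.
        let metadata: HashMap<String, String> = HashMap::new();

        // This would panic with exactly the message in the logs above:
        //   called `Option::unwrap()` on a `None` value
        // let _info = metadata.get("ingress").unwrap();

        // A defensive alternative handles the missing entry explicitly:
        match metadata.get("ingress") {
            Some(info) => println!("stream metadata: {info}"),
            None => eprintln!("no metadata found for stream 'ingress'"),
        }
    }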

Thanks for reporting @ihiverlet, we'll fix this asap.

nitisht avatar Sep 17 '24 12:09 nitisht

Hey @ihiverlet, could you please provide us with some more information on:

  • server version
  • commit id
  • server mode

You'll find these three things as part of the initial banner when you start the parseable process. It should look something like this:

[screenshot of the Parseable startup banner]

Also, did you migrate both the Ingest and the Query servers from v1.4.0 to v1.5.0?

parmesant avatar Sep 18 '24 06:09 parmesant

Hello,

        Server Mode:        "Standalone"
        Version:            "v1.5.0"
        Commit:             "091377b"

Regarding your last question, I use the Helm chart with the following configuration:

parseable:
  parseable:
    local: false
    env:
      P_S3_TLS_SKIP_VERIFY: "true"
      P_PARQUET_COMPRESSION_ALGO: snappy
      P_OIDC_ISSUER: issuer
      P_OIDC_CLIENT_ID:  id
      P_OIDC_CLIENT_SECRET:  secret
      P_ORIGIN_URI: uri
    resources:
      limits:
        cpu: 4
        memory: 20Gi
      requests:
        cpu: 1
        memory: 4Gi

ihiverlet avatar Sep 18 '24 09:09 ihiverlet

Hey @ihiverlet, the issue is due to the sequence in which the ingest and query nodes got upgraded.

Our suggestion is that you always upgrade query first and then upgrade ingest.

Thanks for bringing this issue forward; we will make sure to include more meaningful error messages wherever possible.

parmesant avatar Sep 20 '24 06:09 parmesant
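
On the "more meaningful error messages" point, one common Rust pattern (shown here purely as an illustration, not the actual Parseable patch) is to replace the `unwrap()` with `ok_or_else`, so a missing metadata entry becomes a descriptive error instead of a panic:

    use std::collections::HashMap;

    // Hypothetical lookup that returns a descriptive error instead of panicking
    // when the stream's metadata entry is absent.
    fn stream_metadata<'a>(
        metadata: &'a HashMap<String, String>,
        stream: &str,
    ) -> Result<&'a String, String> {
        metadata.get(stream).ok_or_else(|| {
            format!("no metadata found for stream '{stream}'; check the upgrade order of the query and ingest nodes")
        })
    }

    fn main() {
        let metadata: HashMap<String, String> = HashMap::new();
        // Logs a descriptive error rather than aborting the worker thread.
        if let Err(e) = stream_metadata(&metadata, "ingress") {
            eprintln!("{e}");
        }
    }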

Hey @parmesant,

I am using the Helm chart with a basic configuration, so I am in standalone mode. Hence, I do not understand what you mean by upgrading the query node first and the ingest node later.

ihiverlet avatar Sep 20 '24 09:09 ihiverlet

I was able to reproduce this error in distributed mode but not in standalone mode. Could you please tell me how to reproduce this error? If you wish to, you could join our Slack and continue the conversation there.

parmesant avatar Sep 20 '24 10:09 parmesant

Facing the same issue. Upgraded from 0.9.0 -> 1.5.3 in standalone mode on ECS; reverting to 0.9.0 also causes an error. Is there a specific version upgrade process we should follow, e.g. 0.9.0 -> 1.0.0 -> ...?

smparekh avatar Oct 02 '24 17:10 smparekh

Reverting to v0.9.0 results in:

Error: Could not start the server because bucket 'https://s3.xxx.amazonaws.com/xxx' contains stale data, please use an empty bucket and restart the server.

smparekh avatar Oct 02 '24 17:10 smparekh

hi @smparekh,

Reverting to v0.9.0 results in:

Error: Could not start the server because bucket 'https://s3.xxx.amazonaws.com/xxx' contains stale data, please use an empty bucket and restart the server.

Reverting will not work for sure; there are metadata migrations that happen between these versions. We found out that in @ihiverlet's case there was active data being ingested while the upgrade was happening - is this the case here too?

nitisht avatar Oct 03 '24 03:10 nitisht

That may have been the case. I use ECS to deploy, so the old task isn't fully shut down until the new task is in place. For future upgrades we will make sure the old task is shut down before the new task is spun up. Would that be sufficient?

smparekh avatar Oct 03 '24 14:10 smparekh

For future upgrades we will make sure the old task is shut down before the new task is spun up. Would that be sufficient?

Yes, we'll also ensure the server doesn't accept events while performing migrations. This will be added in upcoming releases. Meanwhile, are you able to use Parseable right now?

nitisht avatar Oct 03 '24 16:10 nitisht
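
A rough sketch of the guard described above (hypothetical, not Parseable's actual implementation): the ingest path checks a shared "migration in progress" flag and rejects events until the metadata migration has finished.

    use std::sync::atomic::{AtomicBool, Ordering};

    // Hypothetical flag shared between the migration task and the ingest handler.
    fn ingest(migrating: &AtomicBool, event: &str) -> Result<(), String> {
        if migrating.load(Ordering::SeqCst) {
            // Refuse events while the metadata migration is still running.
            return Err("server is migrating metadata; retry later".to_string());
        }
        println!("ingested: {event}");
        Ok(())
    }

    fn main() {
        let migrating = AtomicBool::new(true);

        // While the upgrade/migration is in flight, events are rejected.
        assert!(ingest(&migrating, r#"{"level":"info"}"#).is_err());

        // Once the migration completes, ingestion resumes.
        migrating.store(false, Ordering::SeqCst);
        assert!(ingest(&migrating, r#"{"level":"info"}"#).is_ok());
    }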

@smparekh would it be possible for you to have a call today? We can sort the issue out quickly on the call.

nikhilsinhaparseable avatar Oct 04 '24 02:10 nikhilsinhaparseable

For future upgrades we will make sure the old task is shut down before the new task is spun up. Would that be sufficient?

Yes, we'll also ensure the server doesn't accept events while performing migrations. This will be added in upcoming releases. Meanwhile, are you able to use Parseable right now?

We tested the update in a test environment; unfortunately, we weren't able to recover the data after the downgrade, so we went ahead and deleted the bucket so we could revert to v0.9.0.

smparekh avatar Oct 04 '24 14:10 smparekh

In that case let's have a short call @smparekh to ensure data is properly recovered and you're able to run the latest version. Would you please schedule something here: https://logg.ing/quick-chat

nitisht avatar Oct 05 '24 03:10 nitisht