scylla-cluster-tests

Change Loader core-dump event to a different type than DB nodes

Open yarongilor opened this issue 3 years ago • 10 comments

Currently, a loader core dump is reported like this:

2022-02-02 03:17:29.563: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=ee467824-4cc4-4143-acae-4e3ccb53d30e node=Node longevity-large-partitions-200k-pks-loader-node-675ad286-1 [34.243.249.78 | 10.0.0.199] (seed: False)

(example in https://github.com/scylladb/scylla/issues/10019)

yarongilor avatar Feb 02 '22 13:02 yarongilor

@yarongilor what exactly are you asking for in this issue? You have the name of the node right there; what else do you need?

fruch avatar Feb 02 '22 13:02 fruch

@fruch, I think it needs something to make it clearly distinguishable from a DB node core dump. A different event type would be good, perhaps even a different severity, such as warning instead of error. A loader core dump doesn't necessarily have to fail the test if it was caused by an s-b bug. Please also comment if you or @roydahan have additional ideas.

yarongilor avatar Feb 02 '22 16:02 yarongilor

If s-b is used as the main load, it should fail the test. If it was part of a nemesis, we always fail the test on a failed nemesis. So I don't see a reason to lower the severity.

I'm not sure how to make it clearer. Would a new field help, something like node_type?
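
To make the node_type idea concrete, here is a minimal sketch of what such a field could look like. The `NodeType` enum and the `CoreDumpEventSketch` class are hypothetical stand-ins invented for illustration; they are not the actual SCT event classes, and the printed format only loosely imitates the event line in the issue description.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):
    """Hypothetical set of node roles; not taken from the SCT code base."""
    DB = "db"
    LOADER = "loader"
    MONITOR = "monitor"


@dataclass
class CoreDumpEventSketch:
    """Hypothetical stand-in for a core-dump event, not the real SCT class."""
    node_name: str
    node_type: NodeType
    severity: str = "ERROR"  # severity is left unchanged, as argued in the thread

    def __str__(self) -> str:
        # The extra node_type field makes the origin visible without parsing the name.
        return (f"(CoreDumpEvent Severity.{self.severity}) "
                f"node_type={self.node_type.value} node={self.node_name}")


event = CoreDumpEventSketch(
    node_name="longevity-large-partitions-200k-pks-loader-node-675ad286-1",
    node_type=NodeType.LOADER,
)
print(event)
```

With a field like this, the severity stays the same and only the reported origin becomes explicit.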

fruch avatar Feb 03 '22 07:02 fruch

I don't want to lower the severity. It would be better if we could clearly differentiate the event by its name: a CoreDump event of Scylla vs. a CoreDump event of the loader. In this case, I don't know how or why, but it didn't cause the load to stop, or at least the test to detect it.

roydahan avatar Feb 03 '22 11:02 roydahan

Coredumps are not critical events; we don't stop the test because of them.

fruch avatar Feb 03 '22 15:02 fruch

This issue is stale because it has been open 2 years with no activity. Remove stale label or comment or this will be closed in 2 days.

github-actions[bot] avatar Feb 04 '24 00:02 github-actions[bot]

@roydahan do you still think we need a different type of this event for different node types?

fruch avatar Feb 04 '24 07:02 fruch

Maybe not a different type; it depends on how it looks in Argus now. Can we easily differentiate between loaders and DB nodes?

roydahan avatar Feb 04 '24 10:02 roydahan

It's part of the node name, as you can see in this issue's description, as it always was. Each event comes with the node it's relevant to.

How many times have you seen someone open a Scylla issue with a coredump from a loader? (I'm not sure adding extra information would help in such a case.)

If we are talking about logic and automation on the SCT or Argus end, it might be helpful to be able to distinguish it programmatically, for example to decide whether or not to mention it in the email template (currently we don't).
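
As a rough illustration of distinguishing it programmatically from the node name alone, the sketch below classifies an event line by the name that follows `node=Node`. The `classify_node` helper is hypothetical, not an SCT or Argus function. The `-loader-node-` marker comes from the event in the issue description; the `-db-node-` marker is an assumption about how DB nodes are named.

```python
import re

# Captures the node name that follows "node=Node " in an event line.
NODE_NAME_RE = re.compile(r"node=Node (?P<name>\S+)")


def classify_node(event_line: str) -> str:
    """Return 'loader', 'db', or 'unknown' based on the node name in the line."""
    match = NODE_NAME_RE.search(event_line)
    if not match:
        return "unknown"
    name = match.group("name")
    if "-loader-node-" in name:  # marker seen in the issue description
        return "loader"
    if "-db-node-" in name:  # assumption: DB node names carry this marker
        return "db"
    return "unknown"


line = (
    "2022-02-02 03:17:29.563: (CoreDumpEvent Severity.ERROR) period_type=one-time "
    "event_id=ee467824-4cc4-4143-acae-4e3ccb53d30e "
    "node=Node longevity-large-partitions-200k-pks-loader-node-675ad286-1 "
    "[34.243.249.78 | 10.0.0.199] (seed: False)"
)
print(classify_node(line))
```

Matching on name substrings is fragile if naming conventions change, which is one argument for carrying the node type as an explicit field instead.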

fruch avatar Feb 04 '24 11:02 fruch

This issue was opened because people reported invalid issues for core dumps that came from loaders, without noticing that they were loader core dumps.

In any case, it's low priority even if we don't close it. Or we can close it and handle it in Argus if we see it again and people complain about it.

roydahan avatar Feb 04 '24 11:02 roydahan