scylla-cluster-tests
Change Loader core-dump event to a different type than DB nodes
Currently a loader core dump is reported like:
2022-02-02 03:17:29.563: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=ee467824-4cc4-4143-acae-4e3ccb53d30e node=Node longevity-large-partitions-200k-pks-loader-node-675ad286-1 [34.243.249.78 | 10.0.0.199] (seed: False)
(example in https://github.com/scylladb/scylla/issues/10019)
@yarongilor what exactly are you asking for in this issue? You have the name of the node right there; what else do you need?
@fruch, I think it needs something to make it clearly different from a DB node core dump. A different type would be good, perhaps even a different severity, like warning instead of error. It doesn't necessarily have to fail the test if it encountered an s-b bug. Please also comment if you or @roydahan have additional ideas.
If s-b is used as the main load, it should fail the test. If it was part of a nemesis, we always fail the test on a failed nemesis. So I don't see a reason to lower the severity.
I'm not sure how to make it clearer. Might a new field help, something like node_type?
I don't want to lower the severity. It would be better if we could clearly differentiate the events by name: a CoreDump event of scylla vs. a CoreDump event of the loader. In this case, I don't know how or why, it didn't cause the load to stop, or at least the test to detect it.
Coredumps are not critical events; we don't stop the test because of them.
This issue is stale because it has been open 2 years with no activity. Remove stale label or comment or this will be closed in 2 days.
@roydahan do you still think we need a different type of this event for different node types?
Maybe not a different type, depends how it looks on Argus now. Can we easily differentiate between loaders and db nodes?
It's part of the node name, as you can see in this issue's description, and it always was. Each event comes with the node it's relevant to.
How many times have you seen someone open a scylla issue with a coredump from a loader? (I'm not sure adding extra information would help in such a case.)
If we are talking about logic and automation on the SCT or Argus end, it might be helpful to be able to distinguish it programmatically, for example to decide whether to mention it in the email template or not (currently we don't).
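For illustration only, here is a minimal sketch of what "programmatically distinguish" could look like if it relied on nothing but the node name already present in the event line. It is not SCT or Argus code: the function names are hypothetical, and the naming pattern (a role followed by "-node-", a short id and an index) is an assumption taken from the single loader name quoted in this issue. The "db" and "monitor" roles in particular are guesses.

```python
import re
from typing import Optional

# ASSUMPTION: node names end with "-<role>-node-<short id>-<index>", as in the
# loader name quoted in this issue. The "db" and "monitor" roles are guesses.
NODE_NAME_RE = re.compile(r"node=Node (\S+)")
ROLE_RE = re.compile(r"-(loader|db|monitor)-node-[0-9a-f]+-\d+$")


def node_role(event_line: str) -> Optional[str]:
    """Return the role embedded in the event's node name, or None if unknown."""
    name_match = NODE_NAME_RE.search(event_line)
    if name_match is None:
        return None
    role_match = ROLE_RE.search(name_match.group(1))
    return role_match.group(1) if role_match else None


def is_loader_coredump(event_line: str) -> bool:
    """True for a CoreDumpEvent line whose node name marks it as a loader."""
    return "(CoreDumpEvent " in event_line and node_role(event_line) == "loader"
```

A check like this could feed an email template or an Argus filter without changing the event type or severity. A dedicated field on the event would be more robust than parsing names, since it would not break if the naming convention changed.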
This issue was opened because people reported invalid issues on core dumps they got on loaders and didn’t notice that.
In any case, it’s low priority even if we don’t close it. Or we can close it and handle it in Argus if we see it again and people complain about it.