Switch to using a single event type
Issue by marcoguerri
Wednesday Jun 17, 2020 at 09:09 GMT
Originally opened as https://github.com/facebookincubator/contest/issues/123
At the moment we have two type of events, testevents and frameworkevents. The former are meant to be used when emitted by a Step, the latter from anywhere in the framework outside of a Step. The intention was to enforce non-null on most of the fields of the Step.
Eventually we settled on the fact that this is too restrictive. FrameworkEvents are barely used, and lots of places of the framework need to emit events that include a subset of the information of testevents.
So, we agree on deprecating frameworkevents, and using one single concept of event, where fields might be null. In addition, this event should be associated to a "source" in the header (most likely split into SourceType and SourceName), which will indicate who emitted the event.
Comment by marcoguerri
Friday Jun 19, 2020 at 16:57 GMT
I have given some thought to this, so dropping here some initial notes.
The current state of the database is the following (excluding the Target part, which is not yet represented):

Some considerations before discussing the translation into relational schema:
Run,Test,Stepare chained weak entities. Therefore the approach I have taken in the current version of the schema is to collapse all those entities together and inherit the key of the first strong entity in the chain (job). The result isjob_id, run_id, test_name, test_step_label. The rationale behindtesteventswas that they are also a weak entity, so they would acquire this composite key.test_eventsat the moment is the following:
CREATE TABLE test_events (
event_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
job_id BIGINT(20) NOT NULL,
run_id BIGINT(20) NOT NULL,
test_name VARCHAR(32) NULL,
test_step_label VARCHAR(32) NULL,
event_name VARCHAR(32) NULL,
target_name VARCHAR(64) NULL,
target_id VARCHAR(64) NULL,
payload TEXT NULL,
emit_time TIMESTAMP NOT NULL,
PRIMARY KEY (event_id)
);
The requirements we need to capture with this rework I think are the following:
- We need to normalize the relationship between events and targets. I think this is particularly important the moment we want to start building analytics and extend the target information that we want to store
- We need to identify the source of the emission. This could be as simple as adding two attributes in the
Eventstable:EmitterType,EmitterName.
I would maintain the collapsing of the relationship with Job, Run, Test, Step into the Events table. I have tried to come up with an alternative approach that models Job, Run, Test, Step as a hierarchy and normalize Events by associating only an EmitterID, which might have a parent emitter. But the result is much more complicated and would either require (at a performance cost):
- A view built based on multiple joins
- Switch to ORM and hide the complexity of the hierarchy
Comment by dimkup
Thursday Aug 27, 2020 at 10:31 GMT
does it worth to include value of reflect.TypeOf(payload).String() as a field to the event table? It would allow to query events, with payload of certain type and have sort of guarantee that the payload can be deserialised to the structure..