contest icon indicating copy to clipboard operation
contest copied to clipboard

Switch to using a single event type

Open mimir-d opened this issue 4 years ago • 2 comments

Issue by marcoguerri Wednesday Jun 17, 2020 at 09:09 GMT Originally opened as https://github.com/facebookincubator/contest/issues/123


At the moment we have two type of events, testevents and frameworkevents. The former are meant to be used when emitted by a Step, the latter from anywhere in the framework outside of a Step. The intention was to enforce non-null on most of the fields of the Step.

Eventually we settled on the fact that this is too restrictive. FrameworkEvents are barely used, and lots of places of the framework need to emit events that include a subset of the information of testevents.

So, we agree on deprecating frameworkevents, and using one single concept of event, where fields might be null. In addition, this event should be associated to a "source" in the header (most likely split into SourceType and SourceName), which will indicate who emitted the event.

mimir-d avatar Oct 28 '21 23:10 mimir-d

Comment by marcoguerri Friday Jun 19, 2020 at 16:57 GMT


I have given some thought to this, so dropping here some initial notes.

The current state of the database is the following (excluding the Target part, which is not yet represented):

Schema

Some considerations before discussing the translation into relational schema:

  • Run, Test, Step are chained weak entities. Therefore the approach I have taken in the current version of the schema is to collapse all those entities together and inherit the key of the first strong entity in the chain (job). The result is job_id, run_id, test_name, test_step_label. The rationale behind testevents was that they are also a weak entity, so they would acquire this composite key. test_events at the moment is the following:
CREATE TABLE test_events (
        event_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,

        job_id BIGINT(20) NOT NULL,
        run_id BIGINT(20) NOT NULL,
        test_name VARCHAR(32) NULL,
        test_step_label VARCHAR(32) NULL,

        event_name VARCHAR(32) NULL,
        target_name VARCHAR(64) NULL,
        target_id VARCHAR(64) NULL,
        payload TEXT NULL,
        emit_time TIMESTAMP NOT NULL,
        PRIMARY KEY (event_id)
);

The requirements we need to capture with this rework I think are the following:

  • We need to normalize the relationship between events and targets. I think this is particularly important the moment we want to start building analytics and extend the target information that we want to store
  • We need to identify the source of the emission. This could be as simple as adding two attributes in the Events table: EmitterType, EmitterName.

I would maintain the collapsing of the relationship with Job, Run, Test, Step into the Events table. I have tried to come up with an alternative approach that models Job, Run, Test, Step as a hierarchy and normalize Events by associating only an EmitterID, which might have a parent emitter. But the result is much more complicated and would either require (at a performance cost):

  1. A view built based on multiple joins
  2. Switch to ORM and hide the complexity of the hierarchy

mimir-d avatar Oct 28 '21 23:10 mimir-d

Comment by dimkup Thursday Aug 27, 2020 at 10:31 GMT


does it worth to include value of reflect.TypeOf(payload).String() as a field to the event table? It would allow to query events, with payload of certain type and have sort of guarantee that the payload can be deserialised to the structure..

mimir-d avatar Oct 28 '21 23:10 mimir-d