gcp-ingestion

Parse channel from activity-stream pings

Open jklukas opened this issue 5 years ago • 3 comments

Impression-stats and other activity-stream docTypes have a top-level release field that the pipeline should use as the input to normalized_channel. Currently, normalized_channel is null for these pings.

jklukas, Jan 10 '20 20:01

We should probably encode this in JSON schemas under mozPipelineMetadata as a new normalized_channel_source field and have the pipeline use that to decide where to look.

jklukas, Mar 03 '21 14:03

We also have cases where we want to use a static value for channel. For Fenix, we codify the value for app_channel in https://github.com/mozilla/probe-scraper/blob/main/repositories.yaml

We could represent that in the generated JSON schemas as a static value.

So perhaps we should have mozPipelineMetadata like the following:

"static_fields": {
  "attribute": "normalized_channel",
  "static_value": "release"
},
"fields_from_payload": {
  "attribute": "normalized_channel",
  "source_path": "#/channel"
}

That would make this more generally applicable than supporting only normalized_channel. We'd have to think carefully about the interface and what to call the fields.

jklukas, May 24 '21 17:05

Thinking more about the interface, this could be cast as attribute_mappings, similar to the existing jwe_mappings. Each mapping would have a required attribute field and then exactly one of a static_value or a source_path field.
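
To make the proposed shape concrete, here is a minimal Python sketch of how a consumer might apply such mappings. This is only an illustration under stated assumptions: the function names `resolve_pointer` and `apply_attribute_mappings` are hypothetical, the path resolver handles only simple `#/a/b` paths (no JSON Pointer escapes), and none of this reflects existing gcp-ingestion code.

```python
from typing import Any, Dict, List, Optional


def resolve_pointer(payload: Any, pointer: str) -> Optional[Any]:
    """Resolve a simple '#/a/b' style path; return None if any segment is missing.

    Hypothetical helper: only plain object keys are supported.
    """
    if not pointer.startswith("#/"):
        raise ValueError(f"unsupported source_path: {pointer!r}")
    node = payload
    for key in pointer[2:].split("/"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def apply_attribute_mappings(
    mappings: List[Dict[str, Any]], payload: Dict[str, Any]
) -> Dict[str, str]:
    """Build message attributes from the (assumed) attribute_mappings metadata."""
    attributes: Dict[str, str] = {}
    for mapping in mappings:
        attribute = mapping["attribute"]  # required field
        has_static = "static_value" in mapping
        has_path = "source_path" in mapping
        if has_static == has_path:
            raise ValueError(
                f"mapping for {attribute!r} needs exactly one of "
                "static_value or source_path"
            )
        if has_static:
            value = mapping["static_value"]
        else:
            value = resolve_pointer(payload, mapping["source_path"])
        if value is not None:
            # Attributes are stored as strings in this sketch.
            attributes[attribute] = str(value)
    return attributes


mappings = [
    {"attribute": "app_update_channel", "source_path": "#/release"},
    {"attribute": "app_name", "static_value": "Fenix"},
]
print(apply_attribute_mappings(mappings, {"release": "beta"}))
```

Rejecting a mapping that has both or neither of static_value and source_path keeps the interface unambiguous; a payload that lacks the source path simply leaves the attribute unset.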

For a value like normalized_channel, though, this isn't quite powerful enough. The source_path would generally point to a "raw" channel identifier; the value still needs to go through the NormalizeAttributes#channel logic. So I suppose we'd be populating the app_update_channel attribute via this metadata, and we'd still rely on the pipeline knowing that this is the attribute to use as the source for normalization.
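
The two-stage flow described here could be sketched roughly as follows. The `normalize_channel` function is a hypothetical stand-in and not the actual NormalizeAttributes#channel logic: the list of known channels and the "Other" fallback are assumptions made purely for illustration.

```python
from typing import Dict

# Assumed set of recognized channels; the real normalization rules may differ.
KNOWN_CHANNELS = ("release", "esr", "beta", "aurora", "nightly")


def normalize_channel(raw: str) -> str:
    """Stand-in normalizer: map a raw channel string to a known value or 'Other'."""
    cleaned = raw.strip().lower()
    return cleaned if cleaned in KNOWN_CHANNELS else "Other"


def add_normalized_channel(attributes: Dict[str, str]) -> Dict[str, str]:
    """Derive normalized_channel from app_update_channel when it is present.

    The metadata-driven mapping is assumed to have populated app_update_channel
    already; the pipeline still hard-codes which attribute feeds normalization.
    """
    result = dict(attributes)
    raw = result.get("app_update_channel")
    if raw is not None:
        result["normalized_channel"] = normalize_channel(raw)
    return result


print(add_normalized_channel({"app_update_channel": "Beta"}))
```

If app_update_channel was never populated, this sketch leaves normalized_channel unset, which mirrors the null values reported at the top of the issue.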

jklukas, May 25 '21 12:05