ol-infrastructure
Add Fastly logs as a catalog in Trino
User Story
- As a product owner, I would like to be able to build reports about user behavior on a property that is managed via Fastly.
Description/Context
We have started collecting Fastly logs as JSON files in S3. We would like to standardize the JSON schema of those logs and expose them as a table definition in our Trino infrastructure. This will allow us to use the traffic data to enrich the data and reports that we build for products that use Fastly for front-end caching.
Acceptance Criteria
- [ ] Fastly properties all log as JSON to S3 with a standard message schema
- [ ] Fastly logs are cataloged as a table in a Trino cluster
- [ ] Fastly logs are available for processing via dbt to generate derived data assets/reports
We can do this in a fairly straightforward manner by piping the log data through Airbyte to expose it as a table. The main prerequisite is to thoroughly define the log schema so that it is consistent and includes all of the information that we would like to be able to report on.
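As a rough illustration of what "a consistent log schema" could mean in practice, here is a minimal sketch of a validator for one JSON log line. Every field name in `STANDARD_FIELDS` is a hypothetical placeholder, not the schema agreed on for this issue, and `validate_log_line` is an illustrative helper rather than anything that exists in ol-infrastructure.

```python
import json

# Hypothetical standard schema: field name -> expected JSON type.
# These names are assumptions for illustration; replace them with
# the fields the team actually settles on.
STANDARD_FIELDS = {
    "timestamp": str,       # request time as a string
    "service_id": str,      # Fastly service that handled the request
    "client_ip": str,
    "request_method": str,
    "url": str,
    "status": int,          # HTTP response status code
    "bytes_sent": int,
    "cache_status": str,    # e.g. "HIT" or "MISS"
}


def validate_log_line(line: str) -> list[str]:
    """Return a list of schema problems for one log line (empty if it conforms)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["log line is not a JSON object"]

    problems = []
    for name, expected in STANDARD_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif type(record[name]) is not expected:
            # Exact type check so that booleans are not accepted as integers.
            actual = type(record[name]).__name__
            problems.append(
                f"wrong type for {name}: expected {expected.__name__}, got {actual}"
            )
    # Flag extra fields so every property keeps logging the same shape.
    for name in sorted(set(record) - set(STANDARD_FIELDS)):
        problems.append(f"unexpected field: {name}")
    return problems
```

Running a check like this over a sample of objects from each property's bucket before wiring up the Airbyte connection would surface properties whose logging format has drifted from the standard.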
- Production Fastly / edxapp deployments outstanding:
  - [ ] mitx
  - [ ] mitx-staging
  - [X] mitxonline
  - [ ] xpro
- In Airbyte
  - [ ] Blocker: the Airbyte S3 connector times out when trying to connect to buckets with lots of objects. Allegedly we can set up the connection via DB entries, but I didn't make much progress finding those when I looked at one point.
  - [ ] Finish setting up connections in QA Airbyte (mitx-staging doesn't seem to work)
  - [ ] Production
    - [ ] mitx
    - [ ] mitx-staging
    - [ ] mitxonline
    - [ ] xpro
- In Starburst
  - [ ] Ensure that the tables/schemas are populated
  - [ ] QA
    - [ ] mitx
    - [ ] mitx-staging
    - [ ] mitxonline
    - [ ] xpro
  - [ ] Production
    - [ ] mitx
    - [ ] mitx-staging
    - [ ] mitxonline
    - [ ] xpro