edx-analytics-pipeline
Collect blast information from Sailthru
DE-69.
This pulls information from Sailthru about email blasts. Two tables are created: one with statistics about each email blast (how many emails were opened, clicked, and so on), and one with information about whom each blast was sent to.
The current version of the code is not incremental. Given how long blast information takes to be returned from Sailthru, making the load incremental would be a useful performance improvement.
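As a rough illustration of what an incremental version might look like, the sketch below picks out only the days in an interval that have not been loaded yet. The function name `dates_to_fetch` and the `already_loaded` set are hypothetical; they are not part of this PR, and how loaded days would actually be tracked is left open.

```python
from datetime import date, timedelta


def dates_to_fetch(start, end, already_loaded):
    """Return the days in [start, end) that are not in already_loaded.

    Hypothetical helper: already_loaded is assumed to be a set of
    datetime.date values for which blast data was previously stored.
    """
    missing = []
    day = start
    while day < end:
        if day not in already_loaded:
            missing.append(day)
        day += timedelta(days=1)
    return missing


if __name__ == "__main__":
    loaded = {date(2017, 1, 2)}
    for day in dates_to_fetch(date(2017, 1, 1), date(2017, 1, 4), loaded):
        print(day.isoformat())
```

With something like this, only the missing days would need a request to Sailthru on each run.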
Other shortcomings:
- Only twenty blasts will be returned for a request for info about blasts in a given time period. Requests are made on a per-day basis, which limits the impact, but any day with more than twenty blasts will be incomplete. This is a serious hole.
- Email hashes are only useful if there is an additional table linking the hashes to the actual email addresses. This will involve computing MD5 hashes of the email addresses of all users and making that table available for joining. It would be more convenient to have it already joined to the Sailthru blast emails table.
- Once a job request has been submitted, I don't know how to cancel it. Since only ten jobs are run at any time, long-running jobs can clog up Sailthru's queue for a long time.
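For the email hash shortcoming above, a minimal sketch of the lookup table and join could look like the following. The function names and the `email_hash` row key are hypothetical, and the normalization (strip and lowercase before hashing) is an assumption that would need to be checked against how Sailthru computes its hashes.

```python
import hashlib


def email_md5(email):
    """Return the hex MD5 digest of an email address.

    Assumption: the address is stripped and lowercased before hashing.
    """
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def build_hash_lookup(emails):
    """Map each email's MD5 hash back to the email address."""
    return {email_md5(email): email for email in emails}


def join_blast_rows(blast_rows, lookup):
    """Attach an 'email' field to each row; None if the hash is unknown.

    Each row is assumed to be a dict with a hypothetical 'email_hash' key.
    """
    return [dict(row, email=lookup.get(row["email_hash"])) for row in blast_rows]


if __name__ == "__main__":
    lookup = build_hash_lookup(["a@example.com", "b@example.com"])
    rows = [{"email_hash": email_md5("a@example.com")}]
    print(join_blast_rows(rows, lookup))
```

In the warehouse the same join would more likely be done in SQL against a table of hashes; the Python version only shows the shape of the mapping.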
Codecov Report
Merging #409 into master will decrease coverage by 0.94%. The diff coverage is 29.92%.
@@ Coverage Diff @@
## master #409 +/- ##
==========================================
- Coverage 77.87% 76.92% -0.95%
==========================================
Files 190 191 +1
Lines 20889 21310 +421
==========================================
+ Hits 16267 16393 +126
- Misses 4622 4917 +295
Impacted Files | Coverage Δ | |
---|---|---|
edx/analytics/tasks/common/vertica_load.py | 61.48% <19.23%> (-4.07%) | :arrow_down: |
...asks/warehouse/load_internal_reporting_sailthru.py | 30.63% <30.63%> (ø) | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Last update 446f402...fc7d5a4.
This is working now, in the sense that jobs can be scheduled to load Sailthru blast email and stats data into Vertica. At some point this should be revisited to determine whether these are the right fields to pull and whether Vertica is the right destination.