
Collect blast information from Sailthru

Open brianhw opened this issue 7 years ago • 2 comments

DE-69.

This pulls information from Sailthru about email blasts. Two tables are created: one with statistics about each email blast (how many emails were opened, clicked, and so on), and one with information about whom each blast was sent to.

The current version of the code is not incremental. Given how long blast information takes to be returned from Sailthru, making it incremental would be a useful performance improvement.

Other shortcomings:

  • A request for info about blasts in a given time period returns at most twenty blasts. Requests are made on a per-day basis, which limits the damage, but this is still a serious hole.
  • Email hashes are only useful if there is an additional table linking the hashes to the actual emails. This will involve taking MD5 hashes of the email addresses of all users and making that table available for joining. It would be more convenient to have it already joined to the Sailthru blast emails table.
  • Once a job request has been submitted, I don't know how to cancel it. Since only ten jobs are run at any time, long-running jobs can clog up Sailthru's queue for a long time.
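
For the email-hash shortcoming above, a minimal sketch of the lookup table might look like the following. The function names and the row shape `(blast_id, email_hash)` are hypothetical, not taken from the pipeline code, and the normalization (trimming and lower-casing before hashing) is an assumption that would need to be checked against how Sailthru actually produces its hashes.

```python
import hashlib


def email_md5(email):
    """Return the hex MD5 digest of an email address."""
    # Assumption: trimming whitespace and lower-casing matches the way the
    # hashes in the blast data were produced. Verify before relying on it.
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def build_hash_lookup(emails):
    """Map each email's MD5 hash to the email itself (hypothetical helper)."""
    return {email_md5(email): email for email in emails}


def join_blast_recipients(blast_rows, lookup):
    """Attach the plain email to each (blast_id, email_hash) row.

    Rows whose hash has no match in the lookup get None for the email.
    """
    return [
        (blast_id, email_hash, lookup.get(email_hash))
        for blast_id, email_hash in blast_rows
    ]
```

In the warehouse itself this would more likely be a hash-to-email table joined in SQL; the in-memory dictionary here only illustrates the shape of the join.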

brianhw, Jun 19 '17 20:06

Codecov Report

Merging #409 into master will decrease coverage by 0.94%. The diff coverage is 29.92%.


@@            Coverage Diff             @@
##           master     #409      +/-   ##
==========================================
- Coverage   77.87%   76.92%   -0.95%     
==========================================
  Files         190      191       +1     
  Lines       20889    21310     +421     
==========================================
+ Hits        16267    16393     +126     
- Misses       4622     4917     +295

| Impacted Files | Coverage Δ |
|---|---|
| edx/analytics/tasks/common/vertica_load.py | 61.48% <19.23%> (-4.07%) |
| ...asks/warehouse/load_internal_reporting_sailthru.py | 30.63% <30.63%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 446f402...fc7d5a4.

codecov-io, Jun 19 '17 20:06

This is working now, in that jobs can be scheduled to load Sailthru blast emails and stats into Vertica. At some point this should be revisited to determine whether these are the right fields to pull and whether Vertica is the right destination.

brianhw, Nov 08 '17 22:11