D.C. Hess comments

Results: 10 comments of D.C. Hess

Archive StudentSubmissions before refresh

@zkagin Can we prioritize this piece of logic before moving forward more on the syncing logic?

Archive StudentSubmissions before refresh

@zkagin Yeah. I think we'd want to maintain a list of data that has previously come through so the truncation of the main table doesn't result in data loss. @denglender...

Archive StudentSubmissions before refresh

@zkagin Thinking about this more, I'm wondering if the deleted column is necessary? Shouldn't IDs be unique? Couldn't we append then remove duplicate IDs preserving the matching ID with the...

Archive StudentSubmissions before refresh

@zkagin That makes sense to me (minimizing db writes) but I'm also concerned about memory limitations by keeping the diffing in local memory. I've been hitting some memory thresholds on...

Archive StudentSubmissions before refresh

@zkagin I'm not sure it's a delete in all cases. In some it's more about preserving previously pulled data (i.e. when the start date changes). Memory has been an issue...

Archive StudentSubmissions before refresh

@zkagin Check the docs for sqlsorcery. I have some examples there of how to do updates/deletes by dipping into sqlalchemy functions.

Archive StudentSubmissions before refresh

@zkagin
Updates: https://sqlsorcery.readthedocs.io/en/latest/cookbook/etl.html#update-table-values
Deletes: https://sqlsorcery.readthedocs.io/en/latest/cookbook/etl.html#delete-specific-records

Archive StudentSubmissions before refresh

@zkagin One approach would be to:
- append new records to the table
- query all records with duplicate IDs and their updateTime
- delete records with the MIN...
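
The append-then-dedupe idea in the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: the comment is cut off after "MIN", so deleting every row older than the newest `updateTime` for its ID is a guess at the intent; the three-column table is a simplified stand-in for the real StudentSubmissions schema; and it uses the standard library's `sqlite3` rather than sqlsorcery so it runs on its own. The helper name `append_and_dedupe` is hypothetical.

```python
import sqlite3


def append_and_dedupe(conn, rows):
    """Append rows, then delete any row that is not the newest for its id.

    Assumes updateTime is stored as ISO 8601 text in one consistent format,
    so plain string comparison orders timestamps correctly.
    """
    # Step 1: append the freshly pulled records.
    conn.executemany(
        "INSERT INTO StudentSubmissions (id, updateTime, state) VALUES (?, ?, ?)",
        rows,
    )
    # Steps 2 and 3 combined: remove rows older than the newest row with the same id.
    conn.execute(
        """
        DELETE FROM StudentSubmissions
        WHERE updateTime < (
            SELECT MAX(s2.updateTime)
            FROM StudentSubmissions AS s2
            WHERE s2.id = StudentSubmissions.id
        )
        """
    )
    conn.commit()


# Simplified, assumed schema for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StudentSubmissions (id TEXT, updateTime TEXT, state TEXT)"
)

# First pull: two submissions.
append_and_dedupe(
    conn,
    [
        ("a", "2020-01-01T00:00:00Z", "CREATED"),
        ("b", "2020-01-02T00:00:00Z", "CREATED"),
    ],
)
# Second pull: submission "a" was updated.
append_and_dedupe(conn, [("a", "2020-01-03T00:00:00Z", "TURNED_IN")])

rows = conn.execute(
    "SELECT id, updateTime, state FROM StudentSubmissions ORDER BY id"
).fetchall()
```

Because the dedupe runs inside the database, nothing has to be diffed in local memory. One caveat: rows that share both an ID and an identical `updateTime` would all survive this query and would need a separate tie-breaker.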

Query StudentSubmissions by coursework

@zkagin I think we can accomplish this simply by creating a new method for Courses:

```python
def get_recent_course_ids(self):
    try:
        coursework = pd.read_sql_table(
            "GoogleClassroom_CourseWork",
            con=self.sql.engine,
            schema=self.sql.schema,
        )
        courses = coursework.loc[coursework.creationTime >= self.config.SCHOOL_YEAR_START]
        return...
```

Query StudentSubmissions by coursework

Actually, we may want to use a different env var for this date.
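
A separate env var for the cutoff date might be read along these lines. This is a sketch, not the project's actual config code: the variable name `COURSEWORK_START_DATE`, the fallback to an env var called `SCHOOL_YEAR_START`, and the `YYYY-MM-DD` format are all assumptions made for illustration.

```python
import os
from datetime import datetime


def get_cutoff_date(env=os.environ, var_name="COURSEWORK_START_DATE",
                    fallback_var="SCHOOL_YEAR_START"):
    """Return the cutoff date from a dedicated env var, else the fallback.

    Both variable names and the YYYY-MM-DD format are hypothetical.
    """
    raw = env.get(var_name) or env.get(fallback_var)
    if raw is None:
        raise KeyError(f"Neither {var_name} nor {fallback_var} is set")
    return datetime.strptime(raw, "%Y-%m-%d")


# Example with an explicit mapping instead of the real environment.
cutoff = get_cutoff_date(env={"COURSEWORK_START_DATE": "2020-03-15"})
```

Passing the mapping as a parameter keeps the function easy to test without touching the real environment.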