CodeforcesImporter
Some thoughts on persistence and sync
Hey @dragonslayerx, currently we use cfiignore to keep track of submissions that have already been fetched from the CF server. I have an alternative approach that seems efficient (correct me if I'm wrong) and reusable for generating visualizations easily: we can store submissions in an SQLite file! This file can act as the table we query to generate the classification HTML and/or the visualization HTML. For classification data, something like "select * from local_sqlite_table group by problem_type". For visualization data (say, date vs. submission count): "select date as submission_date, count(*) as submission_count from local_sqlite_table group by date".
Advantages of this approach:
- Easy to sync
- More efficient than writing to a file ourselves (SQLite takes care of this)
- Efficient for generating data for visualizations (compare the ease of the SQL "group by" clause with manipulating the list of submissions programmatically ourselves!). SQLite is built for that and we can leverage it!
- Cleaner code when we need to generate a lot of graphs for the visualization (that goes without saying)
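To make the "group by" point concrete, here is a rough sketch of what such queries could look like with Python's built-in `sqlite3` module. The table name `local_sqlite_table` and the `problem_type` and `date` columns come from the queries above; the `id` column, the sample rows, and the `count_by` helper are made up for illustration. Both queries use `COUNT(*)`, because in SQLite a plain `select *` combined with `group by` returns only one row per group.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# Assumed schema: only problem_type and date are taken from the discussion.
conn.execute(
    "CREATE TABLE local_sqlite_table ("
    " id INTEGER PRIMARY KEY,"
    " problem_type TEXT,"
    " date TEXT)"
)

# Made-up sample submissions.
conn.executemany(
    "INSERT INTO local_sqlite_table (id, problem_type, date) VALUES (?, ?, ?)",
    [
        (1, "dp", "2015-06-01"),
        (2, "graphs", "2015-06-01"),
        (3, "dp", "2015-06-02"),
    ],
)


def count_by(conn, column):
    """Return (value, count) pairs for one column (hypothetical helper)."""
    # The column name is interpolated into the SQL text, so restrict it
    # to a fixed set of known names.
    if column not in ("problem_type", "date"):
        raise ValueError("unsupported column: %s" % column)
    query = (
        "SELECT {0}, COUNT(*) FROM local_sqlite_table "
        "GROUP BY {0} ORDER BY {0}"
    ).format(column)
    return conn.execute(query).fetchall()


by_type = count_by(conn, "problem_type")  # data for the classification page
by_date = count_by(conn, "date")          # data for a date vs. count graph
```

Each new graph would then be one more query rather than another hand-written loop over the submission list.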
Approach for sync: we can maintain something called 'last-sync-time'. Every time we sync, we fetch all submissions whose date > last-sync-time.
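A minimal sketch of that sync step, under a few assumptions: the `sync` function and the `id` column are hypothetical, `date` is assumed to be stored as a Unix timestamp, and 'last-sync-time' is derived from the newest stored submission instead of being kept as a separate value. How the submissions are actually fetched from CF is left out.

```python
import sqlite3


def sync(conn, fetched):
    """Store submissions newer than the last sync (hypothetical helper).

    `fetched` is a list of (id, problem_type, date) tuples, where date is
    assumed to be a Unix timestamp. Returns the number of rows that were
    newer than the last sync time.
    """
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "CREATE TABLE IF NOT EXISTS local_sqlite_table ("
            " id INTEGER PRIMARY KEY,"
            " problem_type TEXT,"
            " date INTEGER)"
        )
        # Assumption: last-sync-time is the newest date already stored,
        # or 0 when the table is still empty.
        last_sync = conn.execute(
            "SELECT COALESCE(MAX(date), 0) FROM local_sqlite_table"
        ).fetchone()[0]
        new_rows = [row for row in fetched if row[2] > last_sync]
        # OR IGNORE skips any row whose id is already present.
        conn.executemany(
            "INSERT OR IGNORE INTO local_sqlite_table VALUES (?, ?, ?)",
            new_rows,
        )
    return len(new_rows)
```

Calling `sync` a second time with an overlapping list would only add the submissions with a later date, which is the behaviour cfiignore gives us today.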
Actually, I want to keep everything easy. A person would not want to install a DB just for a small script. Regarding efficiency, a person has approx. 300 submissions on CF on average, and the submission growth rate is small. The script is meant to be used infrequently.
I agree that processing submissions is a code-intensive task that could be handled with a simple DB query. But there seems to be no other way to do this task.
@dragonslayerx if the overhead of installing a DB is the concern here, I would like to mention that SQLite is an embeddable DB engine that uses a plain file as the DB. In simple terms, using SQLite in a program is no more than reading from and writing to a file. Python supports it natively through the standard library: https://docs.python.org/2/library/sqlite3.html
This library does all the heavy lifting for us, like
- creating the file which acts as the db
- giving an awesome API to interact with the file using SQL
The end user would not have to install a DB or any other dependency for the script to run. They wouldn't even know that we are persisting to SQLite unless they read the source code. Thanks to SQLite ;)
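As a quick illustration that nothing beyond the standard library is involved, something like the following should be enough to end up with a DB file on disk. The file name `submissions.db` and the one-column table are placeholders, and the temporary directory is only there to keep the sketch self-contained.

```python
import os
import sqlite3
import tempfile

# Placeholder location; in the script this could sit next to the
# generated HTML files.
db_path = os.path.join(tempfile.mkdtemp(), "submissions.db")

# SQLite creates the file if it does not exist yet.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE IF NOT EXISTS local_sqlite_table (id INTEGER PRIMARY KEY)")
conn.commit()
conn.close()

# After the commit the database is an ordinary file on disk.
file_exists = os.path.isfile(db_path)
```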
Nice :) I wasn't aware that SQLite doesn't require an explicit install. I will read its documentation and try to migrate to SQLite. Thanks :)
:+1: