aw-server-rust

Syncing ..cntd

Open ShootingKing-AM opened this issue 3 years ago • 4 comments

I want to get a clear idea of the intended function of sync and of what is still incomplete in aw-sync. Following up on #89.

My take on this is,

Objective:

  • To sync life-logged data of interest to other devices using a 3rd-party syncing service

HLD:

  • Overall Logic
    1. Periodically copy all data of interest ~~after a certain timestamp~~ to the 3rd-party sync dir (e.g. Dropbox's dir)
      1. Make a folder with current device identifier in sync dir
      2. Copy data to that folder
      3. Record this sync time in the db
      4. Recheck sync'ed time + Schedule future sync and goto(1)
    2. Periodically get other devices data
      1. Find other devices' folders in the sync dir and compare them against the last sync time (based on the current device's db lastsync and the other device's file modification time)
      2. For each device with new data
        1. Copy the other device's data to a staging dir
        2. Merge the other device's data from staging into the current device's database
      3. and goto (1)
  • Specifications
    • Data
      • data is in the form of an sqlite database reconstructed from the current device's database with only the data of interest
      • data of interest - Default (AFK watcher and Window watcher) or user-specified buckets
      • device identifier - presently hostname
      • current device means the device on which aw-sync is running
      • during development and initial production - keep at least 2 historical copies of files when writing to them
    • Config
      • sync dir is user-specified or defaults to {LocalDataDir}/activitywatch/aw-server-rust/aw-sync/sync/{host_ids}
      • staging dir defaults to {LocalDataDir}/activitywatch/aw-server-rust/aw-sync/staging/{host_ids}
      • User settings come from a config file (in future possibly from the web UI)
    • Timing
      • Time period for both upstream and downstream sync: 10 min by default, or user-specified
      • ~~Generate db data after a certain timestamp: last sync time minus 1 hr~~ New devices can join the network and need other devices' data from the start, so every upstream push is a complete data push, i.e. from the point ActivityWatch started logging data
      • Maintain sync times in a db table named sync (ID AUTO_INCREMENT PRIMARY, host {0 for self}, LAST SYNC Timestamp)
    • Arch
      • Should be scalable to a future P2P, decentralized, no-3rd-party model, so the sync framework should, as far as time allows, be isolated from the file copy-paste actions
      • aw-sync: a Rust binary or a module? A very thin CLI wrapped around a module, with instant sync (also useful for testing) plus continuous periodic syncs. But if it is a CLI, how will we get the Datastore which is already opened and running?
      • Multi-threaded - one thread each for upstream and downstream, and a main thread to keep track of them
      • future - on exit - EXIT GRACEFULLY - close files, stop data insertion etc. - required to ensure no data loss
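
As a rough illustration of the Config and Timing bullets above, the defaults could be collected in one place. This is only a sketch: `SyncSettings` and `with_defaults` are hypothetical names, not part of aw-sync, and the bucket IDs assume the usual `aw-watcher-afk_<hostname>` / `aw-watcher-window_<hostname>` naming.

```rust
use std::path::{Path, PathBuf};
use std::time::Duration;

/// Hypothetical settings struct mirroring the Config/Timing bullets.
#[derive(Debug, Clone, PartialEq)]
pub struct SyncSettings {
    /// Folder inside the 3rd-party sync dir that this device writes to.
    pub sync_dir: PathBuf,
    /// Local staging folder used before merging or publishing data.
    pub staging_dir: PathBuf,
    /// Interval between syncs (10 minutes unless the user overrides it).
    pub period: Duration,
    /// Buckets of interest (assumed bucket ID naming, see lead-in).
    pub buckets: Vec<String>,
}

impl SyncSettings {
    /// Build the default settings below `local_data_dir` for one host.
    pub fn with_defaults(local_data_dir: &Path, host_id: &str) -> Self {
        let base = local_data_dir
            .join("activitywatch")
            .join("aw-server-rust")
            .join("aw-sync");
        SyncSettings {
            sync_dir: base.join("sync").join(host_id),
            staging_dir: base.join("staging").join(host_id),
            period: Duration::from_secs(10 * 60),
            buckets: vec![
                format!("aw-watcher-afk_{}", host_id),
                format!("aw-watcher-window_{}", host_id),
            ],
        }
    }
}
```

A user-specified sync dir or bucket list would simply overwrite the corresponding field after calling `with_defaults`.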

LLD pseudocode - oversimplified:

main.rs:

    import SyncConfig

    main()
        read_args() // sync_mode[push|pull|sync], sync_adaptor=[file] sync_adaptor_option={sync_dir, sync_staging_dir}, optional:{server_ip, port} - clap
        validate_args()
        set SyncConfig including adaptor_options
        
        match args:
            push:
                sync_push() // one time push
            pull:
                sync_pull() // one time pull
            sync:
                sync() // continuous
sync.rs:

    SyncAdaptor trait to have
        push()
        pull()

    struct SyncConfig
        adaptor: file //can be p2p
        adaptor_options: Vec<> // sync_dir,  staging_dir
        mode: SyncMode
        Server_IP:
        Port:
        buckets:

    enum SyncMode
        PUSH, PULL, BOTH

    fn sync_push()
        // based on setting -sync_adaptor file
        match adaptor
            file_push()

    fn sync_pull()
        // based on setting -sync_adaptor file
        match adaptor
            file_pull()

    fn sync()
        set_up_sync() // Select adaptor and push adaptor-specific config data (here sync_dir, staging_dir)
        start_push_Thread()
        start_pull_Thread()

    fn poll_for_sync(SyncMode)
        presentTime > nextSyncTime

    fn force_push()
        // return true to force_push before nextSyncTime from thread

    fn force_pull()
        // return true to force_pull before nextSyncTime from thread

    push_Thread_main()
        if poll_for_sync(PUSH) || force_push()
            sync_push()
        else
            sleep(poll_time=1000s)
    
    pull_Thread_main()
        if poll_for_sync(PULL) || force_pull()
            sync_pull()
        else
            sleep(poll_time=1000s)
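
The sync.rs outline above could look roughly like this in Rust. It is a sketch under assumptions, not the existing aw-sync code: `SyncAdaptor`, `SyncMode`, `sync_due` and `tick` are illustrative names taken from or invented for the pseudocode, and the time values are passed in as arguments so one loop iteration can be tested without sleeping.

```rust
use std::str::FromStr;
use std::time::{Duration, Instant};

/// Which direction(s) a sync run should cover.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyncMode {
    Push,
    Pull,
    Both,
}

impl FromStr for SyncMode {
    type Err = String;

    /// Parse the CLI value: "push", "pull" or "sync" (both directions).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "push" => Ok(SyncMode::Push),
            "pull" => Ok(SyncMode::Pull),
            "sync" => Ok(SyncMode::Both),
            other => Err(format!("unknown sync mode: {}", other)),
        }
    }
}

/// Transport-agnostic interface; a file adaptor or a future P2P adaptor
/// would each implement it.
pub trait SyncAdaptor {
    fn push(&mut self) -> Result<(), String>;
    fn pull(&mut self) -> Result<(), String>;
}

/// A sync is due when forced or when the scheduled time has been reached.
pub fn sync_due(now: Instant, next_sync: Instant, force: bool) -> bool {
    force || now >= next_sync
}

/// One iteration of a worker loop. Runs the adaptor if a sync is due and
/// returns the time of the next scheduled sync.
pub fn tick<A: SyncAdaptor>(
    adaptor: &mut A,
    mode: SyncMode,
    now: Instant,
    next_sync: Instant,
    force: bool,
    period: Duration,
) -> Result<Instant, String> {
    if !sync_due(now, next_sync, force) {
        return Ok(next_sync); // nothing to do yet; caller sleeps and retries
    }
    if mode != SyncMode::Pull {
        adaptor.push()?;
    }
    if mode != SyncMode::Push {
        adaptor.pull()?;
    }
    Ok(now + period)
}
```

Each worker thread would call `tick` in a loop with `Instant::now()` and sleep between calls. Keeping the adaptor behind a trait is what would let the file-copy logic be swapped for a P2P transport later.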

file.rs:
    file implements SyncAdaptor
        fn file_pull()
            sys_io()
                find_other_device_data_folders()
                for each other device data()
                    if _should_pull_file(otherdevice_dataFile)
                        copy_to_staging()
                        work_on_staging()
                            aw_io()
                                odDataStore_buckets = other_device_open_data()::get_buckets() // sqlite.db DataStore:: API
                                self_DataStore = current_device_ds::open()
                                setup_bucket_self_dataStore() // if not exists create 
                                for each bucket:
                                    odDataStore_buckets.getEvents()
                                    self_DataStore.buckets.InsertEvents()

        fn file_push()
            sys_io()
                copy_current_device_db_to_staging() // to avoid db manipulation while syncing - but if it is currently open, how to handle that?
                // make a DataStore BUSY flag to pause Datastore manipulations and then copy the underlying sqlite file?
                aw_io()
                    self_DataStore_buckets = current_device_ds::open() // which is copied to staging folder
                    staging_db::datastore = setup_current_device_db() // open a new sqlite staging db for the current device, to be created with the "filtered" buckets
                    for each bucket of interest from SyncConfig::buckets in self_DataStore_buckets
                        staging_db::CreateBucket
                        self_DataStore_buckets::getEvents()
                        staging_db::DS.insertEvents()

        fn _should_pull_file(otherDevice_datafile)
            if modified_date_file() > last_modified_date()
                true
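
The filesystem side of `file_pull()` (finding other devices' folders and deciding whether a file is new) can be sketched with the standard library alone. The function names and the `Option<SystemTime>` representation of the stored last-sync time are assumptions made for illustration; the actual datastore merge is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// List the per-device folders in the sync dir, skipping our own.
pub fn other_device_dirs(sync_root: &Path, own_host_id: &str) -> io::Result<Vec<PathBuf>> {
    let mut dirs = Vec::new();
    for entry in fs::read_dir(sync_root)? {
        let entry = entry?;
        let is_own = entry.file_name().to_str() == Some(own_host_id);
        if entry.file_type()?.is_dir() && !is_own {
            dirs.push(entry.path());
        }
    }
    dirs.sort(); // deterministic order, independent of the filesystem
    Ok(dirs)
}

/// Decide whether another device's data file should be pulled:
/// always on the first sync, otherwise only if it was modified after
/// the last recorded sync time.
pub fn should_pull_file(remote_db: &Path, last_sync: Option<SystemTime>) -> io::Result<bool> {
    let modified = fs::metadata(remote_db)?.modified()?;
    Ok(match last_sync {
        None => true,
        Some(t) => modified > t,
    })
}
```

Relying on file modification times assumes the 3rd-party sync service preserves or refreshes them when a file changes; if it does not, a content hash or a counter stored next to the file would be a more robust signal.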

How closely does this match the actual intention for syncing?

ShootingKing-AM avatar Oct 13 '22 19:10 ShootingKing-AM

Any inputs @ErikBjare ? :3

ShootingKing-AM avatar Oct 14 '22 12:10 ShootingKing-AM

Yeah this is very similar in principle! (if not exactly the same)

I should add that the current sync is actually working, it's just a bit rough around the edges (some bugs, not very stable) but I use it to sync data between 4 devices. Check out the aw-sync README and scripts in that folder.

ErikBjare avatar Oct 14 '22 15:10 ErikBjare

Actually, I kinda want to use your high-level description as the foundation for docs describing how it works (since it's so on point).

Could you maybe clean it up a bit and we'll get it merged? Maybe put it in the aw-sync README for now? :)

ErikBjare avatar Oct 14 '22 17:10 ErikBjare

> Yeah this is very similar in principle! (if not exactly the same)
>
> I should add that the current sync is actually working, it's just a bit rough around the edges (some bugs, not very stable) but I use it to sync data between 4 devices. Check out the aw-sync README and scripts in that folder.

First and most importantly, I don't want to replace or replicate any work already done for syncing. I have a personal use case: syncing data between my Android device and my desktop PC. I opened this issue to find out exactly (down to the level of Rust functions) how much progress has been made, which issues are pending, and what needs to be done to move "In-Progress" to "Done 🎉" on the kanban project board.

I didn't understand the #89 todo that well. I did go through the code and there are some comments (like "TODO:..."). I also went through the scripts in the aw-sync folder (which themselves have some cross-platform todos, and are not working for me 😭). But would resolving all those TODO comments close all the gaps in the syncing feature?

From my pseudocode, I feel all the code in main.rs and file.rs is already done. My idea of syncing is that ActivityWatch takes care of it in the background, without me running commands in a terminal. I don't want to run commands every time I sync 😢 I want ActivityWatch to take care of that. Also, I don't know how to run commands on my Android device.

Through this issue, I intend to discuss and fill the remaining gaps in the syncing feature. I feel that if those gaps are clearly known before I start the actual coding, it will also reduce hassles in merging (I wish to avoid design changes, control flow changes, etc.).

> Actually, I kinda want to use your high-level description as the foundation for docs describing how it works (since it's so on point).
>
> Could you maybe clean it up a bit and we'll get it merged? Maybe put it in the aw-sync README for now? :)

Yeah sure, why not. Will make a PR after we finish the Syncing feature :D

ShootingKing-AM avatar Oct 15 '22 12:10 ShootingKing-AM