aw-server-rust
Syncing (continued)
I want to get a clear idea of the intended function of sync and of what is still incomplete in aw-sync. Following up on #89.
My take on this is,
Objective:
- To sync life-logged data of interest to other devices using a 3rd-party syncing service
HLD:
- Overall Logic
- Periodically copy all data of interest ~~after a certain timestamp~~ to the 3rd-party sync dir (maybe Dropbox's dir)
- Make a folder named after the current device identifier in the sync dir
- Copy data to that folder
- Maintain this sync time in the db
- Recheck the synced time, schedule the next sync, and goto (1)
- Periodically get other devices' data
- Find other devices' folders in the sync dir and compare them with the last-sync timing data (based on the current device's db last-sync time and the other device's file modification time)
- For each device with new data
- Copy the other device's data to a staging dir
- Merge the other device's data from staging into the current device's database
- and goto (1)
- Specifications
- Data
- data is in the form of an sqlite database reconstructed from the current device's database with only the data of interest
- data of interest - default (AFK watcher and Window watcher) or user-specified buckets
- device identifier - presently the hostname
- current device - the device on which aw-sync is running
- development and initial production - keep at least 2 historical copies of files when writing to them
- Config
- sync dir is user-specified or defaults to {LocalDataDir}/activitywatch/aw-server-rust/aw-sync/sync/{host_ids}
- staging dir defaults to {LocalDataDir}/activitywatch/aw-server-rust/aw-sync/staging/{host_ids}
- User specification comes from a config file (in future, possibly from the web UI)
- Timing
- Time period for both upstream and downstream: 10 min by default, or user-specified
- ~~Generate db data after a certain timestamp: last sync time minus 1 hr~~ There can be new devices on the network, which need the other devices' data from the start, so every upstream push is a complete data push, i.e. from the point ActivityWatch started logging data
- maintaining sync times in db table
sync(ID AUTO_INCREMENT PRIMARY, host {0 for self}, LAST SYNC Timestamp)
- Arch
- Should be scalable to a future P2P, decentralized, no-3rd-party model, hence the sync framework should, as far as practically possible, be isolated from file copy-paste actions
- Is aw-sync a Rust binary or a module? A very thin CLI wrapped over a module, with instant sync (even for testing) plus continuous periodic sync functionality. But if it is a CLI, how will we get the Datastore that is already open and running?
- Multi-threaded - One for each upstream and downstream and main thread to keep track of them and
- future - on exit - EXIT GRACEFULLY - close files, stop data insertion, etc. - required to ensure no data loss
- Data
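The default directories and sync period from the Config and Timing items above could be captured roughly like this. This is only a sketch under assumptions: `SyncDirs`, `defaults`, and `DEFAULT_SYNC_PERIOD` are hypothetical names, and `{LocalDataDir}` is passed in as a parameter because resolving it is platform-specific and not covered here.

```rust
use std::path::{Path, PathBuf};
use std::time::Duration;

/// Default interval between sync runs: 10 minutes, as in the Timing item.
pub const DEFAULT_SYNC_PERIOD: Duration = Duration::from_secs(10 * 60);

/// Hypothetical holder for the two directories used by a file-based sync.
#[derive(Debug, Clone, PartialEq)]
pub struct SyncDirs {
    pub sync_dir: PathBuf,
    pub staging_dir: PathBuf,
}

impl SyncDirs {
    /// Build the default layout described in the Config item:
    /// {LocalDataDir}/activitywatch/aw-server-rust/aw-sync/{sync,staging}/{host_id}
    pub fn defaults(local_data_dir: &Path, host_id: &str) -> Self {
        let base = local_data_dir
            .join("activitywatch")
            .join("aw-server-rust")
            .join("aw-sync");
        SyncDirs {
            sync_dir: base.join("sync").join(host_id),
            staging_dir: base.join("staging").join(host_id),
        }
    }
}
```

A user-specified sync dir (for example a Dropbox folder) would simply replace `sync_dir` after calling `defaults`.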
LLD pseudocode - oversimplified:
main.rs:
    import SyncConfig
    main()
        read_args() // sync_mode=[push|pull|sync], sync_adaptor=[file], sync_adaptor_option={sync_dir, sync_staging_dir}, optional: {server_ip, port} - clap
        validate_args()
        set SyncConfig including adaptor_options
        match args:
            push:
                sync_push() // one-time push
            pull:
                sync_pull() // one-time pull
            sync:
                sync() // continuous
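The mode argument read in `read_args()` could be parsed with a small `FromStr` implementation. This is a minimal sketch using only the standard library (the pseudocode suggests clap, which is deliberately not used here). The `SyncMode` variant names and the accepted strings are assumptions taken from the pseudocode, not aw-sync's actual CLI.

```rust
use std::str::FromStr;

/// The three modes named in the pseudocode: push, pull, or both ("sync").
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyncMode {
    Push,
    Pull,
    Both,
}

impl FromStr for SyncMode {
    type Err = String;

    /// Case-insensitive parsing of the `sync_mode` argument.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "push" => Ok(SyncMode::Push),
            "pull" => Ok(SyncMode::Pull),
            // "sync" is the CLI spelling in the pseudocode; "both" is an assumed alias.
            "sync" | "both" => Ok(SyncMode::Both),
            other => Err(format!("unknown sync mode: {}", other)),
        }
    }
}
```

With this in place, `"push".parse::<SyncMode>()` gives the value that the `match args` step would dispatch on.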
sync.rs:
    SyncAdaptor trait to have
        push()
        pull()
    struct SyncConfig
        adaptor: file // can be p2p
        adaptor_options: Vec<> // sync_dir, staging_dir
        mode: SyncMode
        Server_IP:
        Port:
        buckets:
    enum SyncMode
        PUSH, PULL, BOTH
    fn sync_push()
        // based on setting -sync_adaptor file
        match adaptor
            file_push()
    fn sync_pull()
        // based on setting -sync_adaptor file
        match adaptor
            file_pull()
    fn sync()
        set_up_sync() // select adaptor and push adaptor-specific config data (here sync_dir, staging_dir)
        start_push_Thread()
        start_pull_Thread()
    fn poll_for_sync(SyncMode)
        presentTime > nextSyncTime
    fn force_push()
        // return true to force a push before nextSyncTime from thread
    fn force_pull()
        // return true to force a pull before nextSyncTime from thread
    push_Thread_main()
        if poll_for_sync() || force_push()
            sync_push()
        else
            sleep(poll_time=1000s)
    pull_Thread_main()
        if poll_for_sync() || force_pull()
            sync_pull()
        else
            sleep(poll_time=1000s)
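One way the `SyncAdaptor` trait and `push_Thread_main()` could look in Rust, including the graceful exit mentioned under Arch. This is a sketch under assumptions, not the existing aw-sync code: `spawn_push_worker` and `is_due` are hypothetical names, and the worker sleeps on a stop channel with `recv_timeout` instead of a plain `sleep`, so it can be shut down without waiting out the poll interval.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Transport-agnostic interface: a file adaptor today, possibly P2P later.
pub trait SyncAdaptor: Send + 'static {
    fn push(&mut self) -> Result<(), String>;
    fn pull(&mut self) -> Result<(), String>;
}

/// Mirrors `poll_for_sync() || force_push()` from the pseudocode.
pub fn is_due(now: Instant, next_sync: Instant, force: bool) -> bool {
    force || now >= next_sync
}

/// Spawn the push thread. Sending `()` on the returned channel (or dropping
/// the sender) stops the worker; joining the handle yields the number of
/// pushes that were attempted.
pub fn spawn_push_worker<A: SyncAdaptor>(
    mut adaptor: A,
    period: Duration,
    poll: Duration,
) -> (Sender<()>, JoinHandle<u32>) {
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || {
        let mut runs = 0u32;
        let mut next_sync = Instant::now(); // the first push is due immediately
        loop {
            if is_due(Instant::now(), next_sync, false) {
                if let Err(e) = adaptor.push() {
                    eprintln!("push failed: {}", e);
                }
                runs += 1;
                next_sync = Instant::now() + period;
            }
            // Wait for the poll interval, waking early if asked to stop.
            match stop_rx.recv_timeout(poll) {
                Err(RecvTimeoutError::Timeout) => {}
                Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
            }
        }
        runs
    });
    (stop_tx, handle)
}
```

A pull worker would be the same loop calling `adaptor.pull()`; the main thread keeps both stop senders and join handles so it can shut the workers down cleanly on exit.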
file.rs:
    file implements SyncAdaptor
    fn file_pull()
        sys_io()
            find_other_device_data_folders()
            for each other device data()
                if _should_pull_file(otherdevice_dataFile)
                    copy_to_staging()
                    work_on_staging()
        aw_io()
            odDataStore_buckets = other_device_open_data()::get_buckets() // sqlite.db DataStore:: API
            self_DataStore = current_device_ds::open()
            setup_bucket_self_dataStore() // create if it does not exist
            for each bucket:
                odDataStore_buckets.getEvents()
                self_DataStore.buckets.InsertEvents()
    fn file_push()
        sys_io()
            copy_current_device_db_to_staging() // to avoid db manipulation while syncing - but if it is open at present, how to handle that?
            // make a DataStore BUSY flag to pause Datastore manipulations and then copy the underlying sqlite file?
        aw_io()
            self_DataStore_buckets = current_device_ds::open() // the copy in the staging folder
            staging_db::datastore = setup_current_device_db() // open a new sqlite staging db for the current device, to be created with "filtered" buckets
            for each bucket of interest from SyncConfig::buckets in self_DataStore_buckets
                staging_db::CreateBucket
                self_DataStore_buckets::getEvents()
                staging_db::DS.insertEvents()
    fn _should_pull_file(otherDevice_datafile)
        if modified_date_file() > last_modified_date()
            true
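The filesystem side of `file_pull()` (finding other devices' folders and the `_should_pull_file` check) could be sketched with the standard library alone. The function names and the `Option<SystemTime>` representation of the last pull time are assumptions for illustration; the real last-sync value would come from the sync table in the db.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// List every device folder in the sync dir except our own,
/// sorted so the result is deterministic.
pub fn other_device_dirs(sync_dir: &Path, own_host: &str) -> io::Result<Vec<PathBuf>> {
    let mut dirs = Vec::new();
    for entry in fs::read_dir(sync_dir)? {
        let entry = entry?;
        let is_dir = entry.file_type()?.is_dir();
        if is_dir && entry.file_name().to_str() != Some(own_host) {
            dirs.push(entry.path());
        }
    }
    dirs.sort();
    Ok(dirs)
}

/// Pull a file if it has never been pulled, or if it was modified
/// after the last recorded pull from that device.
pub fn should_pull_file(file: &Path, last_pull: Option<SystemTime>) -> io::Result<bool> {
    let modified = fs::metadata(file)?.modified()?;
    Ok(match last_pull {
        None => true,
        Some(t) => modified > t,
    })
}
```

Note that modification times depend on how the 3rd-party sync service writes files, so treating them as the only change signal is a simplification.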
How far am I in sync with the actual syncing intention?
Any inputs @ErikBjare ? :3
Yeah this is very similar in principle! (if not exactly the same)
I should add that the current sync is actually working, it's just a bit rough around the edges (some bugs, not very stable) but I use it to sync data between 4 devices. Check out the aw-sync README and scripts in that folder.
Actually, I kinda want to use your high-level description as the foundation for docs describing how it works (since it's so on point).
Could you maybe clean it up a bit and we'll get it merged? Maybe put it in the aw-sync README for now? :)
> Yeah this is very similar in principle! (if not exactly the same)
> I should add that the current sync is actually working, it's just a bit rough around the edges (some bugs, not very stable) but I use it to sync data between 4 devices. Check out the aw-sync README and scripts in that folder.
First and most importantly, I don’t want to replace or replicate any work already done for syncing. I have a personal use case for syncing data between my Android device and desktop PC. I opened this issue to learn exactly (down to the level of Rust functions) how much progress has been made, which issues are pending, and what needs to be done to move "In-Progress" to "Done 🎉" on the kanban project board.
I didn’t understand the #89 todo that well. I went through the code, and there are some comments (like "TODO:..."). I also went through the scripts in the aw-sync folder (which themselves have some cross-platform todos, and are not working for me 😭). But would resolving all those TODO comments close all the gaps in the syncing feature?
From my pseudocode, I feel all the code in main.rs and file.rs is already done. My idea of syncing is that ActivityWatch takes care of it in the background without me running commands in a terminal. I don’t want to run commands every time to sync 😢 I want ActivityWatch to take care of that. Also, I don’t know how to run commands on my Android device.
Through this issue, I intend to discuss and fill the remaining gaps in the syncing feature. I feel that if those gaps are clearly known before I start actual coding, it will also reduce hassles in merging (I wish to avoid design changes, control flow changes, etc.).
> Actually, I kinda want to use your high-level description as the foundation for docs describing how it works (since it's so on point).
> Could you maybe clean it up a bit and we'll get it merged? Maybe put it in the aw-sync README for now? :)
Yeah sure, why not. Will make a PR after we finish the Syncing feature :D