Syncing

Open ErikBjare opened this issue 7 years ago • 86 comments

Vote on this issue on the forum!


There are two usage issues with ActivityWatch at the moment to which syncing is a solution:

  • If you use more than one device, you need to check every device individually, or run one centralized instance of aw-server (not recommended!)
  • If a machine is lost, so is the data (the user could have exported it, but data stored after the export would still be lost). While ActivityWatch cannot replace a proper backup system, syncing could help by storing copies of the data across devices.

I know of two interesting solutions to this problem:

  • Centralized server which stores all data encrypted (the server is unable to decrypt)
    • Issues: Centralized, single point of failure
    • Done by @StandardNotes
  • P2P synchronization (encrypted, possibly including relays)
    • Done by @Syncthing very well, perhaps we could use it in some way. Also: MPL2 licensed and written in Go.
      • Downside: Clients must be online at the same time for sync.
      • They have the ability to set some folders to "read only", useful when you want to ensure the data stays intact in its source.
    • Implementing it ourselves would be an enormous effort, I assume.


ErikBjare avatar Mar 18 '17 23:03 ErikBjare

@calmh might know a thing or two about using Syncthing in an application-specific context like this. I haven't seen it done before so we might want to check with him before we start.

I've taken a look at the arguments to Syncthing and found -home which can be used to set a custom configuration directory. So pretty promising.

ErikBjare avatar Mar 28 '17 18:03 ErikBjare

I've started prototyping something small here: https://github.com/ActivityWatch/aw-syncthing/

Could be made to work both with standalone Syncthing and bundled Syncthing, but standalone would probably be preferred due to the dependency on the Python package syncthing which targets a specific version (right now targets 0.14.24 and the latest is 0.14.25).

What it does:

  • [x] Moves the database file to a specific location
  • [x] Creates a symlink from the new to the old location (so aw-server will just follow the symlink to the file)
  • [x] Starts Syncthing with a custom configuration directory
  • [x] Configures Syncthing via the REST API to add the new database folder as a synced folder
  • [ ] Add another device to sync the folder with

ErikBjare avatar Mar 29 '17 20:03 ErikBjare

From a Syncthing point of view there's no real difference to it - you just start it up, point -home towards somewhere suitable, and configure it appropriately. You'll need to do the exchange of device IDs and so on somehow. As for the Python package, I see that it mentions 0.14.24 specifically but probably only because that was the latest when the README was written. All of 0.14.x speak the same API so there is no difference (and 0.12, 0.13 as well for the absolute most part).

calmh avatar Mar 29 '17 20:03 calmh

@calmh: Awesome! I'll let you know when we have a working release.

ErikBjare avatar Mar 29 '17 21:03 ErikBjare

I've started using Standard Notes recently (finally getting off Evernote) and have been impressed by the architecture. They have designed a neat data format/server called Standard File that defines how data should be encrypted and stored both client-side and server-side. Definitely something to check out.

Edit: It's interesting, but I'd rather have it distributed than just decentralized.

ErikBjare avatar Jun 15 '17 21:06 ErikBjare

I've been thinking about this a bit more.

My current idea is to simply configure a folder as a synced databases-folder. Basically aw-server would copy local data to this folder on a regular basis.

This folder could then be synced with Syncthing, Dropbox, or Gdrive (we should probably explicitly recommend Syncthing). The synced database files must not be modified from any host other than the one that owns them, since such changes could cause syncing conflicts.
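The periodic copy could look roughly like this. It is a minimal sketch, assuming the local database is a single SQLite file and that each host writes only to a file named after itself; `export_snapshot` and the `<hostname>.db` naming scheme are hypothetical. It uses the standard library's `sqlite3.Connection.backup`, which is meant to produce a consistent copy rather than a raw file copy.

```python
import socket
import sqlite3
from pathlib import Path


def export_snapshot(local_db: Path, sync_dir: Path, hostname: str = "") -> Path:
    """Copy the local database into the synced folder as <hostname>.db."""
    host = hostname or socket.gethostname()
    sync_dir.mkdir(parents=True, exist_ok=True)
    target = sync_dir / f"{host}.db"
    src = sqlite3.connect(str(local_db))
    dst = sqlite3.connect(str(target))
    try:
        # Uses SQLite's online backup API; replaces the contents of the
        # target database with the current contents of the source.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target
```

Because every host only ever writes its own file, other hosts can treat the remaining files in the folder as read-only.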

Potential problems:

  • It would be nice to have the synced databases encrypted
  • Compressing them would also lead to huge storage savings
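Compression can be sketched with the standard library alone. This assumes a snapshot file is compressed after it has been written and before it lands in the synced folder; `compress_file` and the `.gz` suffix convention are hypothetical. Encryption is not shown.

```python
import gzip
import shutil
from pathlib import Path


def compress_file(path: Path) -> Path:
    """Write a gzip-compressed copy of path next to it and return its path."""
    gz_path = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        # Stream the file in chunks so large databases are not loaded
        # into memory at once.
        shutil.copyfileobj(src, dst)
    return gz_path
```

How much space this saves depends on the data; one trade-off is that a compressed file changes more extensively between versions, which may make block-level syncing less effective.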

ErikBjare avatar Aug 24 '17 13:08 ErikBjare

Reddittracker turned up this today. Makes it pretty clear that sync is a vital feature for most users.

The best part is that you can put it on all computers (home and work) and on a smartphone. It'll track the software and sites you use on all of them and aggregate it to one account.

ErikBjare avatar Sep 13 '17 16:09 ErikBjare

It would be nice if this were implemented in a way that doesn't add to the system requirements for running the program, so that people who don't need the functionality, or who would rather just set up a cron job to copy the data to a remote server manually, can still disable the feature.

hippylover avatar Nov 21 '17 18:11 hippylover

@hippylover Noted! Thanks for the feedback.

ErikBjare avatar Nov 21 '17 18:11 ErikBjare

I googled activitywatch + backup, trying to locate where the data is stored. It would be really nice to be able to set where the data is stored.

The backup solution I use is to put important stuff I'm working on under Dropbox or MEGA. I'm on Linux, and I actually add a home directory for it, which I guess makes me more aware that it's Dropbox data.

I just read through the above comments. MEGA is supposedly end-to-end encrypted. I started using it because it offers more free storage, but it has the bonus of not having to mess with an encryption solution if you want to store the data encrypted.

brizzbane avatar Jan 11 '18 17:01 brizzbane

Your sync looks like auto-backup to me (or I've misunderstood).

How do you merge activity from multiple devices?

If I were in charge, I'd probably use git as the sync/merge tool, provided the data is stored in plain-text files. But I haven't explored your code base enough to judge whether that's a good approach for this project.

1000i100 avatar Apr 06 '18 11:04 1000i100

@1000i100 The difference between sync and auto-backup would be that in auto-backup there's a defined producer and consumer while in sync there isn't, and by that definition what we're describing might actually be auto-backup, yes.

Merging activity from multiple devices is not an issue as long as the one device you are requesting data from has the data for all the devices you want to view. Data is separated by activity type and by host into what we call buckets.
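To illustrate why separate buckets make merging unnecessary, here is a small sketch. The `Bucket` type and its fields are assumptions made for the example and do not reflect the actual aw-server data model.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Bucket(NamedTuple):
    type: str          # kind of activity, e.g. "window" or "afk" (assumed names)
    hostname: str      # device that produced the events
    events: List[dict]


def buckets_by_host(buckets: List[Bucket]) -> Dict[str, List[Bucket]]:
    """Group buckets by originating host; events are never merged across hosts."""
    grouped: Dict[str, List[Bucket]] = defaultdict(list)
    for bucket in buckets:
        grouped[bucket.hostname].append(bucket)
    return dict(grouped)
```

A device holding buckets from several hosts can then show each host's data side by side without rewriting any events.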

Plaintext is simply not scalable and therefore git is out of the question. If we have 500MB of data and convert it back and forth between a database and plaintext file it would be incredibly slow.

johan-bjareholt avatar Apr 06 '18 12:04 johan-bjareholt

Started working on something small as an experiment: https://github.com/ActivityWatch/aw-server/pull/50

ErikBjare avatar Aug 20 '18 21:08 ErikBjare

raises hand

Just wondering - isn't the storage a database? syncthing doesn't handle database syncing.

madumlao avatar Nov 27 '18 22:11 madumlao

@madumlao I don't get that either. Syncthing syncs file by file, and it is nearly impossible to do a diff of a binary sqlite file. The database can easily grow past 100MB and it's not viable to sync such a large file frequently.

johan-bjareholt avatar Nov 28 '18 08:11 johan-bjareholt

@madumlao Correct, but the database is stored in a file, which can be synced.

@johan-bjareholt Syncthing is smart enough to not sync the entire file if only parts of it have changed, see: https://forum.syncthing.net/t/noob-question-incremental-sync/1030/17

ErikBjare avatar Nov 28 '18 09:11 ErikBjare

@ErikBjare Oh nice. Googled a bit on the sqlite database files and they seem to be paged so that should be fine then. I just assumed that it was as bad as git when comparing binaries but apparently they have solved that issue.
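The general idea of transferring only the changed parts of a file can be illustrated with fixed-size block hashing. This is a conceptual sketch only and not Syncthing's actual implementation; the block size and function names are assumptions.

```python
import hashlib
from typing import List

BLOCK_SIZE = 128 * 1024  # assumed block size for the example


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return indices of blocks in new that differ from, or do not exist in, old."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        i for i, digest in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != digest
    ]
```

If a paged database file mostly changes a few pages in place and appends new ones, only those blocks would need to be sent.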

johan-bjareholt avatar Nov 28 '18 09:11 johan-bjareholt

Would syncing with syncthing also mean that we will have multiple database files? In that case we might need a lot of refactoring.

johan-bjareholt avatar Nov 28 '18 10:11 johan-bjareholt

@ErikBjare I'm not convinced that an SQLite db will survive Syncthing. In the best case you'll lose transactions done on one side; in the worst case you'll have a mispaired hot journal which will corrupt the whole db. Effectively, if an aw-server process is running on two machines there's going to be contention.

https://www.sqlite.org/howtocorrupt.html

The only way that Syncthing, rsync, or a similar process is going to be "safe" is if each transaction is a separate file, but I guarantee that that's going to be bad. You really need to implement some kind of peer-to-peer syncing db, such as, for example, multi-master LDAP.

madumlao avatar Nov 28 '18 12:11 madumlao

@johan-bjareholt Yes, each instance would write to its own file in the synced folder(s) (there are some benefits to having one Syncthing-folder per instance, as Syncthing can enforce "master copies" preventing accidental deletion/corruption on other machines). An instance would therefore have read-only access to database files from remote machines. I don't think this requires any major refactoring.

@madumlao I am aware, I'm not proposing we sync a single sqlite database file.

I thought I had mentioned it in the issue before, but I realize now that I hadn't. Hopefully this should clear things up: I'm not proposing two-way sync in the sense that you can edit remote buckets, only read them (and create copies, which you could in turn modify).
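Read-only access to a remote machine's database file can be enforced at the SQLite level. A minimal sketch, assuming the remote file is a plain SQLite database; `open_read_only` is a hypothetical helper:

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an SQLite database so that any write attempt fails."""
    # SQLite URI filenames accept mode=ro, which opens the file read-only.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Any `INSERT` or `UPDATE` on such a connection raises `sqlite3.OperationalError`, so a bug in the reading instance cannot modify another host's file through it.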

ErikBjare avatar Nov 28 '18 15:11 ErikBjare

I see. A full-on p2p system would be very much appreciated. I have a case where I have multiple laptops / devices that all move around. Unless I set up a single server and configured all clients (including Firefox extensions etc.) to talk to that server, my activity watchers will all have gaps in activity tracking, defeating the purpose of review.

Ideally a user who has multiple devices can transfer in between devices with little setup, and the tracking will follow them throughout.

Maybe the laziest / easiest way to do this without major rearchitecting is to use periodic "sync checkpoints", which would basically:

  1. generate periodic sqlite dumps into some shared Syncthing folder
  2. upon startup (or periodically), check the shared syncthing folder for all sqlite dumps made by other nodes and import any transaction later than the "last remote transaction synced"
  3. write down the "last remote transaction synced" somewhere for tracking
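Steps 2 and 3 could be sketched like this. The table layout (`events` with an incrementing `id` on the remote side, and `node`/`remote_id` columns on the local side) and the `state` dictionary that records the last synced id per node are all assumptions made for the example, not the real aw-server schema.

```python
import sqlite3
from typing import Dict


def import_new_events(
    local: sqlite3.Connection,
    remote: sqlite3.Connection,
    node: str,
    state: Dict[str, int],
) -> int:
    """Import events from a remote dump that are newer than the last synced id."""
    last_id = state.get(node, 0)
    rows = remote.execute(
        "SELECT id, timestamp, data FROM events WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    with local:  # commit all inserts together, or none on error
        local.executemany(
            "INSERT INTO events (node, remote_id, timestamp, data) "
            "VALUES (?, ?, ?, ?)",
            [(node, row[0], row[1], row[2]) for row in rows],
        )
    if rows:
        # Remember the "last remote transaction synced" for this node.
        state[node] = rows[-1][0]
    return len(rows)
```

This relies on the assumption stated below that events are additive: an edit to an already imported event would not be picked up.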

Could be implemented as a separate watcher-like process.

(My assumption is that tracking events are largely just additive transactions, there is little editing done)

By the way, I have no idea where the sqlite database is saved. Any pointers?

madumlao avatar Nov 29 '18 04:11 madumlao

@madumlao That's almost the exact design I had in mind for the MVP, nice to see we arrived at the solution independently!

We use appdirs to manage files like the database, caches, and logs. So check /home/<USER>/.local/share/activitywatch/aw-server if you're on Linux, or the appdirs documentation for user_data_dir otherwise.
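For Linux, the lookup can be approximated with the standard library. This is a sketch of the XDG convention that produces the path above, not the appdirs implementation itself; `linux_data_dir` is a hypothetical helper, and other platforms use different locations.

```python
import os
from pathlib import Path


def linux_data_dir(app: str = "activitywatch", component: str = "aw-server") -> Path:
    """Approximate the per-user data directory on Linux."""
    # XDG_DATA_HOME overrides the default of ~/.local/share when it is set.
    base = os.environ.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(base) / app / component
```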

ErikBjare avatar Nov 29 '18 09:11 ErikBjare

Just to be sure, there is currently no cross-device syncing available yet, right? If so, once syncing is available I'd gladly switch from RescueTime. I constantly switch between different computers.

x-ji avatar Dec 19 '18 13:12 x-ji

@x-ji No it's sadly not available yet.

johan-bjareholt avatar Dec 19 '18 19:12 johan-bjareholt

What might also be interesting is some integration with Nextcloud (disclaimer: I'm a designer there :)

  • The ideals of the projects are quite aligned: being in control of your data.
  • Nextcloud is already reasonably widely adopted. That means you don't need to write an extra server, and people don't need to install something extra.
  • We support MySQL/MariaDB, PostgreSQL and SQLite (via some db abstraction I guess) cc @rullzer @MorrisJobke for technical questions.
  • There could be a server-side Nextcloud app which displays the data too. Since the desktop dashboard is already a web interface, that could be reused.

What do you think?

jancborchardt avatar Dec 27 '18 13:12 jancborchardt

@jancborchardt I like Nextcloud, but I don't think that's a direction we want the core project to go in (and I'm pretty excited about building a decentralized sync feature for a "localhosted" application).

I could elaborate, but I don't want to be overly critical (as I sometimes can be) so I'm just going to leave it at that :)

However, if you're interested in making a business case out of it we're all ears! (and please let me know what you think of my reply in #257, that's really interesting for us)

ErikBjare avatar Dec 27 '18 15:12 ErikBjare

I definitely agree with not tying the core AW project to a specific sync implementation. As long as the abstraction is on the file level, it's totally application agnostic which is definitely great from a "my data, my way" perspective. It lets users choose how (or even if) they want to synchronize.

If having Nextcloud integration is a priority, AFAICT all that's needed is an instance of aw-server running on the Nextcloud box (or somewhere it can reach) and a Nextcloud webapp to interface with it.

zeonin avatar Jan 11 '19 15:01 zeonin

Personally I would much prefer having a centralized server. It seems to me like implementing some security on the communication between servers and clients would be a lot simpler than implementing some kind of p2p sync between servers.

For my use case, where I have a single computer that dual-boots Linux and Windows, I will never have both servers running at once anyway, so any syncing would need to go through some third host regardless. Running a single server on a separate host seems like a much easier solution.

I'm up for implementing the security needed on the server.

What would you want to see in a PR in order to merge support for having a single server for multiple clients/devices?

Maistho avatar Mar 10 '19 18:03 Maistho

@Maistho Basically just HTTP authentication, preferably using OAuth in some way.

Would require password-protecting the web UI as well as adding a configuration option to aw-client to include the HTTP auth key. I'm a bit rusty on OAuth, but that's the gist of it.

Edit: Oh, and tests, lots of tests.

Edit 2: And HTTPS...
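As a starting point, the server-side check for a key sent by aw-client could look like the sketch below. It assumes a single static token passed in an `Authorization: Bearer ...` header, which is a simplification and not OAuth; `is_authorized` is a hypothetical helper and not part of aw-server.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header against a known token."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(token.encode(), expected_token.encode())
```

Without HTTPS the token would travel in clear text, so this only makes sense together with the second edit above.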

ErikBjare avatar Mar 10 '19 19:03 ErikBjare

@ErikBjare wrote: "I like Nextcloud, but I don't think that's a direction we want the core project to go in (and I'm pretty excited about building a decentralized sync feature for a "localhosted" application)."

It’s your call of course. :) It just seems that you want to develop an activity tracking app, already have limited time for that – and then working on a sync server will take even more focus away from that?

Nextcloud could even just be one of many, by simply supporting WebDAV for syncing. Yay for open standards. ;) And another point is ease of setting up: If you want ActivityWatch to be accessible and usable by lots of people, it has to be dead simple. If for syncing you have to set up your own separate server, that’s a dealbreaker.

jancborchardt avatar Mar 12 '19 12:03 jancborchardt