
Scaled instances and the deletion problem

Open geek-at opened this issue 6 years ago • 7 comments

Now that the codebase has been rewritten, we can start thinking about the problem with scaling pictshare: deleting content.

Imagine two Pictshare servers connected through a shared folder (ALT_FOLDER)

An image is requested frequently so both servers have a local copy and there is a copy in the shared folder.

If the user wants to delete the image, it's deleted from the server that received the request and from the shared folder.

The second server never gets any info about the deleted hash, so it keeps its local copy.
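A tiny Python sketch of the failure mode above, with dicts standing in for the local caches and ALT_FOLDER (all names are illustrative, not PictShare's actual code):

```python
# Two nodes cache an image locally; the delete only reaches one node
# and the shared folder, so the other node keeps serving the stale copy.

shared_folder = {"abc123": b"image-bytes"}   # ALT_FOLDER
server_a = {"abc123": b"image-bytes"}        # local cache, node A
server_b = {"abc123": b"image-bytes"}        # local cache, node B

def delete(hash_, local_cache, shared):
    """Delete a hash from the node that got the request and from ALT_FOLDER."""
    local_cache.pop(hash_, None)
    shared.pop(hash_, None)

delete("abc123", server_a, shared_folder)

print("abc123" in server_a)        # False - gone on node A
print("abc123" in shared_folder)   # False - gone in the shared folder
print("abc123" in server_b)        # True  - node B still has the stale copy
```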

Possible solutions:

  • Keep a list of deleted hashes in all storage controllers
  • Use some kind of centralized database that manages all hashes and their statuses
  • Make all nodes somehow communicate with each other

geek-at avatar Dec 29 '18 16:12 geek-at

I'm really loving that this app doesn't require a database; a centralized database would introduce some complexity. A list of deleted hashes in all storage controllers plus a cron job is quite simple and would do the job. I'm guessing instant deletion is not really required.

thomasjsn avatar Mar 08 '19 09:03 thomasjsn

Just some thoughts:

Each server maintains a list of peers.

The first server is created (0), then the second server is created (1) and pointed to 0. 0 and 1 both update their lists to [0,1].

For each server added after 1 (call it M), M is pointed to any existing server N. N iterates through every server in its list but itself (if N is 1, this subset is [0]) and sends an HTTP message telling each of them to add the new server M to their list, making the list [0, 1, M] on each server.

With this in place, when a server receives a delete request, it performs the delete, then sends a delete signal via HTTP to each server on its list (which should be up to date given the above works).
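The steps above can be sketched like this, with plain method calls standing in for the HTTP messages (`Node` and its methods are hypothetical, not PictShare code):

```python
class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.peers = {node_id: self}   # every node starts knowing only itself
        self.store = {}                # local hash -> content cache

    def join(self, existing: "Node"):
        """Point a new node at any existing server N: every node on N's
        list (including N) adds the newcomer, and the newcomer copies
        N's full list."""
        for peer in list(existing.peers.values()):
            peer.peers[self.node_id] = self
        self.peers.update(existing.peers)

    def delete(self, hash_: str):
        """Perform the delete locally, then broadcast it to every peer."""
        for peer in self.peers.values():
            peer.store.pop(hash_, None)

n0 = Node(0)
n1 = Node(1); n1.join(n0)   # lists on 0 and 1 become [0, 1]
n2 = Node(2); n2.join(n1)   # every list becomes [0, 1, 2]
assert sorted(n0.peers) == sorted(n1.peers) == sorted(n2.peers) == [0, 1, 2]
```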


TL;DR - I agree with making nodes communicate.

cwilby avatar Nov 19 '19 08:11 cwilby

The problem with all nodes talking to each other is that it would complicate the whole project by a landslide.

I think the easiest way to implement it would be to have a list of deleted hashes that won't get re-used by chance, and this list should be copied and checked by all storage providers.

geek-at avatar Jan 05 '20 00:01 geek-at

Sounds good. Where would the deleted hashes be stored? If each node has a copy, it would be similarly complex.

cwilby avatar Jan 06 '20 23:01 cwilby

The easiest implementation would be a simple file where deleted hashes are stored.

This file should then be compared with the list on every storage controller: every pictshare instance should periodically check this file for hashes to delete, and check the storage controllers for updated hashes to add to its local list.

It's just a simple blacklist system. I think that could work.

geek-at avatar Jan 07 '20 07:01 geek-at

Yep that sounds like it could work. Each node can be configured to communicate with a service to add/read deleted hashes. Would the service be the root pictshare instance or something else?

cwilby avatar Jan 07 '20 21:01 cwilby

I'm thinking a cronjob, so admins can set their own intervals for comparing the blacklist and deletions can take as much time as they need.
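For example, a crontab entry along these lines would run the comparison every 15 minutes (the script path is a made-up placeholder, not a real PictShare tool):

```
*/15 * * * * /usr/bin/php /var/www/pictshare/tools/sync_deleted.php
```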

geek-at avatar Jan 07 '20 23:01 geek-at