
All content needs to be rehosted

Open zimmertr opened this issue 4 years ago • 24 comments

These videos, images, and other information should be downloaded and rehosted elsewhere in addition to posting the original source. Otherwise the content is at risk of being removed.

The content should also be easily available for mass download. This will prevent the loss of the content in the event that this repository is removed. Perhaps using something like BitTorrent or a self-hosted peer-to-peer synchronization program akin to Google Drive/Dropbox.

I'm happy to donate towards hosting fees if necessary.

zimmertr avatar Jun 02 '20 22:06 zimmertr

We could use some CI rules to auto-archive the linked footage. We currently have a repo with most of the videos archived (link in the README), but we should definitely automate it.
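As a rough idea of what the first step of such a CI job might look like, here is a minimal sketch that pulls the unique links out of a Markdown file so a later step can archive them. The function name and the regular expression are only illustrative assumptions, not something that exists in this repo.

```python
import re

# Matches http(s) URLs and stops at whitespace or common Markdown closing delimiters.
URL_PATTERN = re.compile(r"https?://[^\s)>\]\"']+")


def extract_links(markdown_text: str) -> list[str]:
    """Return the unique URLs found in the text, in order of first appearance."""
    seen = set()
    links = []
    for match in URL_PATTERN.finditer(markdown_text):
        # Drop trailing sentence punctuation that is unlikely to belong to the URL.
        url = match.group(0).rstrip(".,;")
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


if __name__ == "__main__":
    sample = (
        "* https://example.com/video1\n"
        "* [mirror](https://example.org/clip.mp4), also https://example.com/video1\n"
    )
    for url in extract_links(sample):
        print(url)
```

A CI step could feed each returned URL to whatever downloader or archiver ends up being used.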

2020PB avatar Jun 02 '20 22:06 2020PB

Thanks for pointing out the other repo in the README. I admit I missed this when skimming it. However, I am not sure this is a long-term solution given that the contents are not hosted and published by a third party. Right now they are at the mercy of GitHub/Microsoft, which has not necessarily acted favorably towards less legal repos in the past.

Not to indicate this is currently less than legal. But I also don't trust that illegality is a requirement for action anymore. And I think that Microsoft would take a stance against the public if under enough pressure.

zimmertr avatar Jun 02 '20 23:06 zimmertr

That's correct; however, that repo is also archived on IPFS, so if it goes down it will not be gone. Do you know if there is a way to have an external API verify that it is being called by the CI scripts on a GitHub repo? I have an API we can upload files to for IPFS reupload, but I don't want it to get spammed when I add the token.

I will look into this tonight, but if you know of a good way to do this please lmk and I will try to set something up.

2020PB avatar Jun 02 '20 23:06 2020PB

I'm sorry I don't. However, I do have infrastructure development skills if you need assistance building out a server/cloud infrastructure for this project.

zimmertr avatar Jun 02 '20 23:06 zimmertr

Awesome, I will let you know! I don't think it should be necessary because we already have a good resource for hosting stuff, but you never know what we'll need later!

2020PB avatar Jun 02 '20 23:06 2020PB

@2020PB "Do you know if there is a way to have an external API verify that it is being called by the CI scripts on a github repo? I have an API we can upload files to for IPFS reupload, but I don't want it to get spammed when I add the token."

This should be a non-issue. You can store the token as a secret on the CI server so that no one can read it back or see it, assuming no one merges a Pull Request that somehow makes it visible in the CI logs. Note: I assume the host header could be checked as well.
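If you want a bit more than a bare token, one common pattern (a different technique from simply attaching the token, and only a sketch) is to have CI sign each upload with the secret and have the API recompute the signature before accepting it. The function names below are hypothetical, and this assumes the secret is stored as a CI secret and shared with the API.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """CI side: compute an HMAC-SHA256 signature over the upload body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, payload: bytes, signature: str) -> bool:
    """API side: recompute the signature and compare it in constant time."""
    expected = sign_payload(secret, payload)
    # Compare as bytes so a non-ASCII signature fails the check instead of raising.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

This only proves the caller knows the secret. Anyone who can get a workflow change merged could still use it, so PR review remains the real safeguard.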

@2020PB I've created this -> https://github.com/hunterwilliams/link-archival which can assist with archiving everything in an automated fashion. If there is a file structure etc. you want, that would be good to know. It needs a bit of process before it can just be dropped in. Willing to assist. It can be used now, though, to download/screenshot some items (I haven't gotten through automating all of the video downloads, just Twitter).

hunterwilliams avatar Jun 03 '20 00:06 hunterwilliams

Love the work being done here! To me, the ideal infra would be uploading images/vids to ipfs and using a P2P solution (gunjs, orbitdb) for storing structured JSON that links out to IPFS and includes additional metadata.
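To make the "structured JSON that links out to IPFS" part concrete, here is a minimal sketch of what one record could look like. The field names are my own assumptions, and the CID in the usage note is a placeholder rather than a real hash.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class ArchiveRecord:
    source_url: str        # where the media was originally posted
    ipfs_cid: str          # content identifier returned by the IPFS node
    sha256: str            # checksum so mirrors can verify their copy
    description: str = ""  # free-form metadata


def make_record(source_url: str, ipfs_cid: str, content: bytes,
                description: str = "") -> ArchiveRecord:
    """Build a record, hashing the file content for later verification."""
    digest = hashlib.sha256(content).hexdigest()
    return ArchiveRecord(source_url, ipfs_cid, digest, description)


def to_json(records: list[ArchiveRecord]) -> str:
    """Serialize records to a stable, diff-friendly JSON document."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)
```

For example, `to_json([make_record("https://example.com/v", "<cid-placeholder>", b"...")])` gives a document that could live in the repo or in a P2P store.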

Edit: I think IPFS rehosting is more realistic for this repo as it exists

mjmaurer avatar Jun 03 '20 17:06 mjmaurer

I'd be happy to build in IPFS hosting, but I'd like #163 to be addressed. Rehosting on P2P makes it much more likely to always exist. I personally wouldn't want an image of me widely circulated without my consent.

mjmaurer avatar Jun 03 '20 19:06 mjmaurer

I've been archiving data onto IPFS, and have the following archived media:

Hosting data on IPFS may be a bit tricky if you want to be anonymous or not be publicly known as backing up the data. If you're using IPFS, it is trivial to find and trace the people hosting content. If you want anonymity or to not be identified as someone backing up the data, IPFS is not a good idea.

Latest archive

bonedaddy avatar Jun 04 '20 03:06 bonedaddy

As it is, someone gives up anonymity in the form of a PR

mjmaurer avatar Jun 04 '20 03:06 mjmaurer

@mjmaurer if you need anonymity - you can message the mods on reddit. Is that good enough?

ubershmekel avatar Jun 04 '20 04:06 ubershmekel

I mentioned this in the other issue, but I made a script that will download all the videos and also screenshot the webpages for posterity. Hopefully that helps with the ephemeral nature of the internet. I'm downloading all the links now, so it's very much still a WIP, but feel free to play around with it: https://github.com/valadect/pbbackup

valadect avatar Jun 04 '20 08:06 valadect

Would it be illegal to post media files in the repo itself so that people can back them up simply by cloning the repo? Of course this could just be a secondary backup method. (And bear with me since I am new to this community).

nathanfranke avatar Jun 05 '20 10:06 nathanfranke

@nathanfranke Considering that no one is profiting from it and that it is for educational use, it should be fine for the most part.

valadect avatar Jun 05 '20 11:06 valadect

I'd be happy to provide a backup server in France if that's useful, to avoid US law and companies.

tgalopin avatar Jun 05 '20 16:06 tgalopin

@nathanfranke there's this repo which I think is dedicated to files: https://github.com/pb-files/pb-videos

Though we don't have a good way to link between the two repos yet.

ubershmekel avatar Jun 05 '20 16:06 ubershmekel

If you want to be able to mirror media locally, and optionally upload to an IPFS node, check out the downloader tool in the tools folder.

bonedaddy avatar Jun 05 '20 21:06 bonedaddy

Perhaps reposting on LBRY might be useful?

modelmat avatar Jun 06 '20 07:06 modelmat

Maybe looking into Archive.org for hosting might be worth a shot. They offer an S3-like API to upload files: https://github.com/vmbrasseur/IAS3API#internet-archive-s3-api-documentation
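For a rough idea of what an upload could look like, here is a sketch that only builds the PUT request without sending it. The endpoint, the `LOW accesskey:secret` authorization format, and the `x-amz-auto-make-bucket` header are taken from my reading of the linked docs, so please check them there; the function name, the item identifier, and the use of https are my own assumptions.

```python
import urllib.parse
import urllib.request

# Endpoint as I understand it from the IAS3 docs (assumption: https is accepted).
IAS3_ENDPOINT = "https://s3.us.archive.org"


def build_upload_request(identifier: str, filename: str, data: bytes,
                         access_key: str, secret_key: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that uploads one file to an item."""
    url = f"{IAS3_ENDPOINT}/{urllib.parse.quote(identifier)}/{urllib.parse.quote(filename)}"
    headers = {
        # Archive.org's simplified auth scheme, per the linked docs.
        "Authorization": f"LOW {access_key}:{secret_key}",
        # Ask the service to create the item if it does not exist yet.
        "x-amz-auto-make-bucket": "1",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

Sending it would be `urllib.request.urlopen(build_upload_request(...))`, with the keys coming from CI secrets rather than the repo.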

krmax44 avatar Jun 06 '20 13:06 krmax44

Hey guys, just learning about this project, happy to host everything on Skynet. Skynet is a platform similar to IPFS, except instead of seeding the files yourself, a decentralized platform called Sia (similar to what Filecoin is meant to be) seeds the files for you. You get uptime + decentralization without having to host anything yourself.

How can I get started?

Is there a chatroom somewhere? Some of this might be easier to do in real time. I've got questions like:

  • What's the legality of these? Who owns the copyright? Can we get the uploaders to add CC0 licenses so that there are no legal concerns?
  • Where are we sourcing these from? Is this list complete or are there other places we can look?
  • Should we keep snapshots of the repo over time? Or should we upload every file to Skynet individually and just keep a growing list? Do we want to add the Skylinks to the repo here?

DavidVorick avatar Jun 06 '20 18:06 DavidVorick

I'm not a lawyer, but I hope this data would fall under "fair use" such as in a documentary: https://en.wikipedia.org/wiki/Fair_use#Documentary_films

ubershmekel avatar Jun 06 '20 20:06 ubershmekel

I would hope it is considered criticism or documentary since getting permission from all filmers would be effectively impossible.

nathanfranke avatar Jun 06 '20 21:06 nathanfranke

Please link to the mirrors inside the repository so they can be found.

I have made a barebones sia-skynet remote for git-annex in https://github.com/xloem/gitlakepy (EDIT: fixed link) which should help skynet interoperate with git or datalad a little. I have also made a barebones bsv remote for git at https://github.com/xloem/git-remote-bsv, which provides for storing lightweight git repositories on a different storage-oriented blockchain than skynet. Unlike skynet, bsv content cannot eventually be lost on the network.

xloem avatar Jun 07 '20 15:06 xloem

Absolutely need to do this.

What about using Internet Archive? If needed, I have a few TB of storage on my NAS I can donate on a temporary basis as well.

karan avatar Jun 09 '20 04:06 karan