IPFS filtering to allow node operators to decide on content they are willing to serve

Open thibmeu opened this issue 2 years ago • 6 comments

Checklist

  • [X] My issue is specific & actionable.
  • [X] I am not suggesting a protocol enhancement.
  • [X] I have searched on the issue tracker for my issue.

Description

Recently, Cloudflare open sourced a fork of go-ipfs that provides filtering capabilities, grouped under a safemode command. The architecture is described in a dedicated blog post.

The system works by filtering certain CIDs when walking the DAG. This allows node operators to prevent certain CIDs from being provided, both by the HTTP gateway and to the P2P network. CIDs to be filtered are stored in a blocklist. By default, this blocklist lives in a dedicated mount of the datastore, /safemode.

The actions that can be performed on a blocklist are (based on the proposed interface):

  • block to add content to the blocklist
  • unblock to remove it
  • purge to remove content from the blockstore. Ideally, this option could be extensible, to purge remote datastore, or HTTP cache for instance
  • search to query the blocklist
  • audit to access the log of actions that have been performed against the blocklist
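As a rough illustration of the five actions listed above, here is a minimal in-memory sketch in Go. It is not the code from the Cloudflare fork: the `Blocklist` type, its method names, the prefix-based `Search`, and the pluggable purger callbacks are all hypothetical, chosen only to show how block, unblock, purge, search and audit could fit together.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// AuditEntry records one action performed against the blocklist.
type AuditEntry struct {
	Time   time.Time
	Action string // "block", "unblock" or "purge"
	CID    string
}

// Blocklist is a hypothetical in-memory stand-in for the /safemode mount.
type Blocklist struct {
	blocked map[string]bool
	log     []AuditEntry
}

func NewBlocklist() *Blocklist {
	return &Blocklist{blocked: make(map[string]bool)}
}

func (b *Blocklist) record(action, cid string) {
	b.log = append(b.log, AuditEntry{Time: time.Now(), Action: action, CID: cid})
}

// Block adds a CID to the blocklist.
func (b *Blocklist) Block(cid string) {
	b.blocked[cid] = true
	b.record("block", cid)
}

// Unblock removes a CID from the blocklist.
func (b *Blocklist) Unblock(cid string) {
	delete(b.blocked, cid)
	b.record("unblock", cid)
}

// Purge runs every supplied purger (local blockstore, remote datastore,
// HTTP cache, ...) so that the set of purge targets stays extensible.
func (b *Blocklist) Purge(cid string, purgers ...func(cid string) error) error {
	for _, p := range purgers {
		if err := p(cid); err != nil {
			return fmt.Errorf("purge %s: %w", cid, err)
		}
	}
	b.record("purge", cid)
	return nil
}

// Search returns the blocked CIDs that start with prefix, sorted.
func (b *Blocklist) Search(prefix string) []string {
	var out []string
	for cid := range b.blocked {
		if strings.HasPrefix(cid, prefix) {
			out = append(out, cid)
		}
	}
	sort.Strings(out)
	return out
}

// Audit returns a copy of the action log.
func (b *Blocklist) Audit() []AuditEntry {
	return append([]AuditEntry(nil), b.log...)
}

// IsBlocked reports whether a CID is currently on the blocklist.
func (b *Blocklist) IsBlocked(cid string) bool { return b.blocked[cid] }

func main() {
	bl := NewBlocklist()
	bl.Block("example-cid")
	fmt.Println(bl.IsBlocked("example-cid"), len(bl.Audit()))
}
```

Passing purgers as callbacks is one way to keep purge extensible without the blocklist knowing about every cache it has to clear.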

For convenience, the ipfs safemode command provides multiple ways to resolve content. From its documentation:

- IPFS address, i.e. /ipfs/<CID>
- IPNS address, i.e. /ipns/<hash_publickey>
- DNSLink address, i.e. /ipns/example.com
- HTTP URL, i.e. https://example.com/ or https://gateway.example.com/ipfs/<CID>

This is a proposed implementation, which satisfies some of the requirements laid out in https://github.com/ipfs/roadmap/issues/64. It provides a more standardised approach for node operators to filter the content they are willing to provide.

The implementation was developed 3 years ago, and may not suit the current architecture of the go-ipfs project.

thibmeu avatar Oct 06 '21 13:10 thibmeu

@thibmeu : thanks for bringing this up. I think we need to have a larger discussion about the kind of software Gateway Operators want before we keep proceeding with the status quo of go-ipfs serving a wide range of use cases, from high-traffic gateways to desktop applications. go-ipfs maintainers are going to link discussions/notes that we're having in 2021Q4 on this topic to https://github.com/ipfs/go-ipfs/issues/8499 . We'll certainly be engaging with Cloudflare as part of this process.

BigLep avatar Oct 08 '21 19:10 BigLep

2022-06-03 conversation: we have the capability for this in go-bitswap per https://github.com/ipfs/go-ipfs/issues/8763 . If you're interested in contributing a plugin, that would be welcome. Otherwise this isn't a priority for the core maintainers because go-ipfs isn't really designed for large-scale operations, but we'll support operators on any reviews.

BigLep avatar Jun 03 '22 15:06 BigLep

@guseggert will link the issue that is actively being worked on right now that will make plugins easier to write/maintain.

BigLep avatar Jun 03 '22 15:06 BigLep

The issue is https://github.com/ipfs/go-ipfs/issues/7653, which allows arbitrary modifications to the go-ipfs dependency graph using a plugin, so that you can inject a custom exchange.Interface (e.g. a Bitswap instance w/ a customized filter).

guseggert avatar Jun 03 '22 15:06 guseggert

I believe it is time to prioritize this. There is enough need and interest around blocking bad bits for this to be part of Kubo, and not just a plugin:

  • https://www.theregister.com/2022/07/29/ipfs_phishing_trustwave/
  • https://www.trustwave.com/en-us/resources/blogs/spiderlabs-blog/ipfs-the-new-hotbed-of-phishing/

Quick notes:

  1. denylists are not enough. it has to be allow and deny lists from the start
    • node operators have been asking not only for blocking bad bits, but also for a primitive that blocks everything and only allows specific CIDs and paths (e.g. a startup that only wants to run a gateway to host its own user data). if we don't tackle allowlists as part of this, we will end up with a franken-api in the future when allowlists are bolted on awkwardly.
  • MVP:
    • Add command namespace (tbd, ipfs rules --help is as good as any other) allowing user to build content policy around allow or deny (and set the default strategy).
    • We don't need to cover all use cases, it should be a low level primitive that allows people to implement their own strategies on top of (similar to firewall rules).
      • each cid / path has to be added as an explicit allow or deny entry
      • use the default policy when no entry matches
      • ability to mark added rule as sensitive (enables us to interop with https://badbits.dwebops.pub/) so it is never stored/exported in cleartext
      • use this during path resolution, bitswap and processing Gateway requests (covers the common asks from the community)
  1. import and export commands should be part of the UX, but we need to agree on the transport format – gathering feedback in https://github.com/ipfs/specs/pull/299
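To make the "firewall rules" idea in the notes above more concrete, here is a small Go sketch of a rule set with explicit allow/deny entries, a default policy, and a sensitive flag. Everything in it is an assumption made for illustration: `RuleSet`, `Add`, `Decide` and `Export` are hypothetical names, and storing sensitive entries as a hex-encoded SHA-256 digest is just one possible way to avoid keeping them in cleartext, not the format used by https://badbits.dwebops.pub/.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Action is the decision attached to a rule or used as the default policy.
type Action int

const (
	Deny Action = iota
	Allow
)

// RuleSet holds explicit allow/deny entries plus a default policy.
type RuleSet struct {
	Default Action
	rules   map[string]Action
}

func NewRuleSet(def Action) *RuleSet {
	return &RuleSet{Default: def, rules: make(map[string]Action)}
}

// hashKey is the (assumed) obfuscation applied to sensitive entries.
func hashKey(target string) string {
	sum := sha256.Sum256([]byte(target))
	return "sha256:" + hex.EncodeToString(sum[:])
}

// Add stores an explicit rule for a CID or content path. Sensitive
// targets are stored only as a digest, never in cleartext.
func (r *RuleSet) Add(target string, a Action, sensitive bool) {
	key := target
	if sensitive {
		key = hashKey(target)
	}
	r.rules[key] = a
}

// Decide returns the explicit rule for target if one exists,
// otherwise the default policy.
func (r *RuleSet) Decide(target string) Action {
	if a, ok := r.rules[target]; ok {
		return a
	}
	if a, ok := r.rules[hashKey(target)]; ok {
		return a
	}
	return r.Default
}

// Export lists the stored keys; sensitive entries appear only as digests.
func (r *RuleSet) Export() []string {
	keys := make([]string, 0, len(r.rules))
	for k := range r.rules {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Allowlist mode: deny everything except explicit entries.
	rs := NewRuleSet(Deny)
	rs.Add("/ipfs/example-cid", Allow, false)
	fmt.Println(rs.Decide("/ipfs/example-cid") == Allow, rs.Decide("/ipfs/other") == Deny)
}
```

Switching between a denylist and an allowlist then only means flipping the default policy, which is the property the notes above ask for.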

lidel avatar Aug 02 '22 12:08 lidel

Another requirement from Infra team: ability to allow / deny specific PeerIDs.

This is a real-world need that I have also had in the past. In many cases, we struggle to create deterministic test fixtures. Making sure a node can't dial a specific peer and has to get data from someone else requires disabling more and more internal services (mdns, routing, relays...) and is very brittle: the test setup can break the moment we introduce a new discovery method.

When we design ipfs rules it should encompass allow / deny rules for:

  • CIDs and content paths
    • note: iiuc we already have places to plug-in hooks (e.g., go-bitswap)
  • PeerIDs and multiaddrs
    • or any other Content Routing Hints (e.g., reframe endpoints passed by a client as a routing hint)

lidel avatar Aug 09 '22 20:08 lidel

For clarity, the new spec on this topic is https://github.com/ipfs/specs/pull/340

BigLep avatar Jan 09 '23 03:01 BigLep

Linking related work by @hsanjuan for discoverability

  • IPIP-383: compact denylist format
  • https://github.com/hsanjuan/nopfs – an implementation of IPIP-383 which adds support for content blocking to the go-ipfs stack and particularly to Kubo.

lidel avatar Mar 28 '23 23:03 lidel

Note that it depends on: https://github.com/ipfs/kubo/pull/9750. Nopfs injects itself as a NameSystem, path.Resolver and BlockService wrapper so that it can block things before Resolution and Retrieval.

More generally it depends on Kubo providing a more stable way of plugging-in a Blocker, which basically provides 2 methods:

  • IsCIDBlocked(CID) Result
  • IsPathBlocked(ipfsOripnsFullPath) Result

The Result can be a bool, but I'd prefer an err or any other type that can carry additional information about the block (for example the error could explain the reason of the block, or the denylist that triggers it to the user).
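A sketch of what such a Blocker could look like when the Result is an error that carries extra information. This is not the nopfs API: `BlockedError`, `mapBlocker` and their fields are hypothetical, and CIDs are plain strings here so the example stays self-contained instead of depending on the real CID and path types.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// BlockedError explains why something was blocked and by which denylist.
type BlockedError struct {
	Target   string
	Denylist string
	Reason   string
}

func (e *BlockedError) Error() string {
	return fmt.Sprintf("%s is blocked by %s: %s", e.Target, e.Denylist, e.Reason)
}

// Blocker is the two-method interface described above, with the Result
// expressed as an error (nil means "not blocked").
type Blocker interface {
	IsCIDBlocked(cid string) error
	IsPathBlocked(ipfsOrIpnsFullPath string) error
}

// mapBlocker is a toy implementation backed by two maps (entry -> reason).
type mapBlocker struct {
	name  string
	cids  map[string]string
	paths map[string]string
}

var _ Blocker = (*mapBlocker)(nil)

func (b *mapBlocker) IsCIDBlocked(cid string) error {
	if reason, ok := b.cids[cid]; ok {
		return &BlockedError{Target: cid, Denylist: b.name, Reason: reason}
	}
	return nil
}

func (b *mapBlocker) IsPathBlocked(path string) error {
	if reason, ok := b.paths[path]; ok {
		return &BlockedError{Target: path, Denylist: b.name, Reason: reason}
	}
	// For /ipfs/<cid>/... paths, also check the root CID.
	parts := strings.Split(path, "/")
	if len(parts) >= 3 && parts[1] == "ipfs" {
		return b.IsCIDBlocked(parts[2])
	}
	return nil
}

func main() {
	var b Blocker = &mapBlocker{
		name: "example.deny",
		cids: map[string]string{"example-cid": "reported as phishing"},
	}
	err := b.IsPathBlocked("/ipfs/example-cid/index.html")
	var blocked *BlockedError
	if errors.As(err, &blocked) {
		fmt.Println(blocked.Denylist, "-", blocked.Reason)
	}
}
```

Callers that only need a yes/no answer can compare the error with nil, while callers such as a gateway can use `errors.As` to surface the reason to the user.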

hsanjuan avatar Mar 29 '23 07:03 hsanjuan

Relevant discussion happened today in 2023-05-16-Content-Routing-WG-11.

Summary of the burning need at hand
  • Priority for IPFS ecosystem is to allow operators to have built-in support for self-managed or publicly available lists like https://badbits.dwebops.pub
    • at the same time we don't want to hard-code anything, nor spend too much time on opinionated update polling/update/composition logic
  • MVP in Kubo would be to observe a file on disk in the format from IPIP-383 and apply the deny rules present in it.
    • this simple primitive should be easy to implement, but at the same time allows operators to compose a deny list outside Kubo, and also manage logic responsible for fetching updates, if a third-party list is used.
Loose implementation scope/direction
  • reuse code from ipfs-shipyard/nopfs
  • follow path conventions from IPIP-383 + Kubo-specific location at $IPFS_PATH/denylists/*.deny

lidel avatar May 16 '23 15:05 lidel

Hi, I chatted briefly with @BigLep and there seems to be interest to bring this MVP to Kubo. I can do that.

To summarize:

  • We have a working plugin https://github.com/ipfs-shipyard/nopfs/tree/master/nopfs-kubo-plugin that watches denylists on disk. Any appends to those denylists are processed, so that you can echo "/ipfs/<cid>" >> denylist and not have to restart Kubo.
  • There is no "watch" system implemented yet. I have thoughts about this (unixfs files + pubsub) but I think this should come later.

What we need:

  • We need to settle the IPIP - https://github.com/ipfs/specs/pull/383 (I would like to at least)
  • We need to decide how to bring noPFS into Kubo (as experimental feature):
    • The most straightforward way is to bring it in as a pre-compiled plugin under /plugin/plugins (I personally lean this way as it is cleaner, especially for an experimental feature).
    • We can also wrap things by hand during setup. As reminder, blocking checks are performed on:
      • Blockservice - CID
      • NameSystem - IPNS blocking
      • IPLD/UnixFS Path Resolvers - Path blocking.
    • I think it is worth integrating the library part of NoPFS into Boxo.
  • Additionally, we can start discussing (but no need to decide) about:
    • Subscribing to lists
    • More integration: i.e. gateway responses could detect Blocked errors and look into the rule hints for http return code values.

Is there a meeting we can use to go over this so I can start work ASAP? (My window of availability is 4 weeks.)

hsanjuan avatar Oct 02 '23 13:10 hsanjuan

2023-10-02 conversation:

  • It should be enabled by default
  • Have a config flag or environment variable for disabling this functionality.
  • This is not experimental: it's built in and on by default.
  • FYI: There is a "Bad Content Working Group". Agreed with them that this work shouldn't be blocked going forward, even if there are larger ideals of servicing other ecosystems.
    • https://lu.ma/ddc-wg
  • Agreed that it can be a pre-compiled plugin.
    • There are good learnings here about having external plugins that will move over to https://github.com/ipfs/go-ds-s3 to not lose that.

BigLep avatar Oct 02 '23 15:10 BigLep

I have done another round on the spec. There's an open question about defaulting or strongly suggesting a function for double-hashing... input is welcome as I understand that sha256 is not the best.

Note that every CID and CID+Path needs to be double-hashed when a list includes a double-hashed block item in it, so the function should aim to minimize the perf impact.
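The performance concern can be illustrated with a short Go sketch. The hashing scheme here is an assumption for illustration, not the encoding defined by the spec: `doubleHash` is simply a hex-encoded SHA-256 of the CID or CID+path string, and the `Denylist` type is hypothetical. The point is only that once a list contains at least one hashed entry, every lookup that misses the cleartext entries has to pay for a hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// doubleHash obfuscates a CID or CID+path string (assumed scheme).
func doubleHash(target string) string {
	sum := sha256.Sum256([]byte(target))
	return hex.EncodeToString(sum[:])
}

// Denylist keeps cleartext and hashed entries in separate sets.
type Denylist struct {
	plain  map[string]bool
	hashed map[string]bool
}

func NewDenylist() *Denylist {
	return &Denylist{plain: map[string]bool{}, hashed: map[string]bool{}}
}

func (d *Denylist) AddPlain(target string)  { d.plain[target] = true }
func (d *Denylist) AddHashed(digest string) { d.hashed[digest] = true }

// IsBlocked checks the cleartext entries first and only hashes the
// target when the list actually contains hashed entries.
func (d *Denylist) IsBlocked(target string) bool {
	if d.plain[target] {
		return true
	}
	if len(d.hashed) == 0 {
		return false // no hashed entries: skip the hashing cost entirely
	}
	return d.hashed[doubleHash(target)]
}

func main() {
	d := NewDenylist()
	d.AddHashed(doubleHash("example-cid/secret/path"))
	fmt.Println(d.IsBlocked("example-cid/secret/path"), d.IsBlocked("example-cid"))
}
```

The early return shows where the cost comes from: lists without hashed entries never hash anything, while a single hashed entry makes the hash function part of the hot path for every non-matching request.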

hsanjuan avatar Oct 03 '23 12:10 hsanjuan

A minimal implementation of IPIP-383 from https://github.com/ipfs/kubo/pull/10161 landed in the master branch and is scheduled to be released in Kubo 0.24-rc1 for feedback. More details are in /docs/content-blocking.md

lidel avatar Oct 28 '23 03:10 lidel