
Move away from URLs for internal object addressing

Open benwerd opened this issue 6 years ago • 14 comments

Every internal entity should simply be referenced with an internal hash ID. This includes embedded images etc, where those images are hosted on the Known site.

benwerd avatar Dec 07 '19 19:12 benwerd

Yes, with a couple of ifs and buts, and it depends on what you mean by UUID. UUID as in uniqueid is fine and is fairly industry standard, URI is a different matter.

The fundamental problem with the existing method is that the U in URI doesn't stand for "unique", and since the data needs to be addressed by a resolvable, persistent identifier, that UUID ends up being neither unique, resolvable, nor immutable.

This feeds into a larger conversation about the data model itself, which has pretty serious issues. The document object model has many advantages when you want to be data type agnostic and work in streams, but you all too often end up with the Mongo problem where you don't have a database, you create a bad filesystem.

Needs some thought all round.

I have strong views :D

mapkyca avatar Dec 07 '19 21:12 mapkyca

With a slightly different hat on, I think there is still utility in having a unified resolver for content (which is currently the UUID). What you're talking about there is producing a simple installable web platform that, as well as being pretty cool in its own right, automatically mints PIDs for everything produced on the platform. Federate this with maybe a mechanism to allow Known site A to resolve things for itself, and then a mechanism to resolve for other Known sites, and all of a sudden things start getting... very interesting.

mapkyca avatar Dec 07 '19 21:12 mapkyca

In practice I don't think having UUIDs matters. Locally, an ID will provide everything we need it to. And then remotely, there's nothing wrong with referencing an external resource with a URI.

Preserving streams is important to what Known is - the situation I really want to avoid is what we did with the Elgg rewrite, where suddenly content was too separated.

Content needs to be:

  • Addressable internally without caring about content type (although filtering by content type should be possible)
  • Extensible in the sense that new plugins can easily add new content types while still enjoying privacy settings, etc etc etc
  • As simple as possible

The original intention with the UUIDs was to allow any Known site to be able to resolve any other Known site. I think, in retrospect, that this was an error.

benwerd avatar Dec 07 '19 21:12 benwerd

I disagree entirely.

Cross site resolution is, in my view, a critically valuable feature we shouldn't lose.

That's not to say the internal addressing should be based on URIs, but there needs to be a resolver, as well as a way to represent and address remote objects.

mapkyca avatar Dec 07 '19 21:12 mapkyca

Actually, I don't think we do disagree. I think we're talking at cross purposes slightly. Also, I can hardly see right now so am probably crashing through things.

We should chat. I agree ditching URIs for internal representation is a good idea, plus maintaining stream capability is critical. However, the current way of doing that is bad. There's much to do that could make it better.

mapkyca avatar Dec 07 '19 21:12 mapkyca

So I think URI is a perfectly reasonable external resolver. I guess what I'm saying is that internal entities don't need to be referenced by the same mechanism as external entities. One of the big issues that has come up in Known's first five years is that moving domains is far, far harder than it ought to be - so decoupling internal IDs from that is important. But of course, for external entities, we've got to know where to look, without having to defer to a centralized lookup - something that URIs are perfectly suited for.

ETA: just saw your more recent message. I think this is true, and we just need to spec it out.

benwerd avatar Dec 07 '19 21:12 benwerd

Yep, we're all agreed that URIs for internal representation are bad. However, for stuff that comes from elsewhere, some mechanism for resolution that isn't necessarily a URL would be very cool.

If you're talking federation, we need to be able to understand the provenance of any given entity (this is here, that's me over there), not least because that allows for some pretty nifty relational graphing.

I'm not sure if we're actually talking about PID minting here, which may be way further than we need to go, but if we can crack internal / external object resolution, we would have accidentally created a very powerful tool.

mapkyca avatar Dec 07 '19 21:12 mapkyca

I don't think we're talking about PID minting - that seems like ten steps further than we need (but could be a cool plugin). Agree on internal / external object resolution.

benwerd avatar Dec 07 '19 21:12 benwerd

Make the internal ID generation a hook, and attach a resolver url to federated content / include in the meta header. GTG

mapkyca avatar Dec 07 '19 21:12 mapkyca

I think it needs more consideration than that ...

benwerd avatar Dec 07 '19 21:12 benwerd

Almost certainly, and it depends on the full range of things one wants to be possible. But hooks mean a plugin can introduce its own identification schema easily, and a resolver would mean any given Known site will be able to resolve its own content, or delegate resolution for things that are elsewhere.

I'm glossing over a lot here, but this gives you some very cool options wrt federation.
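
A minimal sketch of the hook-plus-resolver idea, not Known's actual API. It is written in Python for brevity, and `IdHooks`, `Resolver`, the `base/id` URL layout, and the example domains are all assumptions made for illustration.

```python
import uuid
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urlparse


class IdHooks:
    """Hypothetical hook point: a plugin may swap in its own ID scheme."""

    def __init__(self) -> None:
        # Default generator (an assumption): a random 128-bit hex ID.
        self._generator: Callable[[], str] = lambda: uuid.uuid4().hex

    def register(self, generator: Callable[[], str]) -> None:
        """Replace the ID generator, e.g. from a plugin."""
        self._generator = generator

    def new_id(self) -> str:
        return self._generator()


class Resolver:
    """Resolves local URLs to stored entities; names the host to delegate to otherwise."""

    def __init__(self, base_url: str, store: Dict[str, dict]) -> None:
        self.prefix = base_url.rstrip("/") + "/"
        self.store = store

    def resolve(self, url: str) -> Tuple[str, Optional[object]]:
        if url.startswith(self.prefix):
            # Local content: strip the site base and look up by internal ID.
            return ("local", self.store.get(url[len(self.prefix):]))
        # Remote content: a real resolver would contact this host's resolver.
        return ("delegate", urlparse(url).netloc)
```

The point of the split is that the ID scheme and the resolution path can change independently: a plugin only touches `register`, and federation only touches the `delegate` branch.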

mapkyca avatar Dec 07 '19 21:12 mapkyca

UUID is useful when establishing a network and when backing it by something else to make it unforgeable. I would point toward zot's guid system (256-bit guid + an accompanying guid_sig which is cryptographically verified) as an example for that, effectively powering the "zot network". Having a "Known network" would be cool too, but even if not, using somewhat-consistent identifiers makes it easier to resolve content from other Known sites, yes.

Is that desirable, however? That's a different question. Certainly there's an internal/external split, where the id you use internally does not have to match the one used externally. But there should also be a split between ID and URL. Especially in relation to #2615 this would be important to get right. You want the software to be aware of the ID which is backed by DNS / domain name's authority, and to use that for all global references... but for the user's purpose, you want them to be passing around something more user-friendly than known.example/objects/b1f24a1f-6c18-40ff-8b70-afe55cceb5c2 -- and from a security perspective you generally don't want to expose those internal identifiers unless necessary. In that sense, URL should map onto ID at whatever routing level.
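
As a rough illustration of mapping URL onto ID at the routing level, here is a small Python sketch. `SlugRouter` and the slug format are hypothetical and are not taken from Known or any of the projects mentioned here.

```python
from typing import Dict, Optional


class SlugRouter:
    """Maps user-facing paths to internal IDs so the IDs never appear in URLs."""

    def __init__(self) -> None:
        self._slug_to_id: Dict[str, str] = {}

    def publish(self, slug: str, internal_id: str) -> None:
        self._slug_to_id[slug.strip("/")] = internal_id

    def rename(self, old_slug: str, new_slug: str) -> None:
        # The friendly URL changes; the internal ID stays the same.
        self._slug_to_id[new_slug.strip("/")] = self._slug_to_id.pop(old_slug.strip("/"))

    def lookup(self, path: str) -> Optional[str]:
        # Returns None when no entity is published at this path.
        return self._slug_to_id.get(path.strip("/"))
```

With this split, users pass around the friendly path, while everything internal keys off the ID that `lookup` returns.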

What you need to be careful of is that external URLs can (and probably eventually will) change. Domain authority is not immutability, so a URL is only as stable a reference as the domain's software running forever/indefinitely. This is why you would probably want to assign an internal id that maps to last known external id.
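
One way to picture an internal id that maps to the last known external id, as a hedged Python sketch. `RemoteIndex` and its method names are invented for this example, and how a move is detected (redirect, migration notice, and so on) is left out entirely.

```python
import uuid
from typing import Dict, Optional


class RemoteIndex:
    """Tracks remote objects by a stable internal ID plus their last known URL."""

    def __init__(self) -> None:
        self._last_url: Dict[str, str] = {}  # internal ID -> last known external URL
        self._by_url: Dict[str, str] = {}    # external URL -> internal ID

    def remember(self, url: str) -> str:
        """Return the internal ID for a remote URL, minting one on first sight."""
        if url in self._by_url:
            return self._by_url[url]
        internal_id = uuid.uuid4().hex
        self._last_url[internal_id] = url
        self._by_url[url] = internal_id
        return internal_id

    def moved(self, old_url: str, new_url: str) -> str:
        """Record that a known remote object now lives at new_url."""
        internal_id = self._by_url.pop(old_url)
        self._by_url[new_url] = internal_id
        self._last_url[internal_id] = new_url
        return internal_id

    def last_known_url(self, internal_id: str) -> Optional[str]:
        return self._last_url.get(internal_id)
```

Local references then point at the internal ID, so a remote domain change means updating one row rather than rewriting every reference.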

Perhaps some relevant links:

  • "investigate nomadic identity" https://github.com/pixelfed/pixelfed/issues/216
  • "url schema" https://github.com/pixelfed/pixelfed/issues/51
  • "support account migration" https://github.com/tootsuite/mastodon/issues/177
    • nomadic identity overview https://github.com/tootsuite/mastodon/issues/177#issuecomment-354634232
    • use of inboxes compared to xmpp / imap https://github.com/tootsuite/mastodon/issues/177#issuecomment-378779767
    • how mirroring works under nomadic identity https://github.com/tootsuite/mastodon/issues/177#issuecomment-382761085
    • blockers within mastodon specifically https://github.com/tootsuite/mastodon/issues/177#issuecomment-399172654 and also https://github.com/tootsuite/mastodon/issues/10745
    • mapping ids from one site to another https://github.com/tootsuite/mastodon/issues/177#issuecomment-477852263

trwnh avatar Dec 07 '19 23:12 trwnh

I'm pondering whether the simplest thing to do would be to

  • Keep the UUID resolution URL, which maps the current site base to the ID. Good for external resolution.
  • Drop internal UUID references entirely - or rather, we can use the UUID as a dynamic thing via a function and drop it from DB storage. Since the UUID is really just _ID tacked on to a URL base, there's no reason to store it.

Doing this will keep external resolution, which is useful. Internal resolution will be directed through using IDs (deprecate getByUUID(), and map that to ID instead for backwards compatibility). We'd simplify the data model.
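
A sketch of what that could look like, in Python rather than Known's own code. `EntityStore`, `uuid_for`, and the exact `base + id` layout are assumptions; only the idea of mapping a deprecated `getByUUID()` onto an ID lookup comes from the proposal above.

```python
from typing import Dict, Optional


class EntityStore:
    """Stores entities by internal ID only; the UUID is derived, never stored."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/") + "/"
        self._rows: Dict[str, dict] = {}

    def save(self, entity_id: str, data: dict) -> None:
        self._rows[entity_id] = data

    def get_by_id(self, entity_id: str) -> Optional[dict]:
        return self._rows.get(entity_id)

    def uuid_for(self, entity_id: str) -> str:
        # Computed from the current site base, so it follows a domain change.
        return self.base_url + entity_id

    def get_by_uuid(self, uuid: str) -> Optional[dict]:
        # Backwards-compatibility shim: strip the site base and look up by ID.
        if not uuid.startswith(self.base_url):
            return None  # not one of ours; external resolution happens elsewhere
        return self.get_by_id(uuid[len(self.base_url):])
```

Because nothing stored contains the domain, moving the site only means changing `base_url`; existing rows and internal references are untouched.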

Just some sleepy thoughts right now, I'll ponder some more some other time

mapkyca avatar Apr 27 '20 20:04 mapkyca

I'd agree with this, fwiw.

benwerd avatar Apr 27 '20 20:04 benwerd