add support for enabling Mastodon 4.2 search indexing

Open osmaa opened this issue 1 year ago • 11 comments

Adds the opt-in attribute that allows Mastodon 4.2 to index toots from an account.

osmaa avatar Nov 16 '23 14:11 osmaa

I was considering that, too, but it seemed to me that it was conceptually a little different. search_enabled is a feature flag for (a partially implemented?) local search of local accounts, while indexable is an AP Actor level flag for opt-in to being indexed on remote servers.

Were it the other way around, i.e. had indexable come first, it would be obvious to implement search_enabled on top of that.

Heads up though! The migration appears to have created this as a non-nullable column, and I missed at least one code path which leaves the attribute null during fetch/create. Will review.
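A minimal sketch of one way to keep a null from reaching a non-nullable column during fetch/create, assuming the remote actor document carries the flag as a top-level `indexable` key (as Mastodon 4.2 publishes it). The function name `parse_indexable` is hypothetical and this is not Takahē's actual code:

```python
from typing import Any, Mapping


def parse_indexable(actor: Mapping[str, Any]) -> bool:
    """Return the actor's search-indexing opt-in, defaulting to False.

    Hypothetical helper: only an explicit JSON `true` counts as consent.
    A missing key, a null, or a non-boolean value is treated as "not opted in",
    so the result is always a real bool and never None.
    """
    value = actor.get("indexable")
    return value is True


# An actor from a server that predates the flag simply lacks the key.
remote_actor = {"type": "Person", "preferredUsername": "example"}
print(parse_indexable(remote_actor))  # False
```

Coercing at the parsing boundary means every later code path, including the database insert, sees a boolean regardless of what the remote server sent.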

osmaa avatar Nov 16 '23 17:11 osmaa

Hmm, my lack of experience with Django shows again. As far as I can see, indexable is defined to default to false everywhere, but somehow it's still passed as null into a database insert here: https://github.com/osmaa/takahe/blob/883c607468252fdfbf107cfe5c35ca86d6afc70c/users/models/identity.py#L452

osmaa avatar Nov 16 '23 18:11 osmaa

I agree they are different meanings conceptually, but I still would like to combine the meanings now - two search options just seem like too many, and I don't see a lot of cases where people would enable it locally but not remotely, or vice versa. It's a little bit of an expectation-breaking change, but I am alright with it in this instance.

andrewgodwin avatar Nov 16 '23 18:11 andrewgodwin

Fully agree that the privacy-related settings in Mastodon are too many. I've been meaning to outline a matrix of all of the possible combinations to see which of them even make sense. I don't know what to make of the existence of these technically valid combos, for example:

  • discoverable=false, indexable=true, toot=public (it's not listed on Mastodon's local timeline, but can be found by text search)
  • discoverable=true, indexable=false, toot=public (is listed, but not search indexed)
  • discoverable=true, indexable=true, toot=unlisted (not listed nor searchable)
  • discoverable=true, indexable=true, noindex=true (opted in to be indexed by everyone but web search engines)
  • discoverable=false, indexable=false, noindex=false (opted out of being found on Mastodon, while allowing web search crawlers)

It's a mess. Is it a mess that can be cleaned up? If it was just me, I'd just merge all three account level settings to one (values: promote, search, unlisted), and disallow use of "public" toot level on unlisted accounts.

osmaa avatar Nov 16 '23 19:11 osmaa

Right, it being a bit of a mess was kind of the thing I wanted to avoid. I do think that in Takahē's case, with just two options - "discoverable" and "search_enabled" - we end up with only three sensible configurations:

  • Discoverable and searchable: Where most people probably end up
  • Discoverable but not searchable: Maybe you're trying to avoid harassment enabled via search
  • Not discoverable but searchable: Should not be allowed, makes no sense
  • Not discoverable or searchable: Traditional privacy-focused stance

I'm not sure how sensible it might be to make the UI switch search off if you flip discoverable off, but it feels like it should.
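The rule above can be sketched as a small normalization step, assuming the two settings are plain booleans. The `Visibility` type and `normalize` function are hypothetical names for illustration, not part of Takahē:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visibility:
    """Hypothetical pair of account-level settings."""

    discoverable: bool
    search_enabled: bool


def normalize(v: Visibility) -> Visibility:
    """Collapse the disallowed combination into the privacy-focused one.

    "Not discoverable but searchable" is treated as invalid, so turning
    discoverable off also turns search off. The other three combinations
    pass through unchanged.
    """
    if not v.discoverable and v.search_enabled:
        return Visibility(discoverable=False, search_enabled=False)
    return v


print(normalize(Visibility(discoverable=False, search_enabled=True)))
```

Running the same normalization when the settings form is saved, rather than only in the UI, would keep the stored state consistent even if the form is bypassed.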

andrewgodwin avatar Nov 16 '23 21:11 andrewgodwin

I would argue that:

"Discoverable but not searchable: Maybe you're trying to avoid harassment enabled via search"

is superfluous and should instead be delivered by automatic pruning of old toots from both timelines and search indices: "Allow my toots to be discovered, but only for X days/weeks".

As for your:

"Not discoverable but searchable: Should not be allowed, makes no sense"

That would be someone who opts in to being found by explicit search, but doesn't want to be shown in trending lists or algorithmically promoted.

I didn't even include that Mastodon further complicates this by having different logic for hashtags. Again, if it was just me, I'd say that hashtags should be restricted to public toots only. Yes, there are nuances like being generally unsearchable but opening tiny windows into discovery on very specific topics only, but the complexities around documenting that kind of behavior make it into a trap.

So the question really is how much sense it makes to try to do things differently from Mastodon, which has evolved into a weird legacy of incompatible layers, but is the dominant source and consumer of ActivityPub content. Plus, if you still have plans to explore AT proto PDS functionality, that'll map differently. Mostly just 100% public with no control over third-party indexing, though.

osmaa avatar Nov 17 '23 07:11 osmaa

Well, automatic pruning of local things from searches would be nice, but that's a separate feature so I'm not going to say we should do that now.

In general I want to keep Takahē relatively low on options and complexity - so I think just tying Mastodon's indexable property to "search enabled" and changing its help text to say that it enables you to be searched locally and remotely would be the way to go here.

andrewgodwin avatar Nov 17 '23 16:11 andrewgodwin

This seems like quite an important feature for users like mine, whether or not it is a separate option. What's the best next step to get it merged?

alphatownsman avatar Nov 26 '23 17:11 alphatownsman

I'm willing to accept a PR for this that just does this flag based off of our existing search_enabled and discoverable flags, where you get marked as having search indexing allowed if they're both true.
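As a rough sketch of that proposal, assuming the two existing flags are plain booleans and that the outgoing actor document exposes `discoverable` and `indexable` as top-level keys (the function names here are hypothetical):

```python
def allows_search_indexing(discoverable: bool, search_enabled: bool) -> bool:
    """Search indexing is allowed only when both existing flags are true."""
    return discoverable and search_enabled


def actor_flags(discoverable: bool, search_enabled: bool) -> dict:
    """Build the visibility-related part of a hypothetical actor document.

    `indexable` is derived rather than stored, so there is no third
    user-facing option to keep in sync.
    """
    return {
        "discoverable": discoverable,
        "indexable": allows_search_indexing(discoverable, search_enabled),
    }


print(actor_flags(discoverable=True, search_enabled=False))
```

Deriving the value at serialization time avoids a new database column entirely, which would also sidestep the non-nullable migration issue mentioned earlier in the thread.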

andrewgodwin avatar Nov 26 '23 18:11 andrewgodwin

"does this flag based off of our existing search_enabled and discoverable flags"

There's no perfect solution and I can totally live with this.

"how much does it make sense to try to do things different to Mastodon"

@osmaa I agree with you that this is a real concern if Mastodon exposes these search options separately via the API, but right now I believe they are only changeable in the UI, so I'm OK with Andrew's suggestion above.

alphatownsman avatar Dec 06 '23 13:12 alphatownsman

@osmaa @AstraLuma this is an absolutely great feature. Any chance of getting this updated / merged? Happy to do anything I can to help.

alphatownsman avatar Feb 10 '24 15:02 alphatownsman