
Language Filter Not Working

Open Scalynko opened this issue 5 years ago • 24 comments

Expected behaviour

Only posts in the languages that I selected are shown to me.

Actual behaviour

Languages that I did not select are still shown to me.

Steps to reproduce the problem

Go to Settings > Preferences > Other > Language Filter and select the languages you want toots to be shown in. Then go to a local timeline and check whether only the languages you selected appear.

Specifications

Google Chrome on iOS

Scalynko avatar May 30 '20 17:05 Scalynko

Thanks for submitting your first issue, @lynkono!

Unfortunately I was unable to reproduce it at mastodon.social.

This can happen because many languages are incorrectly detected as English (either the post is too short or the letter set is too similar), and not many users set the language they post in under Settings. The filter itself is working, but you still see posts that are incorrectly labeled as English.

Related issues: #11118.

brawaru avatar Jun 01 '20 21:06 brawaru

That's strange. I just checked my local timeline and found a post in Thai, which uses a different alphabet than English. How did it detect it as English and if not, then why would the person put their account under English if that's the case?

Scalynko avatar Jun 02 '20 18:06 Scalynko

Can you please link the post that was incorrectly detected?

brawaru avatar Jun 02 '20 22:06 brawaru

https://mas.to/@mizuboshi340/104276455190708313 This was in my local timeline

Scalynko avatar Jun 02 '20 22:06 Scalynko

Thank you!

The problem with this post is clear: it does not meet the word count criteria (it neither has 4 words separated by spaces nor matches the “reliable characters” regular expression), so language detection is skipped in favor of the account's default posting locale, which is set to the default value, English.

I actually believe this is an unintended bug, given that “The Thai language does not use spaces between the words in a sentence” (source). The bug can be fixed by adding \p{Thai} to RELIABLE_CHARACTERS_RE.
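A minimal Python sketch of the heuristic described above. This is not Mastodon's Ruby code. The names `should_detect`, `MIN_WORDS`, `RELIABLE_CHARACTERS`, and `THAI_BLOCK` are assumptions made for illustration, and the scripts listed in `RELIABLE_CHARACTERS` are a guess at what such a pattern might cover. Only the four-word threshold and the idea of adding Thai come from this thread. Python's `re` module has no `\p{Thai}`, so the Unicode Thai block (U+0E00 to U+0E7F) is spelled out instead.

```python
import re

MIN_WORDS = 4  # word-count threshold mentioned in the comment above

# Assumed stand-in for RELIABLE_CHARACTERS_RE: scripts that can identify a
# language without spaces (kana, CJK ideographs, Hangul). Illustrative only.
RELIABLE_CHARACTERS = "\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af"
THAI_BLOCK = "\u0e00-\u0e7f"  # Unicode Thai block, standing in for \p{Thai}


def should_detect(text: str, reliable: str = RELIABLE_CHARACTERS) -> bool:
    """Return True if detection would run, False if the account default is used."""
    if len(text.split()) >= MIN_WORDS:
        return True
    return re.search(f"[{reliable}]", text) is not None


thai_post = "สวัสดีครับ"  # Thai text with no spaces between words
print(should_detect(thai_post))                                    # False
print(should_detect(thai_post, RELIABLE_CHARACTERS + THAI_BLOCK))  # True
```

With the Thai block added, a short Thai post no longer falls through to the account's default locale in this model.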

Do you have any more examples besides the Thai one?

brawaru avatar Jun 03 '20 03:06 brawaru

I remember seeing a lot of Spanish posts a few days ago, but I don't see any examples now. Spanish also uses much the same alphabet as English, so I don't know how those posts can be filtered out.

Scalynko avatar Jun 03 '20 03:06 Scalynko

Hi! I've got some free time and can fix the issue related to the Thai language. Did you find any other languages that are not detected correctly?

brawaru avatar Jun 06 '20 14:06 brawaru

I have also set English as the only language to display in the filter. I see these posts just now:

  • https://m.cmx.im/@anticlockwise/108227641272791118
  • https://mastodon.social/@mrcsbmr/108227641589153693
  • https://social.fedcast.ch/objects/967bd2c2-7e2f-4892-ac3b-e660b5f1e4df
  • https://xn--y9azesw6bu.xn--y9a3aq/content/e669fabf-bce6-4f04-895a-7100dd9c8615/
  • https://xn--y9azesw6bu.xn--y9a3aq/content/e246f2ba-3008-4030-9a42-257e7856c5f1/
  • https://mastodon.social/@undivaga/108227636773516175

There are many more but I thought I'd stop at 6. I suspect some of these fall foul of the no spaces between words that you mentioned above.

pauby avatar May 01 '22 16:05 pauby

As of #17478, language detection is no more; your posting language is defined by the interface language. I assume these people use the English interface, which is why you see these posts while using the language filter.

brawaru avatar May 01 '22 16:05 brawaru

If that's the case, shouldn't this issue be closed? Appreciate you may only be joining the dots on this now.

Two questions, if I can:

  • Is there anywhere that documents how this feature is supposed to work? I couldn't find anything.
  • Is there any work going into making this better? 'We default to interface language' is a technical solution to 'the thing we had wasn't working', but not a solution to the problem of timelines being full of 'unreadable' messages. For me, about 1 in 3 messages on average is unreadable. As a user, it's not sustainable for me to keep muting them.

Thanks.

pauby avatar May 01 '22 17:05 pauby

Well, it filters posts by language; that is all it does, and it kind of works. It's just that the post languages are not correct. As the federated feed appears to be deprecated and is probably going to be removed in the future, I guess this feature is slowly losing its purpose and will probably be gone too. However, discovery features like the Explore tab rely heavily on languages, so I think the solution to incorrect languages could be #17073.

brawaru avatar May 01 '22 19:05 brawaru

I have the same problem. Post https://mstdn.jp/@neeko/109257552636605699 is one example. There is no English in the post, and Neeko's profile (https://mstdn.jp/@neeko) doesn't appear to mention English, at least not in English.

gdinwiddie avatar Oct 30 '22 15:10 gdinwiddie

@gdinwiddie This server is running 3.5.1, which already has language detection removed. The post is published as English: you can check that in the raw JSON representation, through the contentMap field. I suppose that is because they have their posting language set to English in their settings. In newer versions (3.5.3), a language dropdown was added that allows selecting the language per post.

brawaru avatar Oct 31 '22 01:10 brawaru

I'm seeing this too. Examples: https://pl22.telteltel.com/notice/APl5eVKGAX1OjmltRo https://mstdn.jp/@Kisaragineru/109369962060280233 https://iyasaretai.pw/@Ruhuna/109369959874521994 https://masto.nu/@amralomari/109369961726576315 https://mstdn.jp/@Kisaragineru/109369956531667912 https://mstdn.jp/@Kisaragineru/109369956033343951 https://mstdn.jp/@Xider/109369949789030511

ghost avatar Nov 19 '22 10:11 ghost

I'm seeing this too. Examples: https://pl22.telteltel.com/notice/APl5eVKGAX1OjmltRo https://mstdn.jp/@Kisaragineru/109369962060280233 https://iyasaretai.pw/@Ruhuna/109369959874521994 https://masto.nu/@amralomari/109369961726576315 https://mstdn.jp/@Kisaragineru/109369956531667912 https://mstdn.jp/@Kisaragineru/109369956033343951 https://mstdn.jp/@Xider/109369949789030511

It seems these are tagged with an "en" key in "contentMap", for example: https://iyasaretai.pw/@Ruhuna/109369959874521994.json

Not everybody uses the lang tag feature.

ronilaukkarinen avatar Nov 25 '22 19:11 ronilaukkarinen

Looking at the JSON response for a (federated) status, I see "language": "es", but in my profile settings I only have selected: de, en, eo, and fr.

cmeury avatar Dec 11 '22 11:12 cmeury

Still today, the language filter is not working. Even though I have selected only English, I still see toots in different languages in my feed (and yes, I've checked whether those toots have a language other than English set).

mAcf00bar avatar Jun 12 '23 18:06 mAcf00bar

It's still broken and confirmed by this issue still being open.

It really does affect my browsing, especially as I'm on a larger instance (mastodon.social). I'm assuming we're in a minority or the team would be working on it.

I'm considering moving away from this instance purely because of this issue. It won't fix it, but with fewer non-English posts I'll mitigate it somewhat.

pauby avatar Jun 12 '23 19:06 pauby

Definitely still broken, and definitely still turning off users. I have my profile set to English only and here are some excerpts from my home timeline.

[Screenshots: 2023-07-02 at 15:37:33, 15:37:52, and 15:38:36]

drpaulralph avatar Jul 02 '23 18:07 drpaulralph

Mastodon has removed automatic language detection for posts; in the above cases, the posters just didn't set their posts' languages correctly.

brawaru avatar Jul 02 '23 19:07 brawaru

How does one check what language a post is using? This issue is definitely a problem for me but I don't know if it's a bug or people just aren't setting their post language correctly.

brendanjones avatar Jul 14 '23 16:07 brendanjones

You can check the post language:

  • By simply pressing the reply button. Usually, this sets your post language to the language of the post you're replying to.

  • Using the Inspect Element (F12 / Ctrl+Shift+I) and finding the element for the post, then looking at the lang attribute.

  • Copying the link to the original post and adding .json at the end to see the raw ActivityPub JSON representation. In the contentMap, you will find the post content mapped to a specific language tag. For example, in this tagesschau post, it is marked as German, as indicated by the "de" key in the "contentMap".
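The last option can be sketched in Python. This is only an illustration: `json_url` and `post_languages` are hypothetical helper names, the URL uses a made-up `example.social` host, and the sample document is hand-written rather than a real server response. Fetching the URL is left out so the sketch stays self-contained.

```python
import json


def json_url(post_url: str) -> str:
    """Append .json to a post URL, as described in the last bullet."""
    return post_url.rstrip("/") + ".json"


def post_languages(activity_json: str) -> list[str]:
    """Return the language tags used as keys in contentMap (empty if absent)."""
    data = json.loads(activity_json)
    return sorted(data.get("contentMap") or {})


# Hand-written sample, not a real server response.
sample = '{"type": "Note", "contentMap": {"de": "<p>Guten Morgen</p>"}}'

print(json_url("https://example.social/@user/1"))  # https://example.social/@user/1.json
print(post_languages(sample))                      # ['de']
```

A post whose only `contentMap` key is `"en"` is treated as English by the filter, whatever language the text is actually written in.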

brawaru avatar Jul 14 '23 22:07 brawaru

Thanks @brawaru. Righto, my language filter is working for federated timeline. I can't test it for local timeline because my instance is English-only.

The filter doesn't work for my home feed; I get other languages there (confirmed by looking at the JSON of the posts). That is not a technical bug, since the setting is to filter languages in 'public timelines', and home is not a public timeline.

I'd call it an unexpected-behaviour bug, though. If I set a language filter I'd expect it to filter Home timeline as well.

brendanjones avatar Jul 14 '23 23:07 brendanjones

Update: is anyone seeing posts in the home feed that don't follow their language filters, and aren't from boosts https://github.com/mastodon/mastodon/issues/20241 or followed hashtags https://github.com/mastodon/mastodon/issues/20937?

Because I've realised that all the posts I can find in my home feed that don't respect my language filters are because of those two issues. Said another way: I'm not seeing any posts from people I follow that don't respect my language filters. Only posts from boosts and followed hashtags.
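For anyone who wants to check their own home feed the same way, here is a small Python sketch of the tally. It assumes you have noted, by hand, each post's language tag and how it reached your feed. The function name `unfiltered_by_source`, the source labels, and the sample data are all made up for illustration.

```python
from collections import Counter


def unfiltered_by_source(posts, chosen):
    """Count posts outside the chosen languages, grouped by how they arrived."""
    return Counter(source for language, source in posts if language not in chosen)


# Made-up observations: (language tag, how the post reached the home feed).
observed = [
    ("en", "follow"),
    ("es", "boost"),
    ("ja", "hashtag"),
    ("de", "boost"),
]

print(dict(unfiltered_by_source(observed, {"en"})))  # {'boost': 2, 'hashtag': 1}
```

If "follow" never shows up in the result, the posts slipping through are explained by the boost and followed-hashtag issues linked above.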

brendanjones avatar Jul 15 '23 15:07 brendanjones

Mastodon has removed automatic language detection for posts; in the above cases, the posters just didn't set their posts' languages correctly.

I still have posts in my timeline where users have languages set to something other than English. Here's an example:

[Screenshot: 2023-07-16 at 14:36:19]

Yes I know this is a boost but what difference does that make? If someone says "only show me posts in languages X Y and Z" then that should also apply to boosts.

drpaulralph avatar Jul 16 '23 17:07 drpaulralph

Yes I know this is a boost but what difference does that make?

@drpaulralph Indeed, it doesn't make a difference from a user perspective; everything in the home feed should respect language filters. It is, however, useful to know specifically which types of posts the issue applies to, in order to solve it more easily.

And also to have a cleaner backlog; if the issue only applies to posts from boosts and tags then this ticket can be closed in favour of the other two I mentioned above (or close those two and keep this one, whichever).

brendanjones avatar Jul 16 '23 19:07 brendanjones

@brendanjones Fair point. All of the posts that have bypassed the language filter on my home feed in the past few days were boosts.

drpaulralph avatar Jul 18 '23 11:07 drpaulralph