aldryn-search
aldryn-search copied to clipboard
Language detection fails
Hi,
I am facing an issue when creating blog posts / publishing pages from the page listing in another language. The basic problem is, that get_language()
returns the wrong value e.g. for this link
http://localhost:8000/en/admin/cms/page/13/de/publish/?redirect_language=en&redirect_page_id=5
. The same applies for blog posts.
To clarify / how to reproduce:
- setup the cms and with two languages
- create a page in the primary language
- create a translation via the page listing (clicking the circle button) and stay in the primary language
- publish the translation via the page listing (hover over the circle button and click "publish" in the tooltip)
Expected:
- an added entry in the index for the page in the secondary language
Actual:
- an added entry in the primary language
I tried to workaround that by inspecting the instance in a custom implementation of the router, unfortunately there is not unified way to do so. The Title model for instance has a language
attribute whilst an Article has a language_code
. The only solution for now that comes to my mind would be to store all entries in one index and query with a language filter...
Any recommendations / help regarding this are greatly appreciated.
Regards
Jakob
Hello Jakob,
Just to make sure I understand, in your example above you expect the de
title to be indexed but instead the en
one is the one that gets indexed? Also this only happens only on actions like publish or save() meaning only through the haystack signal handlers and not the manual rebuilding of the index?
EDIT: I expect the de
title to be indexed on the de
haystack connection, instead it is written to the en
connection. The contents are correct however.
EDIT 2: This only happens when publishing. I should read your post, before answering...
Hi,
this is exactly the case. The same applies for a blog post when going to the blog page in the primary langue, lets say en
, you go to Add new article...
via the menu, then switch to the secondary language tab, e.g. de
and then create the blog post. The language router fails to detect that the blog post was created in German and not in English. I think there actually is no easy solution to this, at least none I could think of. Maybe one could leverage the get_language(object)
method on the Index somehow, but I am not sure if the router can / should access the index classes...
Thanks for your quick reply
Jakob
So yes this is a problem I'm quite familiar with.
And yes the only solution to this is to leverage get_language()
method because it's the only method that knows how to fetch the "active" language for an object.
In the past, I've defined the should_update
on my index base class and let it return False
to prevent "realtime indexing" as a workaround for this issue.
As far as a real solution comes, I've pushed a potential fix to the fixes/let-object-decide-alias branch. It's something I've experimented with in the past but I won't push it to master today since I'd like to review it carefully and make sure it has no crazy side-effects :).
Please have a look at the branch and let me know if it works for you, I did a quick test locally and it worked.
Thank you, I will check it out!
I haven't tested the code, yet but from the looks this is what I imagined. However the line
unified_index = connections[alias].get_unified_index()
https://github.com/aldryn/aldryn-search/blob/fixes/let-object-decide-alias/aldryn_search/router.py#L32 looks suspicious to me, since we are trying to find the index of a potentially wrong connection (the alias
might differ from the instance language, which is the root cause of this issue). I am no expert on haystack though and am not sure if this could lead to errors. Is there a case where one would to exclude an index in one language but not in another? Maybe a more failsafe way would be iterating over all available connections / aliases if not found with the current alias. I have no clue about the performance impact of such an implementation and whether caching the index would be a viable option or not. It's not pretty either...
What dou you think?
Regards
Jakob
Your suspicion is accurate, haystack allows you to exclude indexes from certain connections and so it is possible that we fail to get an index for a connection but in my opinion it's better than getting the language wrong every time. I think a better solution would require a change in the way haystack triggers the object indexing (signal). Iterating over all connections is not a bad idea but I'm not really a fan of brute forcing it but it might be the way to go until a proper fix can be integrated in Haystack.
I agree that the code is not very pretty because it exposes an internal logic in Haystack, but this is actually found everywhere there :)
Regarding a potential for performance impact, it really depends. Currently there's none.
Accessing a connection is just accessing a dictionary and never touching the search engine, after that we call the get_language()
method which simply returns a language for the given instance.
This method is defined by both the TitleIndex
and ArticleIndex
, neither of the two make any extra queries or requests to get the language from an instance.
That said, if you add some queries in the get_language()
method then yes it would result in a performance impact because it would be called once for the router and another one for the indexing.
FYI, works for my use cases so far.
Thank you very much
Jakob
Hey @czpython,
what happened to the branch if I may politely inquire? Is there any way this will make it into a release? Realtime indexing is seriously broken without this fix...
Thank you very much
Jakob
Hey @jakob-o,
I meant to post an update here.. I closed the pr https://github.com/aldryn/aldryn-search/pull/49#issuecomment-170371362 and posted an explanation as to why I decided it's not the way to go.
I have a fix for this issue on cms pages which I will push sometime this week.
As far as other addons like Aldryn Newsblog go, I'll have to think of another solution :cry:
EDIT: OK forget that we could use the function that I mentioned, but would there be a possibility to detect the publishing language another way?
Hey @czpython,
how about a middleware that stores the result from get_language_from_request(request, check_path=True)
in a thread local and use this in the language router if available?
Regards
Jakob
Another possibility would be a registry, that allows language lookup based on the instance. Something like:
def get_instance_language(instance):
try:
lookup = language_lookups[instance.__class__]
except KeyError:
return get_language()
else:
return lookup.get_language(instance)
@jakob-o Thanks for the suggestions. I've decided to use a more explicit approach for the addons by creating a base index in https://github.com/aldryn/aldryn-translation-tools that will then provide some functionality to index the correct language for a parler object.
I will be using the add_to_index
signal and make sure to ignore haystack's default post_save signal handler as it's too generic.
EDIT: I just realized the code changes in this repo were only for the unpublishing issue... My question remains though.
EDIT2: Created a pull request https://github.com/aldryn/aldryn-search/pull/61
Hey @czpython,
I had a look at the updated code and am not sure how this would solve the language detection problem. I am fine with it for now however (using a custom router implementing the registry approach). Two small things though:
- I think the
action
argument should be added in the signal'sproviding_args
- Would you mind refactoring the
action
argument tocms_action
or something more distinguishable?
The latter one would spare me a lot of trouble since I am using a custom implementation based on aldryn_search and https://github.com/django-haystack/celery-haystack which is adding an action
argument too which causes clashes... sigh
Thank you very much!
Jakob
Hello @jakob-o,
I've not gotten around to implement the language fix yet..
Regarding your points:
- I agree :)
- I like the name
action
but can rename touser_action
,cms_action
is too specific to cms. The idea behind this argument is to be leveraged by other indexes to conditionally update the index like it's being used by theTitleIndex
.
How about index_action
? I will happily update my PR, you decide the name.