template-scala-parallel-universal-recommendation
template-scala-parallel-universal-recommendation copied to clipboard
Fix paramsBoostedFields and paramsFilterFields
Description: According to documentation: http://actionml.com/docs/ur_config fields: optional, default = none. An array of ... . Positive biases are boosts any negative number will filter out any results that do not contain the values in the field name.
Problem: When adding fields to algorithm parameters with a negative bias (-1) they will not be filtered, but instead added as boosted fields with a negative value. Providing a positive number will cause this to be a filter.
If you look at the annotations, someone has tried to fix this but instead of swapping the ">" and "<", he moved "=" instead.
Solution: I have corrected this issue so that paramsFields are consistent with queryFields. (see queryBoostedFields below)
Do you have a test to prove correct behavior? See examples/integration-test for how to do this. It should be easy, to create and add more tests. Also I believe the test is already included.
The "=" is because 0 is reserved for a new meaning in the UR v0.6.0, namely as an exclusion filter.
Bias in v 0.6.0:
- 0: exclude items with properties specified (must_not in ES parlance)
- <0: include only items with properties specified (must in ES parlance)
- >0: use the bias as a multiplier on all scores
The next release will be timed to coincide with PIO 0.11.0 (being voted on) and Mahout v0.113.0 (being assembled for release). Watch for a new branch labeled v0.6.0-SNAPSHOT.
Good point, I couldn't find any tests where algorithm params include the fields attribute. I will try to create a new test to cover it.
it is very difficult to test params in engine.json in an automated way. So this is a case of fields in engine.json?
That helps to know since they have had minimal use or testing, except the time stamp filters. Putting a field in engine.json means it will be applied to every query, which is not very useful. Why include data for the items if they are never needed as recs? I was thinking of deprecating that but if you need it, let's make sure it works as advertized.
I'd be happy for a very simple test run on it's own not integrated with examples/integration-test
. You may see another one regarding rankings there too, with it's own data.
I don't use the fields parameter currently. I'm using the template and was reviewing your code for a better understanding of the algorithm and its opportunities. This just caught my eye.
One use case I can imagine for the fields parameter is setting default boosts and filters for items. Since query parameters override this, it would make sense to describe common filters here too.
However this could cause issues on the application side as the results are silently boosted and filtered which may not always be a good practice.
Fox example in my case (movie recommendations) I would populate a field isAdult = yes/no (or perhaps K-rating) to each item. By default I want to exclude all adult content from recommendations, unless I explicitly say so in the query (adult recommendations feed). Currenly I have to add isAdult = [no] to each query (which is not a problem).
Another case is when I want to set default boosts to item attributes (ie title boost 1.5, description 1, cast 1.2, etc). What do you think where these should be described? (application for each query, algo params fields, somewhere else).
Ofcourse this would be a separate test with special data. Test cases:
- Test with a simple query - must not return adult content
- Test with query isAdult = [yes] - must return only adult content
- Test with query isAdult = [yes,no] - must return all content (perhaps a better way to include all?)
I think there might be a broken case if you add a fields with negative bias (filter) and in the query add it as a boost. In this case the query boost would not override fields filter as they're added in separate sections of the ES query (should, must_not).
I see 2 options: a) I'm not sure if people use this feature if it's broken. Remove it without deprecation notice. b) Add tests, solve the filter/boost issue, merge in working solution if there is a useful use case