jekyll-tagging-related_posts

Request to change how relevance is calculated

Open smi11 opened this issue 6 years ago • 0 comments

Please consider the following example:

post     tags                              count  match  score   %
my-post  cat dog                           2
post-1   ant bee cat cow                   4      1      0.1250  12.5%
post-2   ant bee cat cow dog eel fox goat  8      2      0.2500  25.0%
post-3   cat                               1      1      0.5000  50.0%
post-4   dog                               1      1      0.5000  50.0%
post-5   bee fox                           2      0      0.0000  0.0%
post-6   cat dog                           2      2      1.0000  100.0%
post-7   (no tags)                         0      0      0.0000  0.0%
post-8   cat cow                           2      1      0.2500  25.0%
post-9   ant cat dog                       3      2      0.6667  66.7%
post-10  ant bee cat cow eel fox goat      7      1      0.0714  7.1%
post-11  ant bee cat dog eel fox goat      7      2      0.2857  28.6%
post-12  cow                               1      0      0.0000  0.0%

I want to calculate the relevance (score) of every other post for my-post, which has 2 tags: cat and dog.

So, what I can get very easily and what your plugin is already doing is:

post.count = number of tags on that post
post.match = number of tags that post shares with my-post (this is your current score, if I'm not mistaken)
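
As a minimal sketch, these two values could be computed in Ruby roughly as follows. It assumes each post's tags are available as an array of strings. `tag_stats` is a hypothetical helper name used only for illustration, not something the plugin provides.

```ruby
# Hypothetical helper (not part of the plugin's API): returns the two
# values described above for one candidate post.
def tag_stats(my_tags, post_tags)
  theirs = post_tags.uniq
  {
    count: theirs.size,           # post.count: number of tags on the post
    match: (my_tags & theirs).size # post.match: tags shared with my-post
  }
end

# post-9 from the table: 3 tags, 2 of them shared with my-post
p tag_stats(%w[cat dog], %w[ant cat dog])
```

`Array#&` returns the unique elements common to both arrays, so duplicate tags are not counted twice.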

I would like to make the score more relevant by adding a basic calculation. At the moment you simply use the number of matching tags, which can be misleading when the matching tags make up only a small fraction of a post's tags.

Consider post-10 and post-3 from the table above. Judged only by the number of matching tags, the two are equally relevant, since each has 1 matching tag. In practice that is not true. post-3 is much more relevant because its only tag matches my-post, while post-10 has 7 tags and only 1 of them matches. So post-10 should clearly rank lower.

With my calculation, post-3 has a score of 0.5 (50%), which is higher than post-10 with its score of 0.0714 (7.1%).

The calculation is very simple:

if my-post.count is 0    // avoid division by 0
     there are no relevant posts so set all scores to 0 and exit
endif

for each post do
    if post.count is 0   // avoid division by 0
        score = 0
    else 
        score = ( post.match / post.count ) * ( post.match / my-post.count )
    endif
endfor

The result is a float between 0 and 1, where 0 means no relevance and 1 means an exact match (the most relevant). To express the score as a percentage, just multiply it by 100.
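
For illustration only, here is a rough Ruby sketch of the calculation above. It is not taken from the plugin: `relevance_score` is a hypothetical name, and tags are assumed to be plain arrays of strings.

```ruby
# Hypothetical helper (not the plugin's API): implements
#   score = (match / post.count) * (match / my-post.count)
def relevance_score(my_tags, post_tags)
  mine = my_tags.uniq
  theirs = post_tags.uniq
  return 0.0 if mine.empty? || theirs.empty? # avoid division by 0

  match = (mine & theirs).size
  (match.to_f / theirs.size) * (match.to_f / mine.size)
end

# A few rows from the table above
posts = {
  "post-3"  => %w[cat],
  "post-6"  => %w[cat dog],
  "post-9"  => %w[ant cat dog],
  "post-10" => %w[ant bee cat cow eel fox goat]
}
my_tags = %w[cat dog]

# Rank candidates from most to least relevant
ranked = posts
  .map { |name, tags| [name, relevance_score(my_tags, tags)] }
  .sort_by { |_name, score| -score }

ranked.each { |name, score| puts format("%-8s %.4f", name, score) }
# post-6   1.0000
# post-9   0.6667
# post-3   0.5000
# post-10  0.0714
```

With the plain match count, post-3 and post-10 would tie at 1. With this score they are clearly separated.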

Please consider modifying your excellent plugin, as I would really like to get more relevant results for my project. I would fork and modify the plugin myself, but I don't "speak" Ruby, so I have no idea how to implement this. Thanks.

smi11, Oct 24 '18 09:10