Remove document similarity alternative ranking script

Open marekhorst opened this issue 8 years ago • 0 comments

Some time ago, an alternative approach to the ranking operation was introduced:

https://github.com/CeON/CoAnSys/blob/298863befc2f0e3a96b25a9ee53f6b53b41090a6/document-similarity/document-similarity-logic/src/main/pig/document-similarity-s1-ship-rank_filter.pig

involving a custom rank operation written in the rank.py script, introduced in commit 318d88ce7509b366c5428ac22135bc05421ad088.

The alternative Oozie execution path can be selected by enabling the load_filterTerms_calcTfidf_filter_ship_ranked flag.

This was a workaround for memory issues with Pig's built-in rank operation. In fact, those issues may have had the very same cause as #425.

As soon as #425 is fixed and Pig's built-in rank operator works properly, we can get rid of this alternative path.

It is useless anyway because it causes a failure at a later docsim stage. The two ranking-related Pig scripts probably diverged at some point, and the alternative one is no longer fully consistent with the main one.

marekhorst, Jul 19 '17 15:07