How to do feature extraction for custom raw data

Open sagarkar10 opened this issue 2 years ago • 1 comments

My main aim is to do feature extraction (and, down the line, train models inside tf-ranking itself). I have some data in CSV which looks something like this:

query,title,id,price,description
iphone,iphone 12,M101,799,apple iphone with lots of features
mobile,samsung s21,M211,599,not iphone with lot more features

I want to derive a feature set in which some features are query-dependent and some are not. Let's assume the feature set is:

f_query_doc_1 = some_scoring_func(title, query)
f_query_doc_2 = some_scoring_func(description, query)

f_query_only = len(query)

f_document_only = len(description)
....
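
For illustration, the feature set above could be computed row by row with plain Python. This is only a sketch under assumptions: `term_overlap` is a hypothetical stand-in for `some_scoring_func` (the post does not define it), and none of this uses the tf-ranking API.

```python
import csv
import io

# The sample rows from the post, inlined so the sketch is self-contained.
RAW = """query,title,id,price,description
iphone,iphone 12,M101,799,apple iphone with lots of features
mobile,samsung s21,M211,599,not iphone with lot more features
"""


def term_overlap(text, query):
    """Hypothetical scoring function: fraction of query terms present in text."""
    query_terms = query.lower().split()
    text_terms = set(text.lower().split())
    if not query_terms:
        return 0.0
    return sum(term in text_terms for term in query_terms) / len(query_terms)


def extract_features(row):
    """Build one feature dict per (query, document) row."""
    return {
        "id": row["id"],
        # Query-document features
        "f_query_doc_1": term_overlap(row["title"], row["query"]),
        "f_query_doc_2": term_overlap(row["description"], row["query"]),
        # Query-only feature
        "f_query_only": len(row["query"]),
        # Document-only feature
        "f_document_only": len(row["description"]),
    }


features = [extract_features(row) for row in csv.DictReader(io.StringIO(RAW))]
```

Each resulting dict holds one value per feature name, so the rows can later be grouped by query and converted to whatever input format the ranking library expects.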

I want to extract these features and also calculate some more complex features using w-models like BM25.
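
As a rough idea of what a BM25 feature involves, here is a minimal pure-Python sketch. It assumes whitespace tokenization, the common defaults `k1=1.2` and `b=0.75`, and the non-negative IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))`; the function name `bm25_scores` is hypothetical and is not part of tf-ranking.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in docs against query with Okapi-style BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs if n_docs else 0.0

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


descriptions = [
    "apple iphone with lots of features",
    "not iphone with lot more features",
]
apple_scores = bm25_scores("apple", descriptions)
iphone_scores = bm25_scores("iphone", descriptions)
```

Here the collection statistics (document frequency, average length) come from the list passed in, so in practice the whole corpus for a field would be supplied rather than just one query's candidates.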

To give the complete scope of my feature requirements, you can refer to: https://github.com/ten-blue-links/fxt/blob/master/doc/features.md

A simple snippet tutorial or link to the same would be very much appreciated.

sagarkar10, Jul 17 '21 15:07

@rjagerman

sagarkar10, Jul 27 '21 15:07