eland
Add a way to access _score from DataFrame when using scoring filters
Relates to #282. It'd be nice to be able to access the _score value (and sort by it too). We need to work out how to expose the _score information to users. My first thought was to include it as a "pseudo-scripted" field of type float64:
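import eland as ed
# `es` is an existing Elasticsearch client; the `_score` column below is the proposed behavior
df = ed.DataFrame(es, "nyc-restaurants")
print(df.es_match("blue").filter(["name", "_score"]))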
                                            name    _score
ZckkjnQBvi72UTXObqxX             BLUE HAVEN EAST  5.523277
JckkjnQBvi72UTXOb60U         BLUE BAY RESTAURANT  5.523277
68kkjnQBvi72UTXOb60V       RIAZOR BLUE TAPAS BAR  4.813509
ackkjnQBvi72UTXOb64V  BLUE CAFE RESTAURANT & BAR  4.813509
BMkkjnQBvi72UTXOcLI8    BLUE SKY RESTAURANT CAFE  4.813509
...                                          ...       ...
A8wljnQBvi72UTXOrpgP          BLUE BOTTLE COFFEE  5.523277
LswljnQBvi72UTXOrZdc                  BLUE SMOKE  6.478565
QcwljnQBvi72UTXOrJWW                   BLUE RUIN  6.478565
XswljnQBvi72UTXOrZZc          BLUE BOTTLE COFFEE  5.523277
jswljnQBvi72UTXOq5K0              THE BLUE STOVE  5.523277

[556 rows x 2 columns]
Should all Eland DataFrames have this _score column by default with NaN values when there's no scoring happening? Or maybe we only add the column when using a scoring filter like es_match() and we do so automatically? Would love thoughts here.
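To make the two options concrete, here is a rough sketch in plain Python that needs neither Elasticsearch nor eland. The helper `hits_to_rows`, its parameters, and the sample `HITS` are all hypothetical and not part of eland's API. The only assumption borrowed from Elasticsearch is the usual search-hit shape (`_id`, `_score`, `_source`).

```python
import math

# Hypothetical, trimmed-down search hits in the usual Elasticsearch shape.
HITS = [
    {"_id": "ZckkjnQBvi72UTXObqxX", "_score": 5.523277, "_source": {"name": "BLUE HAVEN EAST"}},
    {"_id": "LswljnQBvi72UTXOrZdc", "_score": 6.478565, "_source": {"name": "BLUE SMOKE"}},
    {"_id": "68kkjnQBvi72UTXOb60V", "_score": 4.813509, "_source": {"name": "RIAZOR BLUE TAPAS BAR"}},
]


def hits_to_rows(hits, scoring, always_include_score=False):
    """Flatten hits into {_id: row} dicts (illustrative helper, not eland code).

    scoring=True               -> copy each hit's _score as a float column.
    always_include_score=True  -> when no scoring happened, still emit the
                                  column, filled with NaN (first option).
    Otherwise the _score column is omitted entirely (second option).
    """
    rows = {}
    for hit in hits:
        row = dict(hit["_source"])
        if scoring:
            row["_score"] = float(hit["_score"])
        elif always_include_score:
            row["_score"] = math.nan
        rows[hit["_id"]] = row
    return rows


def sort_by_score(rows):
    """Return (_id, row) pairs ordered by descending _score."""
    return sorted(rows.items(), key=lambda item: item[1]["_score"], reverse=True)


scored = hits_to_rows(HITS, scoring=True)                                 # after a scoring filter
nan_default = hits_to_rows(HITS, scoring=False, always_include_score=True)  # first option
no_column = hits_to_rows(HITS, scoring=False)                             # second option
top_id, top_row = sort_by_score(scored)[0]
```

With the first option every frame has a stable schema, but sorting by a column of NaN is meaningless. With the second, `_score` only exists when it carries information, at the cost of the column set depending on which filters were applied.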
@sethmlarson - I think adding this only to the return from es_match could be appropriate. However, I don't think this is required for GA.