Support query API property aliases in query profiles
Describe the bug Rank-profile inputs are not set via query-profile.
To Reproduce Steps to reproduce the behavior:
- Use the sample application with this schema
schema doc {
document doc {
field subject type string {
indexing: summary | attribute | index
index: enable-bm25
attribute {
fast-search
}
}
field body type array<string> {
indexing: summary | attribute | index
index: enable-bm25
attribute {
fast-search
}
}
}
field body_embedding type tensor<bfloat16>(p{},x[1024]) {
indexing: input body | embed e5 | attribute | index
attribute {
distance-metric: angular
}
index {
hnsw {
max-links-per-node: 16
neighbors-to-explore-at-insert: 200
}
}
}
fieldset default {
fields: subject, body
}
rank-profile my_rank_profile {
inputs {
query(q) tensor<bfloat16>(x[1024])
query(subjectWeight) : 3
}
function weighted_subject() {
expression {
nativeRank(subject) * query(subjectWeight)
}
}
first-phase {
expression {
cos(distance(field,body_embedding)) + weighted_subject
}
}
match-features {
query(subjectWeight)
weighted_subject
firstPhase
}
}
}
Embedder in services.xml
<!-- See https://docs.vespa.ai/en/embedding.html#huggingface-embedder -->
<component id="e5" type="hugging-face-embedder">
<transformer-model url="https://github.com/vespa-engine/sample-apps/raw/master/simple-semantic-search/model/e5-small-v2-int8.onnx"/>
<tokenizer-model url="https://raw.githubusercontent.com/vespa-engine/sample-apps/master/simple-semantic-search/model/tokenizer.json"/>
<prepend> <!-- E5 prompt instructions -->
<query>query:</query>
<document>passage:</document>
</prepend>
</component>
- Create QueryProfile
Query-profile type:
<query-profile-type id="NearestNeighborTestTypes">
<field name="yql" type="string"/>
<field name="nn-input" type="string"/>
<field name="input.query(q)" type="tensor(x[1024])"/>
<field name="input.query(subjectWeight)" type="float"/>
<field name="ranking.profile" type="string"/>
</query-profile-type>
Query-profile:
<query-profile id="NearestNeighborTestProfile" type="NearestNeighborTestTypes">
<field name="yql">select * from sources * where %{.nn-input} or userQuery()</field>
<field name="nn-input">({targetHits:10}nearestNeighbor(body_embedding,q))</field>
<field name="input.query(q)">embed(e5,@query)</field>
<field name="input.query(subjectWeight)">5</field>
<field name="ranking.profile">my_rank_profile</field>
</query-profile>
- Deploy Schema and see error
Error: invalid application package (status 400)
Invalid application:
Error reading query profile 'NearestNeighborTestProfile' of type 'NearestNeighborTestTypes':
Could not set 'input.query(q)' to 'embed(e5,@query)':
Can't find embedder 'e5'. Available embedder ids are 'default'.
- Move embed(e5,@query) into the search request and deploy the schema
<query-profile id="NearestNeighborTestProfile" type="NearestNeighborTestTypes">
<field name="yql">select * from sources * where %{.nn-input} or userQuery()</field>
<field name="nn-input">({targetHits:10}nearestNeighbor(body_embedding,q))</field>
<field name="input.query(subjectWeight)">5</field>
<field name="ranking.profile">my_rank_profile</field>
</query-profile>
- Query via search request
{
"query": "Some",
"queryProfile": "NearestNeighborTestProfile",
"input.query(q)": "embed(e5,@query)"
}
- See that subjectWeight was not overridden to 5 by the query profile (match-features still reports the rank-profile default of 3.0):
{
"root": {
"id": "toplevel",
"relevance": 1.0,
"fields": {
"totalCount": 1
},
"coverage": {
"coverage": 100,
"documents": 1,
"full": true,
"nodes": 1,
"results": 1,
"resultsFull": 1
},
"children": [
{
"id": "id:default:doc:g=testing:test::1",
"relevance": 2.057677356674378,
"source": "text",
"fields": {
"matchfeatures": {
"firstPhase": 2.057677356674378,
"query(subjectWeight)": 3.0,
"weighted_subject": 1.1455871507985373
},
"sddocname": "doc",
"documentid": "id:default:doc:g=testing:test::1",
"subject": "Some Subject",
"body": [
"Some body"
]
}
}
]
}
}
Expected behavior It should be possible to set the embedder-based ranking input via a query profile. Setting rank-profile inputs through a query profile should override the defaults declared in the rank profile, and the override should be observable in the matchfeatures.
Environment: Dockerized Vespa
Vespa version: 8.475.11
Query profiles must use the full name, ranking.features. Aliases like input are only supported in queries.
The assumption behind this is that brevity matters more in requests and structure more in query profiles, but even though it is mentioned in the doc I think it is too easy to miss, so we should probably add alias support in query profiles as well. Let's use this issue for that.
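For readers landing here, the workaround described above might look roughly like the profile below. It is only a sketch: it assumes that ranking.features.query(subjectWeight) is the full-name equivalent of the input.query(subjectWeight) alias, and it uses the type id NearestNeighborTestTypes as declared in the report. Any matching field declarations in the query-profile-type would presumably need the same renaming, which is not shown in this thread.

```xml
<!-- Sketch only: the reporter's query profile with the alias replaced by the full property name. -->
<query-profile id="NearestNeighborTestProfile" type="NearestNeighborTestTypes">
  <field name="yql">select * from sources * where %{.nn-input} or userQuery()</field>
  <field name="nn-input">({targetHits:10}nearestNeighbor(body_embedding,q))</field>
  <!-- Assumed full name; replaces the alias input.query(subjectWeight). -->
  <field name="ranking.features.query(subjectWeight)">5</field>
  <field name="ranking.profile">my_rank_profile</field>
</query-profile>
```

With a profile like this, query(subjectWeight) in matchfeatures would be expected to report 5.0 rather than the rank-profile default of 3.0.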
I changed it to the full name and tested it; it works, thank you. I would still appreciate alias support.