
Invalid package when indexing expression `get_field` is used in a search definition

Open aguereca opened this issue 7 years ago • 4 comments

Vespa version: 6.288.1

Steps to reproduce:

  • Use get_field as an indexing expression in a field's indexing statement (see the sample search definition below)
  • Upload application package to a Vespa cluster (Local Docker image)

Expected result:

  • Package is uploaded and indexing expression is used on new feed documents

Actual result:

  • Error received when validating package:
{
  "error-code": "INVALID_APPLICATION_PACKAGE",
  "message": "Invalid application package: default.my-cluster: Error loading model: For search 'sandbox', field 'one_id': For expression 'get_field address_id': Field 'address_id' not found."
}

Notes: I've looked at the source code of this feature and reviewed the test cases to make sure I'm using the expression as intended: https://github.com/vespa-engine/vespa/blob/vespa-6.288.1-1/indexinglanguage/src/test/java/com/yahoo/vespa/indexinglanguage/expressions/GetFieldTestCase.java

Unless I'm missing something, I believe there is an issue when using this expression. Any help or tips on this will be appreciated.

Sample SearchDefinition:

search sandbox {
    document sandbox {
        field some_name type string {
            indexing: summary | index
        }
        struct struct_address {
            field address_id type string {}
            field coordinates type position {}
        }

        field one_address type struct_address {}

        field all_addressses type array<struct_address> {}
    }

    field one_id type string {
      indexing: input one_address | get_field address_id | summary | attribute
    }
}

aguereca avatar Sep 26 '18 17:09 aguereca

Thanks for the clear description. Assigning to the right person.

Note that you can skip the get_field instruction and just do

input one_address.address_id | summary | attribute

bratseth avatar Sep 27 '18 00:09 bratseth

Thanks for your reply. I tried your suggestion, but it still fails, now with this error:

{
  "error-code": "INVALID_APPLICATION_PACKAGE",
  "message": "Invalid application package: default.my-cluster: Error loading model: Could not parse search definition file 'searchdefinitions/sandbox.sd': Error reported by IL parser: Encountered \" <IDENTIFIER> \"address_id \"\" at line 17, column 35.\nWas expecting one of:\n    <INTEGER> ...\n    <LONG> ...\n    <DOUBLE> ...\n    <FLOAT> ...\n    \"+\" ...\n    \"-\" ...\n    \"{\" ...\n    \"(\" ...\n    <STRING> ...\n    \"attribute\" ...\n    \"base64decode\" ...\n    \"base64encode\" ...\n    \"clear_state\" ...\n    \"echo\" ...\n    \"exact\" ...\n    \"flatten\" ...\n    \"for_each\" ...\n    \"get_field\" ...\n    \"get_var\" ...\n    \"guard\" ...\n    \"hexdecode\" ...\n    \"hexencode\" ...\n    \"hostname\" ...\n    \"if\" ...\n    \"index\" ...\n    \"input\" ...\n    \"join\" ...\n    \"lowercase\" ...\n    \"ngram\" ...\n    \"normalize\" ...\n    \"now\" ...\n    \"optimize_predicate\" ...\n    \"passthrough\" ...\n    \"random\" ...\n    \"select_input\" ...\n    \"set_language\" ...\n    \"set_var\" ...\n    \"split\" ...\n    \"substring\" ...\n    \"summary\" ...\n    \"switch\" ...\n    \"this\" ...\n    \"tokenize\" ...\n    \"to_array\" ...\n    \"to_byte\" ...\n    \"to_double\" ...\n    \"to_float\" ...\n    \"to_int\" ...\n    \"to_long\" ...\n    \"to_pos\" ...\n    \"to_string\" ...\n    \"to_wset\" ...\n    \"trim\" ...\n    \"zcurve\" ...\n    \nAt position:\n      indexing: input one_address.address_id | summary | attribute\n                                  ^: Encountered \" <IDENTIFIER> \"address_id \"\" at line 17, column 35.\nWas expecting one of:\n    <INTEGER> ...\n    <LONG> ...\n    <DOUBLE> ...\n    <FLOAT> ...\n    \"+\" ...\n    \"-\" ...\n    \"{\" ...\n    \"(\" ...\n    <STRING> ...\n    \"attribute\" ...\n    \"base64decode\" ...\n    \"base64encode\" ...\n    \"clear_state\" ...\n    \"echo\" ...\n    \"exact\" ...\n    \"flatten\" ...\n    \"for_each\" ...\n    \"get_field\" ...\n    \"get_var\" ...\n    \"guard\" 
...\n    \"hexdecode\" ...\n    \"hexencode\" ...\n    \"hostname\" ...\n    \"if\" ...\n    \"index\" ...\n    \"input\" ...\n    \"join\" ...\n    \"lowercase\" ...\n    \"ngram\" ...\n    \"normalize\" ...\n    \"now\" ...\n    \"optimize_predicate\" ...\n    \"passthrough\" ...\n    \"random\" ...\n    \"select_input\" ...\n    \"set_language\" ...\n    \"set_var\" ...\n    \"split\" ...\n    \"substring\" ...\n    \"summary\" ...\n    \"switch\" ...\n    \"this\" ...\n    \"tokenize\" ...\n    \"to_array\" ...\n    \"to_byte\" ...\n    \"to_double\" ...\n    \"to_float\" ...\n    \"to_int\" ...\n    \"to_long\" ...\n    \"to_pos\" ...\n    \"to_string\" ...\n    \"to_wset\" ...\n    \"trim\" ...\n    \"zcurve\" ...\n    \nAt position:\n      indexing: input one_address.address_id | summary | attribute\n                                  ^: Error reported by IL parser: Encountered \" <IDENTIFIER> \"address_id \"\" at line 17, column 35.\nWas expecting one of:\n    <INTEGER> ...\n    <LONG> ...\n    <DOUBLE> ...\n    <FLOAT> ...\n    \"+\" ...\n    \"-\" ...\n    \"{\" ...\n    \"(\" ...\n    <STRING> ...\n    \"attribute\" ...\n    \"base64decode\" ...\n    \"base64encode\" ...\n    \"clear_state\" ...\n    \"echo\" ...\n    \"exact\" ...\n    \"flatten\" ...\n    \"for_each\" ...\n    \"get_field\" ...\n    \"get_var\" ...\n    \"guard\" ...\n    \"hexdecode\" ...\n    \"hexencode\" ...\n    \"hostname\" ...\n    \"if\" ...\n    \"index\" ...\n    \"input\" ...\n    \"join\" ...\n    \"lowercase\" ...\n    \"ngram\" ...\n    \"normalize\" ...\n    \"now\" ...\n    \"optimize_predicate\" ...\n    \"passthrough\" ...\n    \"random\" ...\n    \"select_input\" ...\n    \"set_language\" ...\n    \"set_var\" ...\n    \"split\" ...\n    \"substring\" ...\n    \"summary\" ...\n    \"switch\" ...\n    \"this\" ...\n    \"tokenize\" ...\n    \"to_array\" ...\n    \"to_byte\" ...\n    \"to_double\" ...\n    \"to_float\" ...\n    \"to_int\" ...\n    \"to_long\" ...\n    
\"to_pos\" ...\n    \"to_string\" ...\n    \"to_wset\" ...\n    \"trim\" ...\n    \"zcurve\" ...\n    \nAt position:\n      indexing: input one_address.address_id | summary | attribute\n                                  ^: Encountered \" <IDENTIFIER> \"address_id \"\" at line 17, column 35.\nWas expecting one of:\n    <INTEGER> ...\n    <LONG> ...\n    <DOUBLE> ...\n    <FLOAT> ...\n    \"+\" ...\n    \"-\" ...\n    \"{\" ...\n    \"(\" ...\n    <STRING> ...\n    \"attribute\" ...\n    \"base64decode\" ...\n    \"base64encode\" ...\n    \"clear_state\" ...\n    \"echo\" ...\n    \"exact\" ...\n    \"flatten\" ...\n    \"for_each\" ...\n    \"get_field\" ...\n    \"get_var\" ...\n    \"guard\" ...\n    \"hexdecode\" ...\n    \"hexencode\" ...\n    \"hostname\" ...\n    \"if\" ...\n    \"index\" ...\n    \"input\" ...\n    \"join\" ...\n    \"lowercase\" ...\n    \"ngram\" ...\n    \"normalize\" ...\n    \"now\" ...\n    \"optimize_predicate\" ...\n    \"passthrough\" ...\n    \"random\" ...\n    \"select_input\" ...\n    \"set_language\" ...\n    \"set_var\" ...\n    \"split\" ...\n    \"substring\" ...\n    \"summary\" ...\n    \"switch\" ...\n    \"this\" ...\n    \"tokenize\" ...\n    \"to_array\" ...\n    \"to_byte\" ...\n    \"to_double\" ...\n    \"to_float\" ...\n    \"to_int\" ...\n    \"to_long\" ...\n    \"to_pos\" ...\n    \"to_string\" ...\n    \"to_wset\" ...\n    \"trim\" ...\n    \"zcurve\" ...\n    \nAt position:\n      indexing: input one_address.address_id | summary | attribute\n                                  ^"
}

aguereca avatar Sep 28 '18 19:09 aguereca

Reply to self: I managed to make it work by using quotes:

indexing: input one_address."address_id" | summary | attribute

aguereca avatar Sep 28 '18 20:09 aguereca

Reply to my reply: That didn't work. It seemed to, but in reality it was just indexing the to_string version of the struct and concatenating the field name. Back to square one.

aguereca avatar Oct 04 '18 22:10 aguereca
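
A possible explanation for the last observation, offered as an assumption rather than a confirmed diagnosis: in the indexing language `.` acts as a concatenation operator, so `input one_address."address_id"` would be read as "the value of one_address concatenated with the string literal address_id" and not as a struct field access. The Python sketch below only models that reading. `StructValue`, `concat`, `get_field`, and the sample value `A-17` are hypothetical names for illustration and are not part of Vespa.

```python
from dataclasses import dataclass


@dataclass
class StructValue:
    """Hypothetical stand-in for a struct_address field value."""
    address_id: str

    def __str__(self) -> str:
        # Placeholder string form; Vespa's real struct rendering differs.
        return f"struct_address(address_id={self.address_id})"


def concat(*parts: object) -> str:
    """Model of 'a . b': convert each operand to a string and join them."""
    return "".join(str(p) for p in parts)


def get_field(struct: StructValue, name: str) -> str:
    """Model of the intended behaviour: pick one sub-field out of the struct."""
    return getattr(struct, name)


one_address = StructValue(address_id="A-17")

# What the quoted form appears to compute: struct as string + literal.
observed = concat(one_address, "address_id")

# What the original indexing statement was meant to compute.
intended = get_field(one_address, "address_id")
```

Under this reading the quoted form passes validation because both operands are valid expressions, which would explain why it looked like a fix until the indexed values were inspected.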