[BUG] [VSCode Extension] - promptflow-vectordb mlindex_content fails

Open graemefoster opened this issue 1 year ago • 10 comments

Describe the bug

Running in a VS Code DevContainer with the latest Prompt flow extension, I cannot get the Index Lookup tool from the promptflow-vectordb package to work.

When I try and configure the index in the UI I get: {retrieve-func-result} ... pfutil tool: error: argument {retrieve-func-result}: invalid choice: 'dynamic_list' (choose from 'retrieve-func-result')

And in the output see this: "pfutil tool: error: argument {retrieve-func-result}: invalid choice: 'dynamic_list' (choose from 'retrieve-func-result')"

When I set up the tool in YAML and run the prompt, I get errors about a missing environment variable, "BUILD_INFO", which I can see exists when running the prompt in Azure AI Studio.

How To Reproduce the bug

Steps to reproduce the behavior, and how frequently you experience the bug:

  1. Set up VS Code local development with the latest Prompt Flow
  2. Install promptflow-vectordb package
  3. Add an Index lookup task
  4. Click the mlindex_content text box

Screenshots

  1. On the VSCode primary side bar > the Prompt flow pane > quick access section, find the "install dependencies" action. Please click it and attach the screenshots there. Screenshot 2024-08-08 at 12 42 29 PM

  2. Please provide other snapshots about the key steps to repro the issue.

Environment Information

pf -v
{
    "promptflow": "1.14.0",
    "promptflow-azure": "1.14.0",
    "promptflow-core": "1.14.0",
    "promptflow-devkit": "1.14.0",
    "promptflow-tracing": "1.14.0"
}

  • Operating System: VS Code Python DevContainer, mcr.microsoft.com/vscode/devcontainers/python:3.10

  • Python version (python --version): Python (Linux) 3.10.14 (main, Jul 10 2024, 19:05:05) [GCC 12.2.0]

  • VS Code version: 1.92.0

  • Prompt Flow extension version: tried both latest and preview (v1.21.133218292 (pre-release))

  • On the VS Code bottom pane > Output pivot > "prompt flow" channel, find the error message that could be relevant to the issue and paste it here. That would be helpful for our troubleshooting.

[error] Error: Command failed: /usr/local/bin/python /home/vscode/.vscode-server/extensions/prompt-flow.prompt-flow-1.21.133218292/pfutil/pfutil.py tool -c retrieve-func-result --func-call-scenario dynamic_list --input-base64 -o /tmp/pf-retrieveFuncResult-4da9eb17-0811-4c85-85cc-50ae718abc07.json

usage: pfutil tool [-h] [--list] [--file FILE] [--working-dir WORKING_DIR] [--type TYPE] [--output OUTPUT] [--mode {append,output}] [--config CONFIG] {retrieve-func-result} ...
pfutil tool: error: argument {retrieve-func-result}: invalid choice: 'dynamic_list' (choose from 'retrieve-func-result')

  • If your code to repro the issue is public and you want to share it with us, please share the link.
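For readers puzzled by the "invalid choice" message: the thread does not state the root cause inside pfutil, so the following is only a sketch of how Python's argparse can produce a message of this shape. It assumes a hypothetical parser (pfutil-sketch, not the real pfutil) where -c is the short form of --config and --func-call-scenario is not a registered option, so the stray value dynamic_list ends up being read as the subcommand.

```python
import argparse


class RaisingParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting, so the message can be inspected."""

    def error(self, message):
        raise ValueError(message)


def build_parser():
    # Hypothetical stand-in for the real CLI; the option names are assumptions.
    parser = RaisingParser(prog="pfutil-sketch")
    parser.add_argument("--config", "-c")
    subcommands = parser.add_subparsers()
    subcommands.add_parser("retrieve-func-result")
    return parser


def try_parse(argv):
    """Return "ok" if argv parses, otherwise the argparse error text."""
    try:
        build_parser().parse_known_args(argv)
        return "ok"
    except ValueError as exc:
        return str(exc)


# "-c" swallows "retrieve-func-result" as its value, the unknown option is set
# aside, and "dynamic_list" is then checked against the list of subcommands.
result = try_parse(["-c", "retrieve-func-result", "--func-call-scenario", "dynamic_list"])
print(result)
```

If this is what happens, the command line built by the extension and the options understood by the pfutil script are out of step, which would fit a fix on the extension side.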

Additional context Add any other context about the problem here.

graemefoster avatar Aug 08 '24 04:08 graemefoster

Hi @graemefoster, thanks for reporting this. With the help of our extension expert, we reproduced this bug and will triage it to the feature owner for a fix. In the meantime, you can follow the steps below to work around the issue:

  1. Create the file .azureml/config.json under the flow folder or its parent folder.
  2. Fill in the workspace info using the format below:
{
    "subscription_id": "..",
    "resource_group": "..",
    "workspace_name": ".."
}
  3. Wait for index_type to load. It should look like this: image
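The steps above can also be scripted. This is a minimal sketch, assuming you want to generate the file from Python; write_workspace_config is a hypothetical helper, not part of promptflow, and the values passed in are placeholders you would replace with your own.

```python
import json
from pathlib import Path


def write_workspace_config(flow_dir, subscription_id, resource_group, workspace_name):
    """Create <flow_dir>/.azureml/config.json with the three workspace fields."""
    config_dir = Path(flow_dir) / ".azureml"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.json"
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        # For Azure AI Studio, later comments in this thread say to use the
        # project name here, not the hub name.
        "workspace_name": workspace_name,
    }
    config_path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config_path
```

For example, write_workspace_config(".", "<subscription id>", "<resource group>", "<project name>") writes the file next to the flow in the current directory.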

brynn-code avatar Aug 08 '24 08:08 brynn-code

Thanks @brynn-code - got it. Do I need to have an AI Studio (or ML Studio) for this to work on local VS Code?

I'm using Azure AI Studio and the bit I got wrong first was entering the hub name instead of the project name into the config.

Thanks, Graeme

graemefoster avatar Aug 08 '24 08:08 graemefoster

It depends on the tool itself. Some of the other tools may not need this, but the vector db tool does need an Azure AI Studio resource to list the available index types. Please put the project name as the workspace_name value. (Frankly, the hub name and project name are a bit confusing; I was also confused the first time.)

brynn-code avatar Aug 08 '24 08:08 brynn-code

@brynn-code so is this just a development time dependency then? Could I run this task against Azure Search in my own container without ML / AI Studio?

Unfortunately I now break at runtime. It looks like the vector-db tool has a dependency on an environment variable called "BUILD_INFO" which exists when run in AI Studio managed containers, but not when I run locally.

These are the lines in a function called promptflow_vectordb/tool/common_index_lookup.py/__get_custom_dimensions which is executed at runtime and fails:

build_info = os.environ.get("BUILD_INFO", "")
runtime_version = json.loads(build_info)['build_number']
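To make the failure mode concrete, here is a small standalone sketch (not the tool's actual code beyond the two quoted lines) showing why an unset variable fails: the default value is an empty string, and json.loads rejects an empty string. The function names are hypothetical and the environment is passed in as a plain dict so nothing real is modified.

```python
import json


def read_runtime_version(environ):
    """Mirror the two quoted lines, reading from a dict-like environment."""
    build_info = environ.get("BUILD_INFO", "")
    return json.loads(build_info)["build_number"]


def describe_failure(environ):
    """Return the build number, or the decode error text if parsing fails."""
    try:
        return read_runtime_version(environ)
    except json.JSONDecodeError as exc:
        return f"JSONDecodeError: {exc}"


# With BUILD_INFO unset, json.loads("") is what raises.
print(describe_failure({}))
# With a JSON object containing build_number, the lookup succeeds.
print(describe_failure({"BUILD_INFO": '{"build_number": "local"}'}))
```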

Thanks, Graeme

graemefoster avatar Aug 08 '24 12:08 graemefoster

I've been trying to figure out this problem for days... Since I'm not that familiar with Python, I could not figure out why I was getting a weird JSONDecodeError, and this post solved it for me.

I ended up adding this to check for the existence of this environment variable in order to get around it for now until the bug is fixed.

import json
import os

# Provide a fallback BUILD_INFO when the variable is not defined
if "BUILD_INFO" not in os.environ:
    build_number = os.environ.get("BUILD_BUILDNUMBER", "")
    build_info = {"build_number": f"ci-{build_number}" if build_number else "local-pytest"}
    os.environ["BUILD_INFO"] = json.dumps(build_info)

oolong2 avatar Aug 09 '24 01:08 oolong2

Thanks for the feedback. It seems that this time the error is raised from the vectordb tool, so we may need our partner team's help.

@dans-msft and @Adarsh-Ramanathan, could you please help take a look at this index lookup BUILD INFO missing issue?

brynn-code avatar Aug 09 '24 02:08 brynn-code

@oolong2 where did you add your code? Was it in common_index_lookup.py?

if "BUILD_INFO" not in os.environ:
    build_number = os.environ.get("BUILD_BUILDNUMBER", "")
    build_info = {"build_number": f"ci-{build_number}" if build_number else "local-pytest"}
    os.environ["BUILD_INFO"] = json.dumps(build_info)

Joan-Healy avatar Aug 16 '24 14:08 Joan-Healy

@oolong2 where did you add your code? Was it in common_index_lookup.py?

Anywhere you can run some startup code, such as a class constructor. In my case I just added it to the beginning of one of my prompt flow code files. As long as it runs before the index lookup does, it works.
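As a rough illustration of "startup code", the workaround can be wrapped in a function and called at import time at the top of a flow's Python file. This is a sketch under assumptions: ensure_build_info and the "local" default are hypothetical names and values, not something promptflow provides.

```python
import json
import os


def ensure_build_info(environ=None, default_build_number="local"):
    """Set BUILD_INFO to a small JSON object if it is missing or empty."""
    if environ is None:
        environ = os.environ
    if not environ.get("BUILD_INFO"):
        environ["BUILD_INFO"] = json.dumps({"build_number": default_build_number})
    return environ["BUILD_INFO"]


# Called at module import time, so it takes effect before any tool in the
# same process reads the variable.
ensure_build_info()
```

An existing value is left untouched, so the same file should behave the same in an environment that already defines BUILD_INFO.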

oolong2 avatar Aug 16 '24 16:08 oolong2

We have an internal PR for the vscode extension to fix the error argument issue, and the patch will be released in the following days. This PR only covers the pfutil error argument and is not related to vectordb features.

linningmii avatar Aug 21 '24 09:08 linningmii

I was also getting stuck on this missing BUILD_INFO (adding the full error here so people searching on Google can hopefully find this solution):

pf.flow.test failed with UserErrorException: JSONDecodeError: Execution failure in 'get_context': (JSONDecodeError) Expecting value: line 1 column 1 (char 0)

Thanks to the discussion here, I was able to add this to the Dockerfile we were using, as BUILD_INFO was never going to be defined in this environment.

# Set Promptflow Specific Environment Variables
ENV BUILD_INFO='{"build_number": "local-pytest"}'

After rebuilding the Docker container, things worked just fine.
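If you set the variable in a container like this, a quick sanity check can confirm the value parses the way the lookup code quoted earlier in this thread expects. This is only a sketch; validate_build_info is a hypothetical helper name.

```python
import json


def validate_build_info(raw):
    """Return True if raw is a JSON object that contains a build_number key."""
    try:
        parsed = json.loads(raw)
    except (TypeError, json.JSONDecodeError):
        # TypeError covers an unset variable passed in as None.
        return False
    return isinstance(parsed, dict) and "build_number" in parsed
```

Inside the container, validate_build_info(os.environ.get("BUILD_INFO")) should return True (after import os) when the ENV line has been picked up.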

peter-schmalfeldt avatar Sep 06 '24 00:09 peter-schmalfeldt

Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!

github-actions[bot] avatar Oct 06 '24 21:10 github-actions[bot]