I can no longer upload files to vector store with AzureOpenAI

Open matteopulega opened this issue 1 year ago • 14 comments

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • [X] This is an issue with the Python library

Describe the bug

Hi,

For the past two days I have been getting an error when I try to upload files to vector stores using the AzureOpenAI client. The same code works with OpenAI.

I changed nothing in my code, but since 31/07/2024 it no longer works with AzureOpenAI.

The output of file_batch:

File batch: FileCounts(cancelled=0, completed=0, failed=1, in_progress=0, total=1)
File batch status: failed

File status: failed
File last error: LastError(code='server_error', message='An internal error occurred.')

Are there known problems with AzureOpenAI?

Thanks, Matteo

To Reproduce

Use a simple file.txt or other types.

Execute the code and see the result.

Code snippets

import os

from openai import AzureOpenAI

# Credentials and endpoint are read from environment variables.
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-05-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)
file_stream = open("path/of/my/simple/file.txt", "rb")

vector_store = client.beta.vector_stores.create(name="vs_test_assistant_v2")
vector_store_id = vector_store.id

print("Uploading file to vector store..")
file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
          vector_store_id=vector_store_id, 
          files=[file_stream],
          )

print(f"File batch status: {file_batch.status}")
print(f"File batch: {file_batch.file_counts}")
file = client.beta.vector_stores.files.list(vector_store_id).data[0]
print(f"File status: {file.status}")
if file.status == "failed":
    print(f"File last error: {file.last_error}")

OS

Linux

Python version

Python v3.10.12

Library version

openai v1.37.2

matteopulega avatar Aug 02 '24 07:08 matteopulega

cc @kristapratico

RobertCraigie avatar Aug 02 '24 08:08 RobertCraigie

@matteopulega I'm not able to reproduce the error. Can you share which region your Azure OpenAI resource is in?

If this is still failing today, I recommend opening a support ticket against the service.

kristapratico avatar Aug 02 '24 16:08 kristapratico

The region is swedencentral.

matteopulega avatar Aug 02 '24 17:08 matteopulega

Same problem here.

AmineDjeghri avatar Sep 23 '24 14:09 AmineDjeghri

OK, after some tests: I can upload files when a vector store is empty, but not when it already contains at least one document. In that case the file status stays at 'in_progress'.

The vector stores UI at https://oai.azure.com/ also has some problems. Sometimes I can't add or delete files.

AmineDjeghri avatar Sep 23 '24 14:09 AmineDjeghri

Now, however, I have a different problem: after creating a vector store and, for example, 2 files, when I execute client.beta.vector_stores.file_batches.create_and_poll(vector_store_id=vector_store_id,file_ids=file_ids) the process never ends.

matteopulega avatar Sep 26 '24 09:09 matteopulega

when I try to execute client.beta.vector_stores.file_batches.create_and_poll(vector_store_id=vector_store_id,file_ids=file_ids) the process never ends.

Are you on the latest version?

RobertCraigie avatar Sep 26 '24 10:09 RobertCraigie

Yes, on 1.48. As a reminder, I'm using AzureOpenAI with region swedencentral.

matteopulega avatar Sep 26 '24 11:09 matteopulega

Now it works. It seems that vector stores on AzureOpenAI sometimes stop working correctly.

matteopulega avatar Sep 26 '24 12:09 matteopulega

Same here. The file upload was working when I last checked on Oct 9th, but now when I execute the code below, the process never ends.

openai version: 1.51.2 (latest), previously 1.50.2

    file_paths = ["./testFile.pdf"]
    file_streams = [open(path, "rb") for path in file_paths]

    #upload and poll the file
    file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=vector_store.id, files=file_streams
    )
    print (file_batch.status)

The same example, which worked previously, has now stopped working. No changes were made to the code.

NiharT92 avatar Oct 14 '24 09:10 NiharT92

Same here again

matteopulega avatar Oct 14 '24 13:10 matteopulega

UPDATE: We had created a lot of vector stores. I removed many of them and now it's running smoothly again.

Yes, same here. Sometimes it just works directly, but then all of a sudden the CreateBatchFileJob call stays in_progress:

private bool AddFilesToVectorStore(VectorStore vectorStore, List<AIFile> filesToAdd)
 {
     // Now add the files to the vector store
     var vectorJob = _assistantVectorClient.CreateBatchFileJob(vectorStore.Id, filesToAdd.Select(f => f.FileId).ToList(), false);

     // If the run is not successful, we will log it now
     if (vectorJob.Status == VectorStoreBatchFileJobStatus.InProgress)
     {
         _logger.LogWarning("Vector job is still in progress, waiting for completion.");
         WaitForVectorStoreJobCompletion(_assistantVectorClient, vectorJob.Value);
     }
     else if (vectorJob.Status != VectorStoreBatchFileJobStatus.Completed)
     {
         _logger.LogWarning($"Run failed with status: {vectorJob.Status}");

         // We will cancel this job and retry it
         _assistantVectorClient.CancelBatchFileJob(vectorStore.Id, vectorJob.Value.BatchId);
         _logger.LogWarning("Vector job cancelled");

         return false;
     }
     else
     {
         _logger.LogInformation("Vector job completed.");
     }

     var vectorJobStatus = _assistantVectorClient.GetBatchFileJob(vectorStore.Id, vectorJob.Value.BatchId);
     _logger.LogInformation($"Files completed: {vectorJobStatus.Value?.FileCounts.Completed}");

     return true;
 }

dnc-nl avatar Oct 15 '24 09:10 dnc-nl

UPDATE: The upload is working smoothly again. I did not change anything. The issue seems to be intermittent and can persist for a long time before things go back to normal.
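
Since the hang is intermittent, one way to keep a script from blocking forever is to poll the batch status yourself with a deadline instead of relying on upload_and_poll. The helper below is a minimal sketch, not part of the SDK: poll_with_deadline, its parameters, and the default timeout are hypothetical names and values chosen for illustration. It only assumes that the batch status eventually becomes one of "completed", "failed", or "cancelled".

```python
import time
from typing import Callable

# Statuses after which a file batch will not change any more.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_with_deadline(
    fetch_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status() until it returns a terminal status or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

With the SDK, fetch_status could be something like lambda: client.vector_stores.file_batches.retrieve(batch.id, vector_store_id=vector_store.id).status, after creating the batch with file_batches.create(...) rather than a *_and_poll helper (use client.beta.vector_stores on older SDK versions). A TimeoutError then gives the caller a chance to cancel and retry.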

NiharT92 avatar Oct 16 '24 08:10 NiharT92

Hi. Using AzureOpenAI (client = AzureOpenAI())

I can query my vector store and I can upload files, but the files don't seem to actually be associated with the vector store when I open Azure AI Foundry and check the assistant vector stores section.

I don't get an error, it's just that the files don't seem to be attached. Any idea if this is a bug?

This works:

import logging
from datetime import datetime

# `client` is the AzureOpenAI() client created above.

# Retrieve files from the vector store.
def get_vector_store_files(vector_store_id, limit=100):
    file_data = []
    after = None

    while True:
        try:
            logging.info(f"Attempting to fetch vector store files for {vector_store_id} with after={after}")
            response = client.vector_stores.files.list(vector_store_id, limit=limit, after=after)
        except Exception as e:
            logging.error(f"Error fetching vector store files: {e}")
            break

        for file in response.data:
            try:
                file_detail = client.files.retrieve(file.id)
                upload_time = datetime.fromtimestamp(file_detail.created_at)
                vs_filename = file_detail.filename
                file_key = vs_filename[-16:]  # use last 16 characters for matching
                file_data.append({
                    "VectorStoreFileName": vs_filename,
                    "FileKey": file_key,
                    "VectorStoreUpload": upload_time,
                    "VectorStoreFileID": file_detail.id
                })
                logging.info(f"Retrieved vector store file: {vs_filename}")
            except Exception as e:
                logging.error(f"Error retrieving details for file ID {file.id}: {e}")

        # Continue from the last item while the API reports more pages.
        if getattr(response, 'has_more', False):
            after = response.data[-1].id
        else:
            break

    return file_data

This runs without error too, but it does not actually put the files on the vector store itself.

def upload_file_to_vector_store(vector_store_id, file_path):
    try:
        with open(file_path, "rb") as f:
            # Use the file batch helper to upload and attach the file.
            file_batch = client.vector_stores.file_batches.upload_and_poll(
                vector_store_id=vector_store_id, files=[f]
            )
        logging.info(f"File batch upload status: {file_batch.status}")
        logging.info(f"File batch counts: {file_batch.file_counts}")
        return file_batch
    except Exception as e:
        logging.error(f"Error uploading file '{file_path}' to vector store: {e}")

def fix_any_upload_issues(vector_store_id):
    """
    Checks the vector store for files with a status of 'failed' and attempts
    to reattach them. Retries up to 5 times per file.
    """
    try:
        files_page = client.vector_stores.files.list(vector_store_id)
        files = files_page.data
    except Exception as e:
        logging.error(f"Error listing files from vector store {vector_store_id}: {e}")
        return

    # Filter files that have a 'failed' status.
    failed_files = [f for f in files if getattr(f, "status", None) == "failed"]
    logging.info(f"Initial failed files: {[f.id for f in failed_files]}")

    for failed_file in failed_files:
        attempt = 0
        success = False

        while attempt < 5 and not success:
            attempt += 1
            logging.info(f"Attempt {attempt} for file {failed_file.id}")
            try:
                # Attempt to reattach the failed file to the vector store.
                # Note: Using the create() method here. Depending on your SDK version,
                # you might need to call client.vector_stores.files.create(...)
                client.vector_stores.files.create(
                    vector_store_id, file_id=failed_file.id
                )
                # After the attempt, re-read the file list and check if this file still fails.
                updated_files_page = client.vector_stores.files.list(vector_store_id)
                updated_files = updated_files_page.data
                updated_failed_files = [
                    f for f in updated_files if getattr(f, "status", None) == "failed"
                ]
                if not any(f.id == failed_file.id for f in updated_failed_files):
                    success = True
                    logging.info(f"Successfully reattached file {failed_file.id}")
                else:
                    logging.info(f"File {failed_file.id} still in failed status after attempt {attempt}")
            except Exception as error:
                logging.error(
                    f"Failed to reattach file {failed_file.id} on attempt {attempt}: {error}"
                )

    logging.info("Finished processing failed files.")

olliejgooding avatar Mar 14 '25 14:03 olliejgooding

It’s April 2025, and I can confirm that this issue persists. I’ve implemented a workaround by detecting failures, removing the file reference from the vector store (not the actual file), and initiating a re-upload. This approach seems to work on the first retry.

I hope Microsoft and/or OpenAI will address this issue soon because this workaround is a hack and shouldn’t be the burden of us end-users.
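
The workaround described in this comment could look roughly like the sketch below. It is not the commenter's actual code. It assumes that "re-upload" can be done by re-attaching the same file ID, since only the vector store reference is removed and the underlying file is kept. The function name reattach_failed_files and the max_retries parameter are hypothetical. It uses client.vector_stores.files.list, delete, and create_and_poll, which sit under client.beta.vector_stores in the older SDK versions seen earlier in this thread.

```python
def reattach_failed_files(client, vector_store_id: str, max_retries: int = 3) -> list[str]:
    """Detach and re-attach files whose vector store status is 'failed'.

    Returns the IDs of files that are still not 'completed' after max_retries.
    """
    # Collect the failed IDs first so the listing is not mutated while iterating.
    failed_ids = [
        f.id
        for f in client.vector_stores.files.list(vector_store_id)
        if f.status == "failed"
    ]

    still_failed = []
    for file_id in failed_ids:
        for _attempt in range(max_retries):
            # Removes only the vector store reference, not the uploaded file.
            client.vector_stores.files.delete(file_id, vector_store_id=vector_store_id)
            result = client.vector_stores.files.create_and_poll(
                file_id, vector_store_id=vector_store_id
            )
            if result.status == "completed":
                break
        else:
            # No attempt reached 'completed'.
            still_failed.append(file_id)
    return still_failed
```

An empty return value means every previously failed file reached 'completed'. Anything left in the list probably needs a fresh upload or a support ticket.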

brandonavant avatar Apr 11 '25 04:04 brandonavant