openai-python

The .doc format is not supported

Open Panweitong opened this issue 1 year ago • 4 comments

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • [ ] This is an issue with the Python library

Describe the bug

OpenAI Docs: [screenshot]

Python library: [screenshot]

To Reproduce

Use the Files API to upload a .doc file with purpose="assistants" so it can be used with Assistants.

Code snippets

import io

import openai
import requests

openai.api_key = "xxxxxxxxxxxxxxxxxxx"

# BytesIO subclass that exposes a filename through its `name` attribute
class FileLike(io.BytesIO):
  def __init__(self, _bytes, filename=None):
    super().__init__(_bytes)
    self.name = filename

url = "https://ccrb.s3.cn-northwest-1.amazonaws.com.cn/%E7%BB%B3%E8%88%9E%E9%A3%9E%E6%89%AC%E6%B4%BB%E5%8A%9B%E7%BB%BD%E6%94%BE.doc"

# Download the .doc file
r = requests.get(url)
# Everything after the bucket host becomes the filename (still percent-encoded)
fileName = url.split("https://ccrb.s3.cn-northwest-1.amazonaws.com.cn/")[1]

file_bytes = r.content

# Upload the downloaded bytes for use with Assistants
res = openai.files.create(
  file=FileLike(file_bytes, fileName), purpose="assistants"
)
if res.id and res.status == "processed":
  file = openai.files.retrieve(res.id)
  print(file)
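The snippet above takes the filename from the URL as-is, so it stays percent-encoded (`%E7%BB%B3...doc`). Whether that has anything to do with the reported error is not established in this thread. As a minimal sketch, a decoded filename could be derived with the standard library; `filename_from_url` is a hypothetical helper name, not part of the openai library:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str) -> str:
    """Return the percent-decoded last path segment of a URL (hypothetical helper)."""
    # urlsplit drops the scheme, host, query string and fragment from consideration
    path = urlsplit(url).path
    # basename keeps only the final segment; unquote decodes %XX escapes as UTF-8
    return unquote(posixpath.basename(path))
```

The result could then be passed in place of `fileName`, for example `FileLike(file_bytes, filename_from_url(url))`.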

OS

Ubuntu

Python version

Python v3.10.12

Library version

openai v1.51.0

Panweitong, Oct 23 '24 10:10

Thanks for the report, can you share an example snippet to reproduce the issue?

RobertCraigie, Oct 23 '24 10:10

Thanks for the report, can you share an example snippet to reproduce the issue?

OK, I will share an example snippet later.

Panweitong, Oct 23 '24 10:10

Thanks for the report, can you share an example snippet to reproduce the issue?

import io

import openai
import requests

openai.api_key = "xxxxxxxxxxxxxxxxxxx"

# BytesIO subclass that exposes a filename through its `name` attribute
class FileLike(io.BytesIO):
  def __init__(self, _bytes, filename=None):
    super().__init__(_bytes)
    self.name = filename

url = "https://ccrb.s3.cn-northwest-1.amazonaws.com.cn/%E7%BB%B3%E8%88%9E%E9%A3%9E%E6%89%AC%E6%B4%BB%E5%8A%9B%E7%BB%BD%E6%94%BE.doc"

# Download the .doc file
r = requests.get(url)
# Everything after the bucket host becomes the filename (still percent-encoded)
fileName = url.split("https://ccrb.s3.cn-northwest-1.amazonaws.com.cn/")[1]

file_bytes = r.content

# Upload the downloaded bytes for use with Assistants
res = openai.files.create(
  file=FileLike(file_bytes, fileName), purpose="assistants"
)
if res.id and res.status == "processed":
  file = openai.files.retrieve(res.id)
  print(file)

Panweitong, Oct 23 '24 10:10

Thanks for the report, can you share an example snippet to reproduce the issue?

[Two attached screenshots]

I also tested it directly against the API, and it still reported an error, even though the documentation says the .doc format is supported.

Panweitong, Oct 23 '24 11:10

@Panweitong OpenAI accepts files in various formats (e.g., .txt, .csv, .json, .pdf, .docx, .doc), but it's worth confirming that the API version you're using actually supports .doc files.

If you're uploading .doc files, make sure the content is properly extracted first: .doc is a binary format, and OpenAI may have trouble interpreting the binary content directly.
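As a rough illustration of the binary-format point, the downloaded bytes can be inspected before upload. Legacy .doc files are stored in an OLE2 compound-file container, while .docx files are ZIP archives, and each starts with a well-known signature. The sketch below uses only the standard library; `sniff_word_format` is a hypothetical helper name, and a matching signature identifies the container only (other formats such as .xls also use OLE2), not that the file is a valid Word document:

```python
# Leading bytes of an OLE2 compound file (legacy .doc, .xls, .ppt)
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Leading bytes of a ZIP local file header (.docx and other OOXML files)
ZIP_MAGIC = b"PK\x03\x04"


def sniff_word_format(data: bytes) -> str:
    """Classify raw bytes by container signature (hypothetical helper)."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"  # consistent with a legacy binary .doc
    if data.startswith(ZIP_MAGIC):
        return "zip"  # consistent with a .docx
    return "unknown"  # e.g. an HTML or XML error page returned by the download
```

Calling `sniff_word_format(r.content)` on the downloaded bytes before `openai.files.create` would at least show whether the download produced a binary Word container or something else entirely.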

ghost, Dec 09 '24 08:12

Really sorry for the delayed response. As this is an issue with the underlying OpenAI API and not the SDK, I'm going to go ahead and close this issue.

Would you mind reposting at community.openai.com?

RobertCraigie, Feb 17 '25 11:02