markitdown
Document Intelligence is not working
When I use this:
from markitdown import MarkItDown
md = MarkItDown(docintel_endpoint="<document_intelligence_endpoint>")
result = md.convert("test.pdf")
print(result.text_content)
I get an error saying: No parameter named "docintel_endpoint"
I have version: markitdown==0.0.1a3
I got the same error.
I found that in markitdown version 0.0.1a3, the _markitdown.py file doesn't contain the docintel_endpoint code ... as below:
Yeah maybe the readme is out of sync
I am also a bit confused how this can work without an API Key parameter. Or does it require Entra ID?
Thanks for the report. Let me investigate. It looks like something didn't quite make it in.
In v 0.0.1a4 there's now another issue when running it:
.venv\Lib\site-packages\markitdown\_markitdown.py", line 1727, in _convert
if res is not None:
^^^
UnboundLocalError: cannot access local variable 'res' where it is not associated with a value
The same also happens when building from source (0.0.2a1).
Ok thanks for the report, and sorry for the inconvenience. I'm trying to provision a doc intelligence endpoint to test on, and integrate into the CI, so that we can avoid these breaks in the future. Prior to this, I could only rely on others to test and report findings
This doesn't make sense. The endpoint needs to take an API key of some sort, or at least tell us whether the API key is read from .env variables.
Looking at the code, it currently only supports Azure Identity auth (which could be the cause of some of the above issues if an auth error isn't handled properly). It would be nice, and not too complex, to add key auth too. If I find the time, I can add it via a PR.
That would be great if that could be sorted out.
to add the Key Auth too
I was thinking either enable support for API key auth as you suggested or simply accept an initialized client directly.
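To make the two options concrete, here is a rough sketch of how they could be combined. The helper name resolve_docintel_client and its api_key and client parameters are hypothetical and are not part of markitdown. The Azure classes used (DocumentIntelligenceClient, AzureKeyCredential, DefaultAzureCredential) come from the azure-ai-documentintelligence, azure-core, and azure-identity packages, which would need to be installed for the endpoint-based paths.

```python
def resolve_docintel_client(endpoint=None, api_key=None, client=None):
    """Hypothetical helper (not markitdown's API): pick a Document Intelligence client.

    Priority: an already-initialized client, then API key auth,
    then Azure Identity (Entra ID) auth as the fallback.
    """
    # Option 1: the caller passes a client they configured themselves.
    if client is not None:
        return client

    if endpoint is None:
        raise ValueError("Provide either an initialized client or an endpoint.")

    # Imported lazily so the Azure SDK is only required when actually used.
    from azure.ai.documentintelligence import DocumentIntelligenceClient

    if api_key is not None:
        # Option 2: key-based auth.
        from azure.core.credentials import AzureKeyCredential

        credential = AzureKeyCredential(api_key)
    else:
        # Fallback: identity-based auth.
        from azure.identity import DefaultAzureCredential

        credential = DefaultAzureCredential()

    return DocumentIntelligenceClient(endpoint=endpoint, credential=credential)
```

Accepting an initialized client keeps the auth decision entirely with the caller, while the api_key parameter covers the simple case without extra setup.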
This error is due to the exception handling:
try:
    res = converter.convert(local_path, **_kwargs)
except Exception:
    failed_attempts.append(
        FailedConversionAttempt(
            converter=converter, exc_info=sys.exc_info()
        )
    )
if res is not None:
    # Normalize the content
    res.text_content = "\n".join(
        [line.rstrip() for line in re.split(r"\r?\n", res.text_content)]
    )
    res.text_content = re.sub(r"\n{3,}", "\n\n", res.text_content)
    # Todo
    return res
This should be a try/except/else block or something similar. If the convert call raises an exception, the variable res is never assigned, so the later check on res raises this error (when the point of the exception handling was to log the error rather than raise).
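A minimal, self-contained sketch of the try/except/else shape, in case it helps. This is not the actual markitdown code: the Result class, the three stand-in converters, and the function convert_with_fallback are hypothetical, and failed attempts are recorded as plain tuples instead of FailedConversionAttempt objects.

```python
import re
import sys
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for a conversion result."""
    text_content: str


class FailingConverter:
    def convert(self, path, **kwargs):
        raise RuntimeError("simulated converter failure")


class DecliningConverter:
    def convert(self, path, **kwargs):
        return None  # this converter does not handle the file


class WorkingConverter:
    def convert(self, path, **kwargs):
        return Result("line one   \r\n\n\n\nline two\t")


def convert_with_fallback(converters, local_path, **kwargs):
    """Try each converter in turn; record failures instead of raising."""
    failed_attempts = []
    for converter in converters:
        try:
            res = converter.convert(local_path, **kwargs)
        except Exception:
            # Record the failure and move on to the next converter.
            failed_attempts.append((converter, sys.exc_info()))
        else:
            # Only reached when convert() returned, so res is always bound.
            if res is not None:
                # Normalize the content
                res.text_content = "\n".join(
                    line.rstrip() for line in re.split(r"\r?\n", res.text_content)
                )
                res.text_content = re.sub(r"\n{3,}", "\n\n", res.text_content)
                return res, failed_attempts
    return None, failed_attempts
```

With this shape, a converter that raises is logged and skipped, and res is only read on the path where it was assigned. Initializing res = None before the try would also avoid the UnboundLocalError.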