olmocr
SGLang server fails to start on A10 (sqlite3.OperationalError: no such column: "size" - should this be a string literal in single-quotes?)
🐛 Describe the bug
(olmocr) ubuntu@xxx-xxx-xxx-xxx:~$ python -m olmocr.pipeline ./localworkspace --pdfs paper.pdf
INFO:olmocr.check:pdftoppm is installed and working.
2025-02-27 10:17:40,491 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
2025-02-27 10:17:40,492 - __main__ - INFO - Loading file at paper.pdf as PDF document
2025-02-27 10:17:40,492 - __main__ - INFO - Found 1 total pdf paths to add
Sampling PDFs to calculate optimal length: 100%|███████████████████████| 1/1 [00:00<00:00, 233.77it/s]
2025-02-27 10:17:40,497 - __main__ - INFO - Calculated items_per_group: 62 based on average pages per PDF: 8.00
INFO:olmocr.work_queue:Found 1 total paths
INFO:olmocr.work_queue:0 new paths to add to the workspace
2025-02-27 10:17:40,624 - __main__ - INFO - Starting pipeline with PID 7799
INFO:olmocr.work_queue:Initialized local queue with 1 work items
2025-02-27 10:17:40,748 - __main__ - WARNING - Attempt 1: All connection attempts failed
2025-02-27 10:17:41,759 - __main__ - WARNING - Attempt 2: All connection attempts failed
2025-02-27 10:17:42,767 - __main__ - WARNING - Attempt 3: All connection attempts failed
2025-02-27 10:17:43,776 - __main__ - WARNING - Attempt 4: All connection attempts failed
2025-02-27 10:17:44,784 - __main__ - WARNING - Attempt 5: All connection attempts failed
2025-02-27 10:17:45,114 - __main__ - INFO - Traceback (most recent call last):
2025-02-27 10:17:45,114 - __main__ - INFO - File "<frozen runpy>", line 198, in _run_module_as_main
2025-02-27 10:17:45,114 - __main__ - INFO - File "<frozen runpy>", line 88, in _run_code
2025-02-27 10:17:45,114 - __main__ - INFO - File "/home/ubuntu/miniforge3/envs/olmocr/lib/python3.11/site-packages/sglang/launch_server.py", line 6, in <module>
2025-02-27 10:17:45,114 - __main__ - INFO - from sglang.srt.entrypoints.http_server import launch_server
2025-02-27 10:17:45,114 - __main__ - INFO - File "/home/ubuntu/miniforge3/envs/olmocr/lib/python3.11/site-packages/sglang/srt/entrypoints/http_server.py", line 41, in <module>
2025-02-27 10:17:45,115 - __main__ - INFO - from sglang.srt.entrypoints.engine import _launch_subprocesses
2025-02-27 10:17:45,115 - __main__ - INFO - File "/home/ubuntu/miniforge3/envs/olmocr/lib/python3.11/site-packages/sglang/srt/entrypoints/engine.py", line 52, in <module>
2025-02-27 10:17:45,115 - __main__ - INFO - from sglang.srt.openai_api.adapter import load_chat_template_for_openai_api
2025-02-27 10:17:45,115 - __main__ - INFO - File "/home/ubuntu/miniforge3/envs/olmocr/lib/python3.11/site-packages/sglang/srt/openai_api/adapter.py", line 30, in <module>
2025-02-27 10:17:45,115 - __main__ - INFO - from outlines.fsm.json_schema import convert_json_schema_to_str
2025-02-27 10:17:45,115 - __main__ - INFO - File "/home/ubuntu/miniforge3/envs/olmocr/lib/python3.11/site-packages/outlines/__init__.py", line 2, in <module>
2025-02-27 10:17:45,115 - __main__ - INFO - import outlines.generate
2025-02-27 10:17:45,115 - __main__ - INFO - File "/home/ubuntu/miniforge3/envs/olmocr/lib/python3.11/site-packages/outlines/generate/__init__.py", line 2, in <module>
2025-02-27 10:17:45,115 - __main__ - INFO - from .cfg import cfg
2025-02-27 10:17:45,115 - __main__ - INFO - File "/home/ubuntu/miniforge3/envs/olmocr/lib/python3.11/site-packages/outlines/generate/cfg.py", line 3, in <module>
2025-02-27 10:17:45,115 - __main__ - INFO - from outlines.fsm.guide import CFGGuide
2025-02-27 10:17:45,115 - __main__ - INFO - File "/home/ubuntu/miniforge3/envs/olmocr/lib/python3.11/site-packages/outlines/fsm/guide.py", line 108, in <module>
2025-02-27 10:17:45,116 - __main__ - INFO - @cache()
2025-02-27 10:17:45,116 - __main__ - INFO - ^^^^^^^
2025-02-27 10:17:45,116 - __main__ - INFO - File "/home/ubuntu/miniforge3/envs/olmocr/lib/python3.11/site-packages/outlines/caching.py", line 93, in decorator
2025-02-27 10:17:45,116 - __main__ - INFO - memory = get_cache()
2025-02-27 10:17:45,116 - __main__ - INFO - ^^^^^^^^^^^
2025-02-27 10:17:45,116 - __main__ - INFO - File "/home/ubuntu/miniforge3/envs/olmocr/lib/python3.11/site-packages/outlines/caching.py", line 65, in get_cache
2025-02-27 10:17:45,116 - __main__ - INFO - memory["__version__"] = outlines_version
2025-02-27 10:17:45,116 - __main__ - INFO - ~~~~~~^^^^^^^^^^^^^^^
2025-02-27 10:17:45,117 - __main__ - INFO - File "/home/ubuntu/miniforge3/envs/olmocr/lib/python3.11/site-packages/diskcache/core.py", line 823, in __setitem__
2025-02-27 10:17:45,117 - __main__ - INFO - self.set(key, value, retry=True)
2025-02-27 10:17:45,117 - __main__ - INFO - File "/home/ubuntu/miniforge3/envs/olmocr/lib/python3.11/site-packages/diskcache/core.py", line 808, in set
2025-02-27 10:17:45,117 - __main__ - INFO - self._row_insert(db_key, raw, now, columns)
2025-02-27 10:17:45,117 - __main__ - INFO - File "/home/ubuntu/miniforge3/envs/olmocr/lib/python3.11/site-packages/diskcache/core.py", line 857, in _row_insert
2025-02-27 10:17:45,117 - __main__ - INFO - sql(
2025-02-27 10:17:45,117 - __main__ - INFO - sqlite3.OperationalError: no such column: "size" - should this be a string literal in single-quotes?
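The last line of the traceback looks like the message SQLite emits when a double-quoted token cannot be resolved as a column and the build does not fall back to treating it as a string literal. A minimal sketch for checking how the SQLite library linked into this Python environment behaves is shown here. The `probe_quoting` helper, the `Settings` table, and the queries are hypothetical stand-ins for illustration, not diskcache's actual schema or SQL.

```python
import sqlite3


def probe_quoting() -> dict:
    """Report how the linked SQLite library handles 'size' versus "size"."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical stand-in table; not diskcache's real schema.
        conn.execute("CREATE TABLE Settings (key TEXT, value INTEGER)")
        conn.execute("INSERT INTO Settings VALUES ('size', 0)")

        # Single quotes always denote a string literal.
        single = conn.execute(
            "UPDATE Settings SET value = 1 WHERE key = 'size'"
        ).rowcount

        # Double quotes denote an identifier. Builds with the legacy fallback
        # reinterpret an unknown identifier as a string; other builds raise.
        try:
            conn.execute('UPDATE Settings SET value = 2 WHERE key = "size"')
            double_ok = True
        except sqlite3.OperationalError:
            double_ok = False
    finally:
        conn.close()

    return {
        "sqlite_version": sqlite3.sqlite_version,
        "single_quoted_rows": single,
        "double_quoted_accepted": double_ok,
    }


if __name__ == "__main__":
    print(probe_quoting())
```

If `double_quoted_accepted` comes back `False` inside the `olmocr` environment, that would be consistent with the error above originating in the SQLite build rather than in SGLang itself. This is an assumption to verify, not a confirmed diagnosis.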
Versions
(olmocr) ubuntu@xxx-xxx-xxx-xxx:~$ python --version && pip freeze
Python 3.11.11
aiohappyeyeballs==2.4.6
aiohttp==3.11.13
aiohttp-cors==0.7.0
aiosignal==1.3.2
airportsdata==20250224
annotated-types==0.7.0
anthropic==0.47.2
anyio==4.8.0
astor==0.8.1
asttokens==3.0.0
attrs==25.1.0
beaker-py==1.34.1
blake3==1.0.4
bleach==6.2.0
boto3==1.37.2
botocore==1.37.2
cached_path==1.6.7
cachetools==5.5.2
certifi==2025.1.31
cffi==1.17.1
charset-normalizer==3.4.1
click==8.1.8
cloudpickle==3.1.1
colorful==0.5.6
compressed-tensors==0.8.0
cryptography==44.0.1
cuda-bindings==12.8.0
cuda-python==12.8.0
datasets==3.3.2
decorator==5.2.1
decord==0.6.0
depyf==0.18.0
dill==0.3.8
diskcache==5.6.3
distlib==0.3.9
distro==1.9.0
docker==7.1.0
einops==0.8.1
executing==2.2.0
fastapi==0.115.8
filelock==3.17.0
flashinfer==0.1.6+cu124torch2.4
flashinfer-python==0.2.2.post1
frozenlist==1.5.0
fsspec==2024.12.0
ftfy==6.3.1
gguf==0.10.0
google-api-core==2.24.1
google-auth==2.38.0
google-cloud-core==2.4.2
google-cloud-storage==2.19.0
google-crc32c==1.6.0
google-resumable-media==2.7.2
googleapis-common-protos==1.68.0
grpcio==1.70.0
h11==0.14.0
hf_transfer==0.1.9
httpcore==1.0.7
httptools==0.6.4
httpx==0.28.1
huggingface-hub==0.27.1
idna==3.10
importlib_metadata==8.6.1
iniconfig==2.0.0
interegular==0.3.3
ipython==8.32.0
jedi==0.19.2
Jinja2==3.1.5
jiter==0.8.2
jmespath==1.0.1
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
lark==1.2.2
lingua-language-detector==2.0.2
litellm==1.61.19
llvmlite==0.44.0
lm-format-enforcer==0.10.11
markdown-it-py==3.0.0
markdown2==2.5.3
MarkupSafe==3.0.2
matplotlib-inline==0.1.7
mdurl==0.1.2
mistral_common==1.5.3
modelscope==1.23.1
mpmath==1.3.0
msgpack==1.1.0
msgspec==0.19.0
multidict==6.1.0
multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.4.2
ninja==1.11.1.3
numba==0.61.0
numpy==1.26.4
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-cusparselt-cu12==0.6.2
nvidia-ml-py==12.570.86
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
olmocr==0.1.58
openai==1.64.0
opencensus==0.11.4
opencensus-context==0.1.3
opencv-python-headless==4.11.0.86
orjson==3.10.15
outlines==0.0.46
outlines_core==0.1.26
packaging==24.2
pandas==2.2.3
parso==0.8.4
partial-json-parser==0.2.1.1.post5
pexpect==4.9.0
pillow==11.1.0
platformdirs==4.3.6
pluggy==1.5.0
prometheus-fastapi-instrumentator==7.0.2
prometheus_client==0.21.1
prompt_toolkit==3.0.50
propcache==0.3.0
proto-plus==1.26.0
protobuf==5.29.3
psutil==7.0.0
ptyprocess==0.7.0
pure_eval==0.2.3
py-cpuinfo==9.0.0
py-spy==0.4.0
pyairports==2.1.1
pyarrow==19.0.1
pyasn1==0.6.1
pyasn1_modules==0.4.1
pybind11==2.13.6
pycountry==24.6.1
pycparser==2.22
pydantic==2.10.6
pydantic_core==2.27.2
Pygments==2.19.1
pypdf==5.3.0
pypdfium2==4.30.1
pytest==8.3.4
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.20
pytz==2025.1
PyYAML==6.0.2
pyzmq==26.2.1
ray==2.42.1
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
rich==13.9.4
rpds-py==0.23.1
rsa==4.9
s3transfer==0.11.3
safetensors==0.5.3
sentencepiece==0.2.0
setproctitle==1.3.5
sgl-kernel==0.0.3.post1
sglang==0.4.2
six==1.17.0
smart-open==7.1.0
sniffio==1.3.1
stack-data==0.6.3
starlette==0.45.3
sympy==1.13.1
tiktoken==0.9.0
tokenizers==0.21.0
torch==2.5.1
torchao==0.8.0
torchaudio==2.5.1
torchvision==0.20.1
tqdm==4.67.1
traitlets==5.14.3
transformers==4.49.0
triton==3.1.0
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
uvicorn==0.34.0
uvloop==0.21.0
virtualenv==20.29.2
vllm==0.6.4.post1
watchfiles==1.0.4
wcwidth==0.2.13
webencodings==0.5.1
websockets==15.0
wrapt==1.17.2
xformers==0.0.28.post3
xgrammar==0.1.10
xxhash==3.5.0
yarl==1.18.3
zipp==3.21.0
zstandard==0.23.0