DocsGPT
Training in progress...
I cloned the latest version with git and tried to upload a PDF file, but the system hangs on "Training in progress..." The old version (from 4 days ago) does not have this problem with the same PDF file. My .env file looks like this:
API_KEY="my openAI api key"
EMBEDDINGS_KEY="my openAI api key"
API_URL=localhost:7091
FLASK_APP=application/app.py
FLASK_DEBUG=true
#For OPENAI on Azure
#OPENAI_API_BASE=
#OPENAI_API_VERSION=
#AZURE_DEPLOYMENT_NAME=
#AZURE_EMBEDDINGS_DEPLOYMENT_NAME=
Hmm, I can't replicate this. Can you give more details, please? Are you launching it with docker-compose?
Does it answer questions on the default dataset, or are there only issues when you try uploading? Are you doing this on your local device?
@dartpain Got the same issue, here are the logs of the container:
[2023-10-07 09:00:43,124: ERROR/MainProcess] Process 'ForkPoolWorker-2' pid:20 exited with 'signal 9 (SIGKILL)'
docsgpt-worker-1 | [2023-10-07 09:00:43,145: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 0.')
docsgpt-worker-1 | Traceback (most recent call last):
docsgpt-worker-1 | File "/usr/local/lib/python3.10/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
docsgpt-worker-1 | raise WorkerLostError(
docsgpt-worker-1 | billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 0.
docsgpt-backend-1 | [2023-10-07 09:00:43 +0000] [7] [ERROR] Error handling request /api/task_status?task_id=97ea9648-a242-417c-b5e2-dd032986e3cd
docsgpt-backend-1 | Traceback (most recent call last):
docsgpt-backend-1 | File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 136, in handle
docsgpt-backend-1 | self.handle_request(listener, req, client, addr)
docsgpt-backend-1 | File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 179, in handle_request
docsgpt-backend-1 | respiter = self.wsgi(environ, resp.start_response)
docsgpt-backend-1 | File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2552, in __call__
docsgpt-backend-1 | return self.wsgi_app(environ, start_response)
docsgpt-backend-1 | File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2532, in wsgi_app
docsgpt-backend-1 | response = self.handle_exception(e)
docsgpt-backend-1 | File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2529, in wsgi_app
docsgpt-backend-1 | response = self.full_dispatch_request()
docsgpt-backend-1 | File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1826, in full_dispatch_request
docsgpt-backend-1 | return self.finalize_request(rv)
docsgpt-backend-1 | File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1845, in finalize_request
docsgpt-backend-1 | response = self.make_response(rv)
docsgpt-backend-1 | File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2157, in make_response
docsgpt-backend-1 | rv = self.json.response(rv)
docsgpt-backend-1 | File "/usr/local/lib/python3.10/site-packages/flask/json/provider.py", line 309, in response
docsgpt-backend-1 | f"{self.dumps(obj, **dump_args)}\n", mimetype=mimetype
docsgpt-backend-1 | File "/usr/local/lib/python3.10/site-packages/flask/json/provider.py", line 230, in dumps
docsgpt-backend-1 | return json.dumps(obj, **kwargs)
docsgpt-backend-1 | File "/usr/local/lib/python3.10/json/__init__.py", line 238, in dumps
docsgpt-backend-1 | **kw).encode(obj)
docsgpt-backend-1 | File "/usr/local/lib/python3.10/json/encoder.py", line 201, in encode
docsgpt-backend-1 | chunks = list(chunks)
docsgpt-backend-1 | File "/usr/local/lib/python3.10/json/encoder.py", line 431, in _iterencode
docsgpt-backend-1 | yield from _iterencode_dict(o, _current_indent_level)
docsgpt-backend-1 | File "/usr/local/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
docsgpt-backend-1 | yield from chunks
docsgpt-backend-1 | File "/usr/local/lib/python3.10/json/encoder.py", line 438, in _iterencode
docsgpt-backend-1 | o = _default(o)
docsgpt-backend-1 | File "/usr/local/lib/python3.10/site-packages/flask/json/provider.py", line 122, in _default
docsgpt-backend-1 | raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")
docsgpt-backend-1 | TypeError: Object of type WorkerLostError is not JSON serializable
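From the trace it looks like the /api/task_status endpoint puts the raw Celery result (here a WorkerLostError exception object) straight into the JSON response, which is what triggers the TypeError at the bottom. Just as a rough sketch of a possible guard, and definitely not the actual DocsGPT handler (the payload shape and app wiring below are my assumptions; the route path and Redis URLs are taken from the logs), a status route could stringify failed results before serializing them:

# Hypothetical sketch of a task-status route that guards against
# non-JSON-serializable Celery results such as WorkerLostError.
# Payload shape and app wiring are assumptions, not DocsGPT's code.
from celery import Celery
from flask import Flask, jsonify, request

celery = Celery("application",
                broker="redis://redis:6379/0",
                backend="redis://redis:6379/1")
app = Flask(__name__)

@app.route("/api/task_status")
def task_status():
    task_id = request.args.get("task_id")
    result = celery.AsyncResult(task_id)
    payload = {"status": result.status}
    if result.failed():
        # result.result holds the exception instance; convert it to a
        # string so Flask's JSON provider does not raise a TypeError.
        payload["result"] = repr(result.result)
    elif result.successful():
        payload["result"] = result.result
    return jsonify(payload)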
We need to investigate why the worker exited. It could be:
- Memory issues: check whether the system is running out of memory. Monitor memory usage while running the application to confirm this.
- Manually killed: make sure the process is not being killed by some other process or script.
Are there any more logs for the docsgpt-worker-1 container?
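A worker dying with signal 9 is very often the kernel OOM killer reclaiming memory. As one way to check, here is a small monitoring sketch using the Docker SDK for Python (the container name is taken from your logs; the SDK dependency and the polling interval are my assumptions):

# Rough memory-monitoring sketch using the Docker SDK for Python.
# Run it while re-uploading the PDF and watch whether usage climbs
# toward the limit right before the SIGKILL.
import time
import docker

client = docker.from_env()
container = client.containers.get("docsgpt-worker-1")

while True:
    stats = container.stats(stream=False)
    usage_mib = stats["memory_stats"]["usage"] / (1024 ** 2)
    limit_mib = stats["memory_stats"]["limit"] / (1024 ** 2)
    print(f"worker memory: {usage_mib:.1f} MiB / {limit_mib:.1f} MiB")
    time.sleep(2)

Running `docker stats docsgpt-worker-1` in a terminal gives roughly the same picture without any code.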
Hi @dartpain, thanks for your quick response. Here is the full worker log:
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
/usr/local/lib/python3.10/site-packages/celery/platforms.py:840: SecurityWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!
Please specify a different user using the --uid option.
User information: uid=0 euid=0 gid=0 egid=0
warnings.warn(SecurityWarning(ROOT_DISCOURAGED.format(
-------------- celery@8436a71b938a v5.2.7 (dawn-chorus)
--- ***** -----
-- ******* ---- Linux-5.15.82-0-virt-x86_64-with-glibc2.31 2023-10-07 09:00:08
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: application.celery:0x7f088ba6b1c0
- ** ---------- .> transport: redis://redis:6379/0
- ** ---------- .> results: redis://redis:6379/1
- *** --- * --- .> concurrency: 2 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
[tasks]
. application.api.user.tasks.ingest
[2023-10-07 09:00:09,156: INFO/MainProcess] Connected to redis://redis:6379/0
[2023-10-07 09:00:09,175: INFO/MainProcess] mingle: searching for neighbors
[2023-10-07 09:00:10,198: INFO/MainProcess] mingle: all alone
[2023-10-07 09:00:10,311: INFO/MainProcess] celery@8436a71b938a ready.
[2023-10-07 09:00:37,233: INFO/MainProcess] Task application.api.user.tasks.ingest[97ea9648-a242-417c-b5e2-dd032986e3cd] received
[2023-10-07 09:00:37,237: WARNING/ForkPoolWorker-2] inputs/local/E_EG_441_0081.pdf
[2023-10-07 09:00:37,258: WARNING/ForkPoolWorker-2] <Response [200]>
[2023-10-07 09:00:38,421: WARNING/ForkPoolWorker-2] Grouping small documents
[2023-10-07 09:00:43,124: ERROR/MainProcess] Process 'ForkPoolWorker-2' pid:20 exited with 'signal 9 (SIGKILL)'
[2023-10-07 09:00:43,145: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 0.')
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 0.
worker: Warm shutdown (MainProcess)
Yesterday it happened both on the demo website https://docsgpt.arc53.com/ and in my local env, but I cannot reproduce the issue today; it works after restarting all containers.
[2023-10-08 01:32:03,828: INFO/MainProcess] Connected to redis://redis:6379/0
[2023-10-08 01:32:03,834: INFO/MainProcess] mingle: searching for neighbors
[2023-10-08 01:32:04,850: INFO/MainProcess] mingle: all alone
[2023-10-08 01:32:04,870: INFO/MainProcess] celery@8436a71b938a ready.
[2023-10-08 01:32:41,744: INFO/MainProcess] Task application.api.user.tasks.ingest[d137c53b-ea4b-4579-8142-79ce83a797bc] received
[2023-10-08 01:32:41,747: WARNING/ForkPoolWorker-2] inputs/local/E_EG_441_0081.pdf
[2023-10-08 01:32:41,773: WARNING/ForkPoolWorker-2] <Response [200]>
[2023-10-08 01:32:42,915: WARNING/ForkPoolWorker-2] Grouping small documents
[2023-10-08 01:32:43,282: WARNING/ForkPoolWorker-2] Separating large documents
[2023-10-08 01:32:44,067: INFO/ForkPoolWorker-2] Loading faiss with AVX2 support.
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
[2023-10-08 01:32:44,119: INFO/ForkPoolWorker-2] Successfully loaded faiss with AVX2 support.
Embedding: 0%|          | Time Left: ?
Embedding: 20%|##        | Time Left: 00:04
Embedding: 40%|####      | Time Left: 00:02
Embedding: 60%|######    | Time Left: 00:01
Embedding: 80%|########  | Time Left: 00:00
Embedding: 100%|##########| Time Left: 00:00
Embedding: 100%|##########| Time Left: 00:00
[2023-10-08 01:32:47,854: INFO/ForkPoolWorker-2] Task application.api.user.tasks.ingest[d137c53b-ea4b-4579-8142-79ce83a797bc] succeeded in 6.10717300000033s: {'directory': 'inputs', 'formats': ['.rst', '.md', '.pdf', '.txt'], 'name_job': 'E_EG_441_0081.pdf', 'filename': 'E_EG_441_0081.pdf', 'user': 'local', 'limited': False}
@dartpain Alex, I think I found the reason: if the uploaded file name includes Chinese characters, the log shows the following errors. With the same file, if I change the file name to English, training completes smoothly. Maybe some encoder needs to be changed to support UTF-8.
2023-10-08 22:48:51 docsgpt-backend-1 | f"{self.dumps(obj, **dump_args)}\n", mimetype=mimetype
2023-10-08 22:48:51 docsgpt-backend-1 | File "/usr/local/lib/python3.10/site-packages/flask/json/provider.py", line 230, in dumps
2023-10-08 22:48:51 docsgpt-backend-1 | return json.dumps(obj, **kwargs)
2023-10-08 22:48:51 docsgpt-backend-1 | File "/usr/local/lib/python3.10/json/__init__.py", line 238, in dumps
2023-10-08 22:48:51 docsgpt-backend-1 | **kw).encode(obj)
2023-10-08 22:48:51 docsgpt-backend-1 | File "/usr/local/lib/python3.10/json/encoder.py", line 201, in encode
2023-10-08 22:48:51 docsgpt-backend-1 | chunks = list(chunks)
2023-10-08 22:48:51 docsgpt-backend-1 | File "/usr/local/lib/python3.10/json/encoder.py", line 431, in _iterencode
2023-10-08 22:48:51 docsgpt-backend-1 | yield from _iterencode_dict(o, _current_indent_level)
2023-10-08 22:48:51 docsgpt-backend-1 | File "/usr/local/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
2023-10-08 22:48:51 docsgpt-backend-1 | yield from chunks
2023-10-08 22:48:51 docsgpt-backend-1 | File "/usr/local/lib/python3.10/json/encoder.py", line 438, in _iterencode
2023-10-08 22:48:51 docsgpt-backend-1 | o = _default(o)
2023-10-08 22:48:51 docsgpt-backend-1 | File "/usr/local/lib/python3.10/site-packages/flask/json/provider.py", line 122, in _default
2023-10-08 22:48:51 docsgpt-backend-1 | raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")
2023-10-08 22:48:51 docsgpt-backend-1 | TypeError: Object of type IndexError is not JSON serializable
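As a guess at a workaround, sanitizing the uploaded file name before the ingest task uses it might avoid the crash. Something along these lines, purely a sketch and not the actual DocsGPT upload code (safe_upload_name is a hypothetical helper):

# Hypothetical file-name sanitizer for the upload path.
# werkzeug's secure_filename() drops non-ASCII characters, so a fully
# Chinese name like "指南.pdf" collapses to just "pdf"; in that case,
# fall back to a generated ASCII name that keeps the original suffix.
import uuid
from pathlib import Path
from werkzeug.utils import secure_filename

def safe_upload_name(original_name: str) -> str:
    suffix = Path(original_name).suffix
    cleaned = secure_filename(original_name)
    if not cleaned or not cleaned.lower().endswith(suffix.lower()):
        cleaned = f"upload-{uuid.uuid4().hex[:8]}{suffix}"
    return cleaned

# e.g. safe_upload_name("指南.pdf") -> "upload-1f2e3d4c.pdf"
#      safe_upload_name("axis.pdf") -> "axis.pdf"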
Same error at the official instance. The file name is fully Latin (axis.pdf).
OK, thank you for reporting it, definitely some encoding issue. Can you please send me a link to the document here or via email [email protected]
I encountered the same issue using https://docsgpt.arc53.com/. The PDF name is fully Latin. I tried 2 PDFs; both did not work.
@LoveYourEnemy Please send me the pdfs, I would appreciate it a lot!
Files sent. Please kindly check and help.
Yep, I got them, thank you. I will try to fix it a bit later today. Once there is a fix, I'll update you.
Thank you!
Dear Alex, any update on this issue?
Best regards, James
@pabik tried replicating the bug but unfortunately with no success. Can you try again please?
@jamsnrihk I tried using a different browser in which I disabled all Adblocks and allowed all scripts. The website has successfully trained the pdf I uploaded and also provided a summary.
If the file name is in English, there is no problem uploading and training, but if you rename the file to Chinese or another double-byte language, it will NOT be uploaded and trained.
@jamsnrihk Thank you for the note, I will try to replicate it now.
I visited the website docsgpt.arc53.com/ provided by the project, but the training got stuck at 100% and the interface did not respond. Can anyone explain why this happens?
I was able to train fully Latin-named PDFs 3 or 4 days ago, but all of a sudden they get stuck at 0% (using the website). When I refresh the web page after it gets stuck at 0%, I get this error message: 404: NOT_FOUND Code: NOT_FOUND ID: fra1::prw7c-1702454727390-1fd3bbf5fdf1
https://github.com/arc53/DocsGPT/issues/490#issuecomment-1751882587 I think this might also be the problem on the website
Should be resolved now, please try again. There was a bit of an overflow.
I encountered the same issue in a Windows dev environment, but there are no error messages in any of my logs.
Can you please provide me the file that you are trying to upload? Does it work on the demo? https://docsgpt.arc53.com/
Yes, it works well on this demo, and there are no error messages in the logs. testcase.pdf
The demo ingests files the same way the open-source version does. Weird that it's not working for you. Does the default file work?
Yes, chat is OK.
Is this summary relevant?
I used the latest version.
Please walk me through your deployment and any logs that you see in the backend and worker instance.
Do you have a new source doc once training is finished?
This should all be fixed in newer versions.