feat: add context chat provider
Thanks for opening your first pull request in this repository! :v:
It was decided that we should move on with this without https://github.com/nextcloud/server/issues/52852.
I'll look for a way to make Psalm understand the foreign code.
Since we have to do workarounds to get Psalm working with stubs, have a stub update mechanism, and additionally need the actual classes to be able to write tests, we decided that https://github.com/nextcloud/server/issues/52852 should be pursued again. It's also an investment of time, but we'll have our standard setup for app communication through the OCP public API. That setup works for Psalm and PHPUnit.
Looks really good to me! If we can finish the tests, I'd look forward to a review from the groupware team 💙
Just as a heads-up, I might not have time to finish the tests, especially if they're required to be very thorough and so would take up even more dev time. That's why I marked them as ignored/skipped for now, but anyone else is free to pick up the slack if needed.
Looking forward to reviews from the Groupware team :)
@edward-ly is this ready for review?
Not yet, still more unit tests to complete, but feel free to suggest more performance/other improvements if you think they're needed.
It looks like all unit tests have been added for the new methods now, but the codecov checks are still marking them as missing. I'm not sure if the integration tests are not being run or if there is actually something wrong with the checker.
Any failing checks on stable31 can be ignored.
If performance is sufficient for now, we can work on improving this in a future PR.
I don't have a working app_api/assistant/context_chat setup and can only make a guess, but it seems that once an administrator sets up context_chat on a Nextcloud instance, new emails are automatically fed into that model? If that's correct, then we should have a per-account toggle to turn that off.
I don't know if all of the existing apps have a similar toggle yet, but yeah, that's a good idea regardless. Where in the mail settings do you think would be a good place to put it?
@edward-ly could you please add instructions for how to test this? I set up context_chat and assistant. I ran cron many times. The initial import is not triggered.
Hm, the docs say that "the triggerInitialImport method is called when Context Chat is first set up", but it's not clear what "set up" means, for sure. Maybe context_chat needs to be installed after the mail app so that it can find and trigger all existing providers?
@marcelklehr do you know the specifics around this?
The import function should be called when the app registers itself with context chat for the first time, so it shouldn't matter which one is installed first.
@ChristophWurst Did you install context_chat_backend as well?
I currently cannot test this either, because (likely since this was rebased on main) the mail app in this branch cannot connect to any IMAP server. :/
@ChristophWurst Did you install context_chat_backend as well?
Bingo! That one is missing
Thanks to you and Alexander I have CCB running. I wasn't able to catch the initial import method with xdebug. Maybe it just hasn't run.
I see this in CB stats:
ContextChat statistics:
The indexing is not complete yet.
Total eligible files: 4
Files in indexing queue: 4
New files in indexing queue (without updates): 4
Queued documents (without files):array (
)
Files successfully sent to backend: 0
Indexed documents: array (
)
Actions in queue: 0
File system events in queue: 0
\OCA\ContextChat\BackgroundJobs\SubmitContentJob::run ran with a few emails (the most recent ones).
The import function should be called when the app registers itself with context chat for the first time
How can I trigger that?
mmh, this may be a bug in context_chat then :thinking:
This method here should be called whenever an app interacts with context_chat: https://github.com/nextcloud/context_chat/blob/main/lib/Public/ContentManager.php#L95 and triggers an event for apps to register themselves.
Mmh, looking into this it might be a mind-o that we hadn't realized so far: https://github.com/nextcloud/context_chat/blob/main/lib/Event/ContentProviderRegisterEvent.php#L10
context_chat triggers its own event. The thinking was that making this event inherit from the event in OCP would make it compatible, but since the class path is different, any app listening to the OCP event will likely never get called.
This may be a fix, haven't tested yet: https://github.com/nextcloud/context_chat/pull/187
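To illustrate the mismatch described above, here is a minimal Python sketch (not Nextcloud code). It assumes a dispatcher that looks up listeners by the exact class of the dispatched event, which is only a guess at the relevant behaviour; all class and function names are hypothetical stand-ins.

```python
from collections import defaultdict


class OcpRegisterEvent:
    """Hypothetical stand-in for the event class that lives in OCP."""


class AppRegisterEvent(OcpRegisterEvent):
    """Hypothetical stand-in for context_chat's own event, inheriting from the OCP one."""


class ExactClassDispatcher:
    """Assumed behaviour: listeners are keyed by the exact event class."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def add_listener(self, event_class, listener):
        self._listeners[event_class].append(listener)

    def dispatch(self, event):
        # Only the concrete class is looked up; parent classes are ignored.
        called = 0
        for listener in self._listeners.get(type(event), []):
            listener(event)
            called += 1
        return called


def demo():
    dispatcher = ExactClassDispatcher()
    seen = []
    # An app subscribes to the OCP event class ...
    dispatcher.add_listener(OcpRegisterEvent, seen.append)
    # ... but the subclass is what actually gets dispatched.
    calls_for_subclass = dispatcher.dispatch(AppRegisterEvent())
    calls_for_exact = dispatcher.dispatch(OcpRegisterEvent())
    return calls_for_subclass, calls_for_exact
```

Under that assumption, inheritance alone does not help: the listener only fires when the dispatched object is of exactly the class it subscribed to.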
Looks like I'm getting closer
Despite seeing a few messages go through SubmitContentJob, I can't run CC queries. They error out with: A TaskProcessing context_chat:context_chat task with id 496 failed with the following message: No result in ContextChat response: Context Retrieval Error: No documents retrieved, please index a few documents first
Is there a certain minimum number of emails/documents needed to make it work?
Edit: it suddenly stopped complaining about missing docs. Running for a couple of minutes now
I think it's still stuck at no docs, even though that error no longer bubbles up
$ php occ context_chat:stats
ContextChat statistics:
The indexing is not complete yet.
Total eligible files: 4
Files in indexing queue: 4
New files in indexing queue (without updates): 4
Queued documents (without files):array (
'mail__mail' => 12,
)
Files successfully sent to backend: 0
Indexed documents: array (
)
Actions in queue: 0
File system events in queue: 0
I'm not sure how indexing is triggered so I execute the background jobs manually. Somehow this gets stuck for a very long time too
$ php occ background-job:execute 10896
Job class: OCA\ContextChat\BackgroundJobs\IndexerJob
Arguments: {"storageId":1747,"rootId":38717}
Type: timed
Last checked: 2025-11-05T07:55:03+00:00
Reserved at: 2025-11-05T07:55:03+00:00
Last executed: 2025-11-05T07:55:03+00:00
Last duration: 0
Next execution: 2025-11-05T08:25:03+00:00
^ waiting 15 mins again.
Edit: an infinite loop in \OC\Files\Storage\Local::getSourcePath causes the job to never return
For some reason the indexer job gets stuck somewhere in the filesystem code.
@marcelklehr what is the proper way to get the unindexed mail documents indexed?
Edit: must have been the stuck indexer jobs. Fixed them manually. Got to an "indexed" state. Yet it doesn't show any indexed documents
php occ context_chat:stats
ContextChat statistics:
Installed time: 2025-10-17 12:04 UTC
Index complete time: 2025-11-05 10:21 UTC
Total time taken for complete index: 18 days 22:17 (hh:mm)
Total eligible files: 0
Files in indexing queue: 0
New files in indexing queue (without updates): 0
Queued documents (without files):array (
)
Files successfully sent to backend: 0
Indexed documents: array (
)
Actions in queue: 0
File system events in queue: 0
… and fails with …
INFO: 127.0.0.1:0 - "PUT /loadSources HTTP/1.0" 200 OK
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
llama_perf_context_print: load time = 949.62 ms
llama_perf_context_print: prompt eval time = 637.92 ms / 9 tokens ( 70.88 ms per token, 14.11 tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 638.18 ms / 10 tokens
INFO: ::1:54578 - "POST /v1/embeddings HTTP/1.1" 200 OK
2025-11-05T11:49:15+0000: [ERROR|utils]: original traceback: Traceback (most recent call last):
File "/app/context_chat_backend/utils.py", line 74, in exception_wrap
resconn.send({ 'value': fun(*args, **kwargs), 'error': None })
^^^^^^^^^^^^^^^^^^^^
File "/app/context_chat_backend/chain/one_shot.py", line 68, in process_context_query
raise ContextException('No documents retrieved, please index a few documents first')
context_chat_backend.chain.types.ContextException: No documents retrieved, please index a few documents first
2025-11-05T11:49:15+0000: [ERROR|controller]: Context Retrieval Error: /query:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 75, in app
response = await f(request)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 302, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 215, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/starlette/concurrency.py", line 38, in run_in_threadpool
return await anyio.to_thread.run_sync(func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/anyio/_backends/_asyncio.py", line 2476, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/anyio/_backends/_asyncio.py", line 967, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/context_chat_backend/controller.py", line 191, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/app/context_chat_backend/controller.py", line 477, in _
return execute_query(query)
^^^^^^^^^^^^^^^^^^^^
File "/app/context_chat_backend/controller.py", line 466, in execute_query
return exec_in_proc(target=target, args=args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/context_chat_backend/utils.py", line 97, in exec_in_proc
raise result['error']
context_chat_backend.chain.types.ContextException: No documents retrieved, please index a few documents first
2025-11-05T11:49:15+0000: [ERROR|utils]: Failed request (400): Context Retrieval Error: No documents retrieved, please index a few documents first
INFO: 127.0.0.1:0 - "POST /query HTTP/1.0" 400 Bad Request
Indexed documents: array ( )
Weird, it doesn't show anything after indexing all the "mail__mail" docs. Context Chat produces a lot of logs, so it uses a separate log file. Are there any logs in the "data/context_chat.log*" files, maybe some errors about indexing or from when the stats are fetched? Same for context_chat_backend: are there any error logs from when the mail docs were indexed? We can also check the db to see if the docs are actually there.
docker exec -it nc_app_context_chat_backend psql -U ccbuser -d ccb -p 5001
select * from docs;
\d
$ docker exec -it nc_app_context_chat_backend psql -U ccbuser -d ccb -p 5001
psql (17.5 (Ubuntu 17.5-1.pgdg22.04+1))
Type "help" for help.
ccb=# select * from docs;
\d
source_id | provider | modified | chunks
-----------------------+----------------+---------------------+----------------------------------------
files__default: 44495 | files__default | 2025-11-05 11:57:03 | {6bb8b530-d2bd-4856-9fda-bc35cd700b6a}
(1 row)
List of relations
Schema | Name | Type | Owner
--------+-------------------------+----------+---------
public | access_list | table | ccbuser
public | access_list_id_seq | sequence | ccbuser
public | docs | table | ccbuser
public | langchain_pg_collection | table | ccbuser
public | langchain_pg_embedding | table | ccbuser
(5 rows)
^ it contains the one file my user has. The submitted mail docs are not there?!
According to the CCB, the sources were received successfully, however:
INFO: 127.0.0.1:0 - "POST /query HTTP/1.0" 500 Internal Server Error
INFO: 127.0.0.1:0 - "POST /countIndexedDocuments HTTP/1.0" 200 OK
INFO: 127.0.0.1:0 - "PUT /loadSources HTTP/1.0" 200 OK
INFO: 127.0.0.1:0 - "PUT /loadSources HTTP/1.0" 200 OK
INFO: 127.0.0.1:0 - "PUT /loadSources HTTP/1.0" 200 OK
INFO: 127.0.0.1:0 - "POST /countIndexedDocuments HTTP/1.0" 200 OK
INFO: 127.0.0.1:0 - "PUT /loadSources HTTP/1.0" 200 OK
data/context_chat.log
{"reqId":"S1BEBD5O6ehNg7gTba16","level":3,"time":"2025-11-07T14:11:05+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Background jobs: OCA\\ContextChat\\BackgroundJobs\\SubmitContentJob 10961 ended","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"5hp3UffxkfWi4BkTFXFf","level":3,"time":"2025-11-07T14:11:11+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Background jobs: OCA\\ContextChat\\BackgroundJobs\\SubmitContentJob 10962 started","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"5hp3UffxkfWi4BkTFXFf","level":3,"time":"2025-11-07T14:11:16+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"[SubmitContentJob] Indexed sources for providers","userAgent":"--","version":"33.0.0.1","data":{"count":"0","loaded_sources":"[]","sources_to_retry":"[]"}}
{"reqId":"5hp3UffxkfWi4BkTFXFf","level":3,"time":"2025-11-07T14:11:16+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Background jobs: OCA\\ContextChat\\BackgroundJobs\\SubmitContentJob 10962 ended","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"w40vJGdHn61pjWvtVB9S","level":3,"time":"2025-11-07T14:11:29+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Could not fetch storage root","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"w40vJGdHn61pjWvtVB9S","level":3,"time":"2025-11-07T14:11:29+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Could not fetch storage root","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"w40vJGdHn61pjWvtVB9S","level":3,"time":"2025-11-07T14:11:29+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Could not fetch storage root","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"w40vJGdHn61pjWvtVB9S","level":3,"time":"2025-11-07T14:11:29+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Could not fetch storage root","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"w40vJGdHn61pjWvtVB9S","level":3,"time":"2025-11-07T14:11:29+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Could not fetch storage root","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"w40vJGdHn61pjWvtVB9S","level":3,"time":"2025-11-07T14:11:29+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Could not fetch storage root","userAgent":"--","version":"33.0.0.1","data":[]}
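For a log like the one above, a small helper can make repeated entries easier to spot. This is a rough Python sketch that only assumes one JSON object per line with a `message` key, as in the pasted lines; the function name is made up for illustration.

```python
import json
from collections import Counter


def count_messages(lines):
    """Count how often each `message` value occurs in JSON-lines log output."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(entry, dict):
            counts[entry.get("message", "")] += 1
    return counts
```

It could be called as `count_messages(open("data/context_chat.log"))` and then inspected with `.most_common()`.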
ContextChat statistics:
Installed time: 2025-10-17 12:04 UTC
Index complete time: 2025-11-05 10:21 UTC
Total time taken for complete index: 18 days 22:17 (hh:mm)
Total eligible files: 1
Files in indexing queue: 0
New files in indexing queue (without updates): 0
Queued documents (without files):array (
'mail__mail' => 12,
)
Files successfully sent to backend: 1
Indexed documents: array (
'files__default' => 1,
)
Actions in queue: 0
File system events in queue: 0
ContextChat statistics:
Installed time: 2025-10-17 12:04 UTC
Index complete time: 2025-11-05 10:21 UTC
Total time taken for complete index: 18 days 22:17 (hh:mm)
Total eligible files: 1
Files in indexing queue: 0
New files in indexing queue (without updates): 0
Queued documents (without files):array (
'mail__mail' => 24,
)
Files successfully sent to backend: 1
Indexed documents: array (
'files__default' => 1,
)
Actions in queue: 0
File system events in queue: 0
With @kyteinsky's fix of CCB the Mail "documents" are no longer ignored and do fill the vector db:
occ context_chat:stats
ContextChat statistics:
Installed time: 2025-10-17 12:04 UTC
Index complete time: 2025-11-05 10:21 UTC
Total time taken for complete index: 18 days 22:17 (hh:mm)
Total eligible files: 1
Files in indexing queue: 0
New files in indexing queue (without updates): 0
Queued documents (without files):array (
'mail__mail' => 6,
)
Files successfully sent to backend: 1
Indexed documents: array (
'files__default' => 1,
'mail__mail' => 11,
)
Actions in queue: 0
File system events in queue: 0
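To compare queued and indexed counts per provider across runs, the stats output can be parsed with a few lines of Python. This is only a sketch: it assumes the output keeps the layout pasted in this thread, and the function name is hypothetical.

```python
import re

# Matches the two section headers as they appear in the pasted output.
SECTION = re.compile(
    r"^(Queued documents \(without files\)|Indexed documents):\s*array \($"
)
# Matches entries such as: 'mail__mail' => 6,
ENTRY = re.compile(r"^'([^']+)'\s*=>\s*(\d+),$")


def parse_stats(text):
    """Collect the per-provider counts from the two array sections."""
    result = {"queued": {}, "indexed": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = SECTION.match(line)
        if header:
            current = "queued" if header.group(1).startswith("Queued") else "indexed"
        elif line == ")":
            current = None  # end of the current array section
        elif current:
            entry = ENTRY.match(line)
            if entry:
                result[current][entry.group(1)] = int(entry.group(2))
    return result
```

Feeding it the text printed by `occ context_chat:stats` gives two dictionaries that can be diffed between runs.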
Sending context chat queries works too. Results don't:
{
"reqId": "6Ulnm9L9uSvf4GrSUtG3",
"level": 2,
"time": "2025-11-13T08:29:34+00:00",
"remoteAddr": "",
"user": "--",
"app": "no app in context",
"method": "",
"url": "--",
"message": "A TaskProcessing context_chat:context_chat task with id 539 failed with the following message: Parameter \"id\" for route \"mail.page.thread\" must match \"[^/]++\" (\"\" given) to generate a corresponding URL.",
"userAgent": "--",
"version": "33.0.0.1",
"data": []
}
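The error says the `id` parameter was empty while the route requires at least one non-slash character. One possible guard, sketched in Python rather than in the app's PHP, is to validate the id before building the link. The URL path and function names here are hypothetical, and returning `None` for a bad id is an assumption about what the caller could tolerate.

```python
import re

# Requirement taken from the error message, written without the possessive
# quantifier so it also works on Python versions before 3.11.
ID_REQUIREMENT = re.compile(r"[^/]+")


def thread_url(message_id):
    """Build a hypothetical thread URL, refusing ids the route would reject."""
    value = "" if message_id is None else str(message_id)
    if ID_REQUIREMENT.fullmatch(value) is None:
        raise ValueError(f"invalid thread id: {value!r}")
    return f"/thread/{value}"


def safe_thread_url(message_id):
    """Return None instead of raising, so one bad source does not fail the task."""
    try:
        return thread_url(message_id)
    except ValueError:
        return None
```

This only hides the symptom; the sketch does not address why the id ends up empty in the first place.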
- Requires https://github.com/nextcloud/context_chat_backend/pull/232 (released, update CCB)
- Requires https://github.com/nextcloud/context_chat/pull/190
@edward-ly here are the major todos:
- Fix the infinite indexing loop.
- Fix the route generation.
- Add an opt-in switch. We don't want the context chat integration to be on by default for now. Admins have to turn this on explicitly.
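As a rough illustration of the opt-in todo, here is a Python sketch of a gate that stays off unless it is explicitly enabled. The settings class, the key name `context_chat_enabled`, and the `submit` callback are all hypothetical; the real switch would live in the app's PHP code and admin settings.

```python
from dataclasses import dataclass, field


@dataclass
class AppSettings:
    """Hypothetical settings store; the key name is made up for this sketch."""

    values: dict = field(default_factory=dict)

    def is_context_chat_enabled(self):
        # Opt-in: a missing key means the integration stays off.
        return self.values.get("context_chat_enabled", False) is True


def submit_messages(settings, messages, submit):
    """Forward messages to `submit` only when an admin has enabled the integration."""
    if not settings.is_context_chat_enabled():
        return 0
    for message in messages:
        submit(message)
    return len(messages)
```

Defaulting to "off" when the key is missing means existing installations would not start sending mail content after an update.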