feat: add context chat provider

Open edward-ly opened this issue 6 months ago • 2 comments

edward-ly avatar May 15 '25 05:05 edward-ly

Thanks for opening your first pull request in this repository! :v:

welcome[bot] avatar May 15 '25 05:05 welcome[bot]

It was decided that we should move on with this without https://github.com/nextcloud/server/issues/52852.

I'll look for a way to make Psalm understand the foreign code.

ChristophWurst avatar Jun 03 '25 10:06 ChristophWurst

> It was decided that we should move on with this without nextcloud/server#52852.
>
> I'll look for a way to make Psalm understand the foreign code.

Since we have to do workarounds to get Psalm working with stubs, have a stub update mechanism, and additionally need the actual classes to be able to write tests, we decided that https://github.com/nextcloud/server/issues/52852 should be pursued again. It's also an investment of time, but we'll have our standard setup for app communication through the OCP public API. That setup works for Psalm and PHPUnit.

ChristophWurst avatar Jun 26 '25 10:06 ChristophWurst

> Looks really good to me! If we can finish the tests, I'd look forward to a review from the groupware team 💙

Just as a heads-up, I might not have time to finish the tests, especially if they're required to be very thorough and so would take up even more dev time. That's why I marked them as ignored/skipped for now, but anyone else is free to pick up the slack if needed.

edward-ly avatar Jul 24 '25 16:07 edward-ly

Looking forward to reviews from the Groupware team :)

marcelklehr avatar Jul 26 '25 04:07 marcelklehr

@edward-ly is this ready for review?

ChristophWurst avatar Sep 15 '25 12:09 ChristophWurst

> @edward-ly is this ready for review?

Not yet, still more unit tests to complete, but feel free to suggest more performance/other improvements if you think they're needed.

edward-ly avatar Sep 15 '25 14:09 edward-ly

It looks like all unit tests have been added for the new methods now, but the codecov checks are still marking them as missing. I'm not sure if the integration tests are not being run or if there is actually something wrong with the checker.

Any failing checks on stable31 can be ignored.

If performance is sufficient for now, we can work on improving this in a future PR.

edward-ly avatar Oct 06 '25 09:10 edward-ly

I don't have a working app_api/assistant/context_chat setup and can only make a guess, but it seems that once an administrator sets up context_chat on a Nextcloud instance, new emails are automatically fed into that model? If that's correct, then we should have a per-account toggle to turn that off.

kesselb avatar Oct 09 '25 12:10 kesselb

> I don't have a working app_api/assistant/context_chat setup and can only make a guess, but it seems that once an administrator sets up context_chat on a Nextcloud instance, new emails are automatically fed into that model? If that's correct, then we should have a per-account toggle to turn that off.

I don't know if all of the existing apps have a similar toggle yet, but yeah, that's a good idea regardless. Where in the mail settings do you think would be a good place to put it?

edward-ly avatar Oct 09 '25 15:10 edward-ly

@edward-ly could you please add instructions for how to test this? I set up context_chat and assistant. I ran cron many times. The initial import is not triggered.

ChristophWurst avatar Oct 17 '25 12:10 ChristophWurst

> @edward-ly could you please add instructions for how to test this? I set up context_chat and assistant. I ran cron many times. The initial import is not triggered.

Hm, the docs say that "the triggerInitialImport method is called when Context Chat is first set up", but it's not clear what "set up" means, for sure. Maybe context_chat needs to be installed after the mail app so that it can find and trigger all existing providers?

@marcelklehr do you know the specifics around this?

edward-ly avatar Oct 17 '25 17:10 edward-ly

The import function should be called when the app registers itself with context chat for the first time, so it shouldn't matter which one is installed first.

@ChristophWurst Did you install context_chat_backend as well?

marcelklehr avatar Oct 20 '25 08:10 marcelklehr

I currently cannot test this either, because (likely since this was rebased on main) the mail app in this branch cannot connect to any IMAP server. :/

marcelklehr avatar Oct 20 '25 08:10 marcelklehr

> @ChristophWurst Did you install context_chat_backend as well?

Bingo! That one is missing

ChristophWurst avatar Oct 20 '25 08:10 ChristophWurst

Thanks to you and Alexander I have CCB running. I wasn't able to catch the initial import method with xdebug. Maybe it just hasn't run.

I see this in the context_chat stats:

ContextChat statistics:
The indexing is not complete yet.
Total eligible files: 4
Files in indexing queue: 4
New files in indexing queue (without updates): 4
Queued documents (without files):array (
)
Files successfully sent to backend: 0
Indexed documents: array (
)
Actions in queue: 0
File system events in queue: 0

\OCA\ContextChat\BackgroundJobs\SubmitContentJob::run ran with a few emails (the most recent ones).

> The import function should be called when the app registers itself with context chat for the first time

How can I trigger that?

ChristophWurst avatar Oct 22 '25 11:10 ChristophWurst

mmh, this may be a bug in context_chat then :thinking:

This method here should be called whenever an app interacts with context_chat: https://github.com/nextcloud/context_chat/blob/main/lib/Public/ContentManager.php#L95 and triggers an event for apps to register themselves.

marcelklehr avatar Oct 22 '25 13:10 marcelklehr

Mmh, looking into this it might be a mind-o that we hadn't realized so far: https://github.com/nextcloud/context_chat/blob/main/lib/Event/ContentProviderRegisterEvent.php#L10

context_chat triggers its own event. The thinking was that making this event inherit from the event in OCP would make it compatible, but since the class path is different, any app listening to the OCP event will likely never get called.
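The mismatch described here can be reproduced in miniature. Below is a minimal Python sketch, assuming a dispatcher that looks listeners up by the exact class name of the dispatched event. The class names are stand-ins rather than the real OCP or context_chat classes, and the actual Nextcloud dispatcher may differ in detail.

```python
from collections import defaultdict
from typing import Callable


class OcpRegisterEvent:
    """Stand-in for the event class published in OCP (hypothetical name)."""


class AppRegisterEvent(OcpRegisterEvent):
    """Stand-in for context_chat's own event, which inherits from the OCP one."""


class Dispatcher:
    """Toy dispatcher that looks listeners up by the exact class name only."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[Callable[[object], None]]] = defaultdict(list)

    def add_listener(self, event_class: type, listener: Callable[[object], None]) -> None:
        self._listeners[event_class.__qualname__].append(listener)

    def dispatch(self, event: object) -> int:
        """Call the listeners registered for the event's own class; return how many ran."""
        listeners = self._listeners.get(type(event).__qualname__, [])
        for listener in listeners:
            listener(event)
        return len(listeners)


dispatcher = Dispatcher()
received: list[str] = []
dispatcher.add_listener(OcpRegisterEvent, lambda e: received.append(type(e).__name__))

# The subclass is an instance of the OCP event, yet no listener is found for it.
called_for_subclass = dispatcher.dispatch(AppRegisterEvent())
called_for_ocp = dispatcher.dispatch(OcpRegisterEvent())
```

Under that assumption, inheritance alone does not help: only an event dispatched under the OCP class name reaches an app that subscribed to the OCP class.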

marcelklehr avatar Oct 22 '25 13:10 marcelklehr

This may be a fix, haven't tested yet: https://github.com/nextcloud/context_chat/pull/187

marcelklehr avatar Oct 22 '25 13:10 marcelklehr

Looks like I'm getting closer

Screenshot from 2025-11-05 08-58-05

ChristophWurst avatar Nov 05 '25 08:11 ChristophWurst

Despite seeing a few messages go through SubmitContentJob I can't run CC queries. They error out with: A TaskProcessing context_chat:context_chat task with id 496 failed with the following message: No result in ContextChat response: Context Retrieval Error: No documents retrieved, please index a few documents first

Is there a certain minimum number of emails/documents needed to make it work?

Edit: it suddenly stopped complaining about missing docs. Running for a couple of minutes now

image

ChristophWurst avatar Nov 05 '25 08:11 ChristophWurst

I think it's still stuck at no docs, even though that error no longer bubbles up

$ php occ context_chat:stats
ContextChat statistics:
The indexing is not complete yet.
Total eligible files: 4
Files in indexing queue: 4
New files in indexing queue (without updates): 4
Queued documents (without files):array (
  'mail__mail' => 12,
)
Files successfully sent to backend: 0
Indexed documents: array (
)
Actions in queue: 0
File system events in queue: 0

I'm not sure how indexing is triggered, so I execute the background jobs manually. Somehow this gets stuck for a very long time too.

$ php occ background-job:execute 10896
Job class:            OCA\ContextChat\BackgroundJobs\IndexerJob
Arguments:            {"storageId":1747,"rootId":38717}
Type:                 timed

Last checked:         2025-11-05T07:55:03+00:00
Reserved at:          2025-11-05T07:55:03+00:00
Last executed:        2025-11-05T07:55:03+00:00
Last duration:        0
Next execution:       2025-11-05T08:25:03+00:00

^ waiting 15 mins again.

Edit: an infinite loop in \OC\Files\Storage\Local::getSourcePath causes the job to never return

ChristophWurst avatar Nov 05 '25 09:11 ChristophWurst

For some reason the indexer job gets stuck somewhere in the filesystem code.

@marcelklehr what is the proper way to get the unindexed mail documents indexed?

Edit: must have been the stuck indexer jobs. Fixed them manually. Got to an "indexed" state. Yet it doesn't show any indexed documents

php occ context_chat:stats
ContextChat statistics:
Installed time: 2025-10-17 12:04 UTC
Index complete time: 2025-11-05 10:21 UTC
Total time taken for complete index: 18 days 22:17 (hh:mm)
Total eligible files: 0
Files in indexing queue: 0
New files in indexing queue (without updates): 0
Queued documents (without files):array (
)
Files successfully sent to backend: 0
Indexed documents: array (
)
Actions in queue: 0
File system events in queue: 0

… and fails with …

INFO:     127.0.0.1:0 - "PUT /loadSources HTTP/1.0" 200 OK
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
llama_perf_context_print:        load time =     949.62 ms
llama_perf_context_print: prompt eval time =     637.92 ms /     9 tokens (   70.88 ms per token,    14.11 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =     638.18 ms /    10 tokens
INFO:     ::1:54578 - "POST /v1/embeddings HTTP/1.1" 200 OK
2025-11-05T11:49:15+0000: [ERROR|utils]: original traceback: Traceback (most recent call last):
  File "/app/context_chat_backend/utils.py", line 74, in exception_wrap
    resconn.send({ 'value': fun(*args, **kwargs), 'error': None })
                            ^^^^^^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/chain/one_shot.py", line 68, in process_context_query
    raise ContextException('No documents retrieved, please index a few documents first')
context_chat_backend.chain.types.ContextException: No documents retrieved, please index a few documents first

2025-11-05T11:49:15+0000: [ERROR|controller]: Context Retrieval Error: /query:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 75, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 302, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 215, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/starlette/concurrency.py", line 38, in run_in_threadpool
    return await anyio.to_thread.run_sync(func)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/anyio/_backends/_asyncio.py", line 2476, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/anyio/_backends/_asyncio.py", line 967, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/controller.py", line 191, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/controller.py", line 477, in _
    return execute_query(query)
           ^^^^^^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/controller.py", line 466, in execute_query
    return exec_in_proc(target=target, args=args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/utils.py", line 97, in exec_in_proc
    raise result['error']
context_chat_backend.chain.types.ContextException: No documents retrieved, please index a few documents first
2025-11-05T11:49:15+0000: [ERROR|utils]: Failed request (400): Context Retrieval Error: No documents retrieved, please index a few documents first
INFO:     127.0.0.1:0 - "POST /query HTTP/1.0" 400 Bad Request

ChristophWurst avatar Nov 05 '25 09:11 ChristophWurst

> Indexed documents: array ( )

weird, it doesn't show anything after indexing all the "mail__mail" docs. context chat has a lot of logs so it uses a separate logging file. are there any logs in the "data/context_chat.log*" files, maybe some errors about indexing or when the stats are fetched? Same for context_chat_backend, are there any error logs from when the mail docs were indexed? We can also check the db to see if the docs are actually there.

docker exec -it nc_app_context_chat_backend psql -U ccbuser -d ccb -p 5001
select * from docs;
\d

kyteinsky avatar Nov 06 '25 07:11 kyteinsky

$ docker exec -it nc_app_context_chat_backend psql -U ccbuser -d ccb -p 5001
psql (17.5 (Ubuntu 17.5-1.pgdg22.04+1))
Type "help" for help.

ccb=# select * from docs;
\d
       source_id       |    provider    |      modified       |                 chunks                 
-----------------------+----------------+---------------------+----------------------------------------
 files__default: 44495 | files__default | 2025-11-05 11:57:03 | {6bb8b530-d2bd-4856-9fda-bc35cd700b6a}
(1 row)

                   List of relations
 Schema |          Name           |   Type   |  Owner  
--------+-------------------------+----------+---------
 public | access_list             | table    | ccbuser
 public | access_list_id_seq      | sequence | ccbuser
 public | docs                    | table    | ccbuser
 public | langchain_pg_collection | table    | ccbuser
 public | langchain_pg_embedding  | table    | ccbuser
(5 rows)

^ it contains the one file my user has. The submitted mail docs are not there?!

According to the CCB the sources were received successfully, however:

INFO:     127.0.0.1:0 - "POST /query HTTP/1.0" 500 Internal Server Error
INFO:     127.0.0.1:0 - "POST /countIndexedDocuments HTTP/1.0" 200 OK
INFO:     127.0.0.1:0 - "PUT /loadSources HTTP/1.0" 200 OK
INFO:     127.0.0.1:0 - "PUT /loadSources HTTP/1.0" 200 OK
INFO:     127.0.0.1:0 - "PUT /loadSources HTTP/1.0" 200 OK
INFO:     127.0.0.1:0 - "POST /countIndexedDocuments HTTP/1.0" 200 OK
INFO:     127.0.0.1:0 - "PUT /loadSources HTTP/1.0" 200 OK

data/context_chat.log

{"reqId":"S1BEBD5O6ehNg7gTba16","level":3,"time":"2025-11-07T14:11:05+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Background jobs: OCA\\ContextChat\\BackgroundJobs\\SubmitContentJob 10961 ended","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"5hp3UffxkfWi4BkTFXFf","level":3,"time":"2025-11-07T14:11:11+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Background jobs: OCA\\ContextChat\\BackgroundJobs\\SubmitContentJob 10962 started","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"5hp3UffxkfWi4BkTFXFf","level":3,"time":"2025-11-07T14:11:16+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"[SubmitContentJob] Indexed sources for providers","userAgent":"--","version":"33.0.0.1","data":{"count":"0","loaded_sources":"[]","sources_to_retry":"[]"}}
{"reqId":"5hp3UffxkfWi4BkTFXFf","level":3,"time":"2025-11-07T14:11:16+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Background jobs: OCA\\ContextChat\\BackgroundJobs\\SubmitContentJob 10962 ended","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"w40vJGdHn61pjWvtVB9S","level":3,"time":"2025-11-07T14:11:29+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Could not fetch storage root","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"w40vJGdHn61pjWvtVB9S","level":3,"time":"2025-11-07T14:11:29+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Could not fetch storage root","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"w40vJGdHn61pjWvtVB9S","level":3,"time":"2025-11-07T14:11:29+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Could not fetch storage root","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"w40vJGdHn61pjWvtVB9S","level":3,"time":"2025-11-07T14:11:29+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Could not fetch storage root","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"w40vJGdHn61pjWvtVB9S","level":3,"time":"2025-11-07T14:11:29+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Could not fetch storage root","userAgent":"--","version":"33.0.0.1","data":[]}
{"reqId":"w40vJGdHn61pjWvtVB9S","level":3,"time":"2025-11-07T14:11:29+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Could not fetch storage root","userAgent":"--","version":"33.0.0.1","data":[]}
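Since data/context_chat.log holds one JSON object per line, the repeated entries are easier to judge once tallied. The following is a minimal Python sketch; the sample lines are abridged copies of the entries above, and `summarize` is a hypothetical helper, not something shipped with context_chat.

```python
import json
from collections import Counter

# Abridged copies of log entries from this thread (some fields dropped).
SAMPLE_LOG = """\
{"reqId":"5hp3UffxkfWi4BkTFXFf","level":3,"time":"2025-11-07T14:11:16+00:00","message":"[SubmitContentJob] Indexed sources for providers","data":{"count":"0","loaded_sources":"[]","sources_to_retry":"[]"}}
{"reqId":"w40vJGdHn61pjWvtVB9S","level":3,"time":"2025-11-07T14:11:29+00:00","message":"Could not fetch storage root","data":[]}
{"reqId":"w40vJGdHn61pjWvtVB9S","level":3,"time":"2025-11-07T14:11:29+00:00","message":"Could not fetch storage root","data":[]}
ContextChat statistics:
"""


def summarize(log_text: str) -> tuple[Counter, list[int]]:
    """Count messages and collect the 'count' reported by SubmitContentJob entries."""
    messages: Counter = Counter()
    submit_counts: list[int] = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines, e.g. pasted stats output
        if not isinstance(entry, dict):
            continue
        messages[entry.get("message", "")] += 1
        data = entry.get("data")
        if isinstance(data, dict) and "count" in data:
            submit_counts.append(int(data["count"]))
    return messages, submit_counts


messages, submit_counts = summarize(SAMPLE_LOG)
```

For the real file, the text would come from reading the log path instead of `SAMPLE_LOG`. A `count` of 0 next to a growing 'mail__mail' queue is the symptom visible in this comment.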

ContextChat statistics:
Installed time: 2025-10-17 12:04 UTC
Index complete time: 2025-11-05 10:21 UTC
Total time taken for complete index: 18 days 22:17 (hh:mm)
Total eligible files: 1
Files in indexing queue: 0
New files in indexing queue (without updates): 0
Queued documents (without files):array (
  'mail__mail' => 12,
)
Files successfully sent to backend: 1
Indexed documents: array (
  'files__default' => 1,
)
Actions in queue: 0
File system events in queue: 0

ContextChat statistics:
Installed time: 2025-10-17 12:04 UTC
Index complete time: 2025-11-05 10:21 UTC
Total time taken for complete index: 18 days 22:17 (hh:mm)
Total eligible files: 1
Files in indexing queue: 0
New files in indexing queue (without updates): 0
Queued documents (without files):array (
  'mail__mail' => 24,
)
Files successfully sent to backend: 1
Indexed documents: array (
  'files__default' => 1,
)
Actions in queue: 0
File system events in queue: 0

ChristophWurst avatar Nov 07 '25 14:11 ChristophWurst

With @kyteinsky's fix of CCB the Mail "documents" are no longer ignored and do fill the vector db:

occ context_chat:stats
ContextChat statistics:
Installed time: 2025-10-17 12:04 UTC
Index complete time: 2025-11-05 10:21 UTC
Total time taken for complete index: 18 days 22:17 (hh:mm)
Total eligible files: 1
Files in indexing queue: 0
New files in indexing queue (without updates): 0
Queued documents (without files):array (
  'mail__mail' => 6,
)
Files successfully sent to backend: 1
Indexed documents: array (
  'files__default' => 1,
  'mail__mail' => 11,
)
Actions in queue: 0
File system events in queue: 0
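To compare the queued and indexed counts across runs without reading the dump by eye, a small parser may help. This is a minimal Python sketch that assumes the stats output keeps the layout pasted above; `parse_section` is a hypothetical helper and not part of context_chat.

```python
import re

# Excerpt of the stats output shown above.
SAMPLE = """\
Queued documents (without files):array (
  'mail__mail' => 6,
)
Files successfully sent to backend: 1
Indexed documents: array (
  'files__default' => 1,
  'mail__mail' => 11,
)
"""

# Matches lines such as:   'mail__mail' => 6,
ENTRY = re.compile(r"^\s*'([^']+)'\s*=>\s*(\d+),\s*$")


def parse_section(text: str, heading: str) -> dict[str, int]:
    """Collect the 'provider' => count entries printed under one heading."""
    counts: dict[str, int] = {}
    inside = False
    for line in text.splitlines():
        if line.startswith(heading):
            inside = True
        elif inside and line.strip() == ")":
            break
        elif inside:
            match = ENTRY.match(line)
            if match:
                counts[match.group(1)] = int(match.group(2))
    return counts


queued = parse_section(SAMPLE, "Queued documents")
indexed = parse_section(SAMPLE, "Indexed documents")
```

An empty dictionary for "Indexed documents" while "Queued documents" still lists 'mail__mail' would correspond to the earlier broken state in this thread.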

ChristophWurst avatar Nov 12 '25 14:11 ChristophWurst

Sending context chat queries works too. Results don't:

{
  "reqId": "6Ulnm9L9uSvf4GrSUtG3",
  "level": 2,
  "time": "2025-11-13T08:29:34+00:00",
  "remoteAddr": "",
  "user": "--",
  "app": "no app in context",
  "method": "",
  "url": "--",
  "message": "A TaskProcessing context_chat:context_chat task with id 539 failed with the following message: Parameter \"id\" for route \"mail.page.thread\" must match \"[^/]++\" (\"\" given) to generate a corresponding URL.",
  "userAgent": "--",
  "version": "33.0.0.1",
  "data": []
}
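The message says the URL generator received an empty `id` for the `mail.page.thread` route. A minimal Python sketch of that kind of check follows, with a guard that skips the link instead of failing the whole task. The URL path and both function names are assumptions made for illustration, not the Mail app's actual code.

```python
import re
from typing import Optional

# Requirement taken from the error message. Python before 3.11 lacks the
# possessive "++" quantifier; the greedy form behaves the same for this pattern.
ID_REQUIREMENT = re.compile(r"[^/]+")


def generate_thread_url(thread_id: str) -> str:
    """Build a thread URL, rejecting ids that fail the route requirement."""
    if not ID_REQUIREMENT.fullmatch(thread_id):
        raise ValueError(f'Parameter "id" must match "[^/]++" ("{thread_id}" given)')
    # Hypothetical path, used only to have something to return.
    return f"/apps/mail/thread/{thread_id}"


def safe_thread_url(thread_id: Optional[object]) -> Optional[str]:
    """Return the URL, or None when the id is missing or invalid."""
    if thread_id is None:
        return None
    try:
        return generate_thread_url(str(thread_id))
    except ValueError:
        return None
```

Whether the right fix is to guard like this or to make sure a non-empty id is always stored with the indexed document is left open here.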
  • Requires https://github.com/nextcloud/context_chat_backend/pull/232 (released, update CCB)
  • Requires https://github.com/nextcloud/context_chat/pull/190

@edward-ly here are the major todos:

  1. Fix the infinite indexing loop.
  2. Fix the route generation.
  3. Add an opt-in switch. We don't want the context chat integration to be on by default for now. Admins have to turn this on explicitly.
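For item 3, together with the per-account toggle suggested earlier in this thread, the gating could look roughly like the sketch below. It is a Python illustration only: the setting names, the default of the per-account flag, and `accounts_to_index` are all assumptions, not existing Mail app settings.

```python
from dataclasses import dataclass


@dataclass
class AccountSettings:
    """Hypothetical per-account settings relevant to context chat."""

    account_id: int
    context_chat_enabled: bool = True  # assumed per-account toggle


def accounts_to_index(admin_opt_in: bool, accounts: list[AccountSettings]) -> list[int]:
    """Return the ids of accounts whose mail may be submitted for indexing."""
    if not admin_opt_in:
        # Off by default: nothing is submitted until an admin enables it.
        return []
    return [a.account_id for a in accounts if a.context_chat_enabled]


accounts = [AccountSettings(1), AccountSettings(2, context_chat_enabled=False)]
```

With the admin switch off the result is always empty; with it on, only accounts that have not turned the toggle off are included.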

ChristophWurst avatar Nov 13 '25 08:11 ChristophWurst