H-2693: Implement fact gathering in the worker/coordinator agents of the research action
🌟 What is the purpose of this PR?
This PR refactors the research entities action's coordinator agent to gather facts from web pages via calls to a worker agent, before attempting to propose entities. This gives the coordinator agent the ability to propose entities with information obtained from more than one source.
It also reworks the inferFactsFromWebPageWorkerAgent to not propose any entities, and instead directly return the facts it discovered on a web page/linked web pages/linked PDFs.
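The two-phase flow described above (gather facts from several pages first, propose entities afterwards) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Fact` and `ProposedEntity` shapes, the `FactGatheringWorker` type and both function names are hypothetical, and the worker is modelled as a plain synchronous function rather than an LLM-backed agent.

```typescript
// Hypothetical shapes, for illustration only; not the types used in this PR.
type Fact = {
  subject: string; // name of the entity the fact is about
  text: string; // the statement itself
  sourceUrl: string; // page the fact was found on
};

type ProposedEntity = {
  name: string;
  facts: Fact[];
  sourceUrls: string[];
};

// Stand-in for the worker agent: given a URL, return the facts found there.
type FactGatheringWorker = (url: string) => Fact[];

// Phase 1: gather facts from every page before proposing anything.
const gatherFacts = (urls: string[], worker: FactGatheringWorker): Fact[] => {
  const gathered: Fact[] = [];
  for (const url of urls) {
    for (const fact of worker(url)) {
      gathered.push(fact);
    }
  }
  return gathered;
};

// Phase 2: group facts by subject, so that one proposed entity can draw on
// facts from more than one source.
const proposeFromGatheredFacts = (gathered: Fact[]): ProposedEntity[] => {
  // Prototype-free object, so subjects such as "constructor" are safe keys.
  const bySubject: Record<string, Fact[]> = Object.create(null);
  for (const fact of gathered) {
    const existing = bySubject[fact.subject] ?? [];
    existing.push(fact);
    bySubject[fact.subject] = existing;
  }

  return Object.keys(bySubject).map((subject) => {
    const subjectFacts = bySubject[subject] ?? [];
    const sourceUrls: string[] = [];
    for (const fact of subjectFacts) {
      if (sourceUrls.indexOf(fact.sourceUrl) === -1) {
        sourceUrls.push(fact.sourceUrl);
      }
    }
    return { name: subject, facts: subjectFacts, sourceUrls };
  });
};

// Usage with a fake worker backed by canned data (all values are made up).
const cannedFactsByUrl: Record<string, Fact[]> = {
  "https://example.com/a": [
    { subject: "Acme Corp", text: "Founded in 1990", sourceUrl: "https://example.com/a" },
  ],
  "https://example.com/b": [
    { subject: "Acme Corp", text: "Based in Berlin", sourceUrl: "https://example.com/b" },
    { subject: "Jane Doe", text: "CEO of Acme Corp", sourceUrl: "https://example.com/b" },
  ],
};
const fakeWorker: FactGatheringWorker = (url) => cannedFactsByUrl[url] ?? [];

const exampleProposals = proposeFromGatheredFacts(
  gatherFacts(["https://example.com/a", "https://example.com/b"], fakeWorker),
);
```

Here the "Acme Corp" proposal ends up backed by facts from both pages, which is the capability the coordinator gains from deferring entity proposal until after fact gathering.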
🔗 Related links
- H-2693
🔍 What does this change?
- removes the `proposeAndSubmitLink` tool from the coordinator agent. Outgoing links from discovered entities to existing entities can now be created via the `proposeEntitiesFromFacts` tools, where existing entities are passed as possible target outgoing link entities for any discovered entity.
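As a rough illustration of passing existing entities as possible link targets, the sketch below only proposes a link when the fact's object matches one of the supplied existing entities. All names here (`ExistingEntity`, `RelationshipFact`, `ProposedLink`, `proposeOutgoingLinks`) are hypothetical and do not reflect the actual tool signatures, and the matching by exact name is an assumption made to keep the example small.

```typescript
// Hypothetical shapes, for illustration only; not the types used in this PR.
type ExistingEntity = { entityId: string; name: string };

type RelationshipFact = {
  subject: string; // name of the newly discovered entity
  linkType: string; // e.g. "Works At"
  objectName: string; // name of the entity the link should point at
};

type ProposedLink = {
  sourceName: string;
  linkType: string;
  targetEntityId: string;
};

// Propose an outgoing link for each relationship fact whose object is one of
// the existing entities passed in as possible targets.
const proposeOutgoingLinks = (
  relationshipFacts: RelationshipFact[],
  possibleTargets: ExistingEntity[],
): ProposedLink[] => {
  const links: ProposedLink[] = [];
  for (const fact of relationshipFacts) {
    for (const target of possibleTargets) {
      // Assumption: exact name match; facts with no matching target are skipped.
      if (target.name === fact.objectName) {
        links.push({
          sourceName: fact.subject,
          linkType: fact.linkType,
          targetEntityId: target.entityId,
        });
        break;
      }
    }
  }
  return links;
};

// Usage with made-up data.
const exampleLinks = proposeOutgoingLinks(
  [
    { subject: "Jane Doe", linkType: "Works At", objectName: "Acme Corp" },
    { subject: "Jane Doe", linkType: "Works At", objectName: "Unknown Ltd" },
  ],
  [{ entityId: "entity-1", name: "Acme Corp" }],
);
```

With this shape, a separate link-submission tool is not needed: links fall out of the same step that proposes the discovered entities.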
Pre-Merge Checklist 🚀
🚢 Has this modified a publishable library?
This PR:
- [x] does not modify any publishable blocks or libraries, or modifications do not need publishing
📜 Does this require a change to the docs?
The changes in this PR:
- [x] are internal and do not require a docs change
🕸️ Does this require a change to the Turbo Graph?
The changes in this PR:
- [x] do not affect the execution graph
⚠️ Known issues
- the coordinator agent may use different sources to compile entities, but doesn't make a great effort to find all the facts needed to fill all the properties of an entity.
- for property values which are URLs (e.g. "Linked In URL"), the worker agent doesn't currently return facts which specify what the URL may be, as it creates facts based on the content of the HTML. This needs to be addressed so that, when proposing an entity, the agent isn't guessing the URL based on other URLs it may have received (H-2744)
- the coordinator agent doesn't handle gathering facts for many entities very well. This could be improved by allowing it to create "sub-tasks" and passing those to new instances of the coordinator agent (H-2735)
🐾 Next steps
- Attach provenance information to properties on proposed entities (H-2743)
🛡 What tests cover this?
Manual testing.
❓ How to test this?
Run a flow with the research entities action, to test out its new capabilities.
📹 Demo
https://github.com/hashintel/hash/assets/42802102/5921f1ba-1e22-435b-9b73-cbf4118c2fa4
Codecov Report
Attention: Patch coverage is 0% with 202 lines in your changes missing coverage. Please review.
Project coverage is 0.84%. Comparing base (4c0810c) to head (06d8626). Report is 2052 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #4497 +/- ##
==========================================
- Coverage 21.52% 0.84% -20.68%
==========================================
Files 451 248 -203
Lines 14953 6829 -8124
Branches 2216 1363 -853
==========================================
- Hits 3218 58 -3160
+ Misses 11694 6759 -4935
+ Partials 41 12 -29
| Flag | Coverage Δ | |
|---|---|---|
| apps.hash-ai-worker-py | ? | |
| apps.hash-ai-worker-ts | 1.84% <0.00%> (-0.03%) | :arrow_down: |
| apps.hash-api | 0.00% <0.00%> (ø) | |
| backend-integration-tests | ? | |
| blockprotocol.type-system | ? | |
| deer | ? | |
| error-stack | ? | |
| local.hash-backend-utils | ? | |
| local.hash-isomorphic-utils | ? | |
| local.hash-subgraph | ? | |
| sarif | ? | |
| tests.hash-backend-integration | ? | |
| unit-tests | ? | |
Flags with carried forward coverage won't be shown.
Benchmark results
@rust/graph-benches – Integrations
scaling_read_entity_complete_one_depth
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | 50 entities | $$1.57 \mathrm{s} \pm 6.91 \mathrm{ms}\left({\color{red}469 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | 10 entities | $$52.1 \mathrm{ms} \pm 177 \mathrm{μs}\left({\color{red}11.1 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | 1 entities | $$21.7 \mathrm{ms} \pm 104 \mathrm{μs}\left({\color{gray}-0.944 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | 25 entities | $$74.8 \mathrm{ms} \pm 320 \mathrm{μs}\left({\color{gray}-0.048 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | 5 entities | $$26.2 \mathrm{ms} \pm 421 \mathrm{μs}\left({\color{gray}1.63 \mathrm{\%}}\right) $$ | Flame Graph |
representative_read_entity
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1 | $$17.1 \mathrm{ms} \pm 254 \mathrm{μs}\left({\color{gray}-4.671 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1 | $$16.6 \mathrm{ms} \pm 220 \mathrm{μs}\left({\color{red}6.20 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1 | $$17.6 \mathrm{ms} \pm 223 \mathrm{μs}\left({\color{gray}-0.819 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1 | $$18.5 \mathrm{ms} \pm 221 \mathrm{μs}\left({\color{red}12.7 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1 | $$17.0 \mathrm{ms} \pm 211 \mathrm{μs}\left({\color{gray}4.39 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1 | $$16.7 \mathrm{ms} \pm 213 \mathrm{μs}\left({\color{gray}0.664 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2 | $$17.4 \mathrm{ms} \pm 214 \mathrm{μs}\left({\color{red}9.34 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1 | $$18.0 \mathrm{ms} \pm 197 \mathrm{μs}\left({\color{red}5.77 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1 | $$18.8 \mathrm{ms} \pm 209 \mathrm{μs}\left({\color{red}12.4 \mathrm{\%}}\right) $$ | Flame Graph |
representative_read_multiple_entities
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| link_by_source_by_property | depths: DT=255, PT=255, ET=255, E=255 | $$2.03 \mathrm{s} \pm 10.4 \mathrm{ms}\left({\color{gray}-0.512 \mathrm{\%}}\right) $$ | Flame Graph |
| link_by_source_by_property | depths: DT=2, PT=2, ET=2, E=2 | $$1.07 \mathrm{s} \pm 4.04 \mathrm{ms}\left({\color{gray}0.502 \mathrm{\%}}\right) $$ | Flame Graph |
| link_by_source_by_property | depths: DT=0, PT=2, ET=2, E=2 | $$1.08 \mathrm{s} \pm 6.61 \mathrm{ms}\left({\color{gray}1.59 \mathrm{\%}}\right) $$ | Flame Graph |
| link_by_source_by_property | depths: DT=0, PT=0, ET=0, E=2 | $$102 \mathrm{ms} \pm 622 \mathrm{μs}\left({\color{gray}2.45 \mathrm{\%}}\right) $$ | Flame Graph |
| link_by_source_by_property | depths: DT=0, PT=0, ET=2, E=2 | $$442 \mathrm{ms} \pm 1.98 \mathrm{ms}\left({\color{gray}4.58 \mathrm{\%}}\right) $$ | Flame Graph |
| link_by_source_by_property | depths: DT=0, PT=0, ET=0, E=0 | $$64.6 \mathrm{ms} \pm 292 \mathrm{μs}\left({\color{gray}2.01 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_property | depths: DT=255, PT=255, ET=255, E=255 | $$2.95 \mathrm{s} \pm 13.2 \mathrm{ms}\left({\color{gray}1.44 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_property | depths: DT=2, PT=2, ET=2, E=2 | $$1.00 \mathrm{s} \pm 5.66 \mathrm{ms}\left({\color{gray}0.294 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_property | depths: DT=0, PT=2, ET=2, E=2 | $$998 \mathrm{ms} \pm 4.65 \mathrm{ms}\left({\color{gray}-1.650 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_property | depths: DT=0, PT=0, ET=0, E=2 | $$42.3 \mathrm{ms} \pm 248 \mathrm{μs}\left({\color{gray}0.945 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_property | depths: DT=0, PT=0, ET=2, E=2 | $$367 \mathrm{ms} \pm 1.63 \mathrm{ms}\left({\color{gray}-2.189 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_property | depths: DT=0, PT=0, ET=0, E=0 | $$37.6 \mathrm{ms} \pm 159 \mathrm{μs}\left({\color{gray}0.980 \mathrm{\%}}\right) $$ | Flame Graph |
representative_read_entity_type
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| get_entity_type_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579 | $$1.42 \mathrm{ms} \pm 4.39 \mathrm{μs}\left({\color{gray}-0.634 \mathrm{\%}}\right) $$ | Flame Graph |
scaling_read_entity_linkless
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | 1000 entities | $$3.42 \mathrm{ms} \pm 20.4 \mathrm{μs}\left({\color{gray}0.399 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | 100 entities | $$2.67 \mathrm{ms} \pm 11.3 \mathrm{μs}\left({\color{gray}2.38 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | 10 entities | $$2.49 \mathrm{ms} \pm 13.9 \mathrm{μs}\left({\color{gray}0.165 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | 10000 entities | $$14.0 \mathrm{ms} \pm 147 \mathrm{μs}\left({\color{gray}4.14 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | 1 entities | $$2.49 \mathrm{ms} \pm 9.79 \mathrm{μs}\left({\color{gray}0.015 \mathrm{\%}}\right) $$ | Flame Graph |
scaling_read_entity_complete_zero_depth
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | 50 entities | $$4.53 \mathrm{ms} \pm 26.6 \mathrm{μs}\left({\color{gray}-0.880 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | 10 entities | $$2.73 \mathrm{ms} \pm 14.2 \mathrm{μs}\left({\color{gray}-1.246 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | 1 entities | $$2.53 \mathrm{ms} \pm 11.8 \mathrm{μs}\left({\color{gray}-0.945 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | 25 entities | $$3.20 \mathrm{ms} \pm 37.0 \mathrm{μs}\left({\color{gray}-0.380 \mathrm{\%}}\right) $$ | Flame Graph |
| entity_by_id | 5 entities | $$2.56 \mathrm{ms} \pm 15.4 \mathrm{μs}\left({\color{gray}-1.192 \mathrm{\%}}\right) $$ | Flame Graph |