H-2693: Implement fact gathering in the worker/coordinator agents of the research action

Open benwerner01 opened this issue 1 year ago • 2 comments

🌟 What is the purpose of this PR?

This PR refactors the research entities action's coordinator agent to gather facts from web pages via calls to a worker agent, before attempting to propose entities. This gives the coordinator agent the ability to propose entities with information obtained from more than one source.

It also reworks the inferFactsFromWebPageWorkerAgent to not propose any entities, and instead directly return the facts it discovered on a web page/linked web pages/linked PDFs.
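
The gather-then-propose flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Fact` shape, `WorkerFn`, `gatherFacts`, `groupFactsBySubject`, and the stub worker are all hypothetical names, and the real worker agent is LLM-backed and asynchronous rather than a synchronous lookup.

```typescript
// Hypothetical shapes for illustration only; not the types used in the codebase.
type Fact = { subject: string; text: string; sourceUrl: string };
type WorkerFn = (url: string) => Fact[];

// Step 1: the coordinator asks a worker for facts from each page and collects
// them all, without proposing any entities yet.
function gatherFacts(urls: string[], worker: WorkerFn): Fact[] {
  const facts: Fact[] = [];
  for (const url of urls) {
    facts.push(...worker(url));
  }
  return facts;
}

// Step 2: facts are grouped by subject, so an entity proposed later can draw
// on facts that came from more than one source.
function groupFactsBySubject(facts: Fact[]): Map<string, Fact[]> {
  const grouped = new Map<string, Fact[]>();
  for (const fact of facts) {
    const existing = grouped.get(fact.subject) ?? [];
    existing.push(fact);
    grouped.set(fact.subject, existing);
  }
  return grouped;
}

// Stub worker standing in for the LLM-backed worker agent (assumption).
const stubPages: Record<string, Fact[]> = {
  "https://example.com/about": [
    { subject: "Acme", text: "Acme was founded in 1999", sourceUrl: "https://example.com/about" },
  ],
  "https://example.org/news": [
    { subject: "Acme", text: "Acme has 40 employees", sourceUrl: "https://example.org/news" },
  ],
};
const stubWorker: WorkerFn = (url) => stubPages[url] ?? [];

const groupedFacts = groupFactsBySubject(
  gatherFacts(Object.keys(stubPages), stubWorker),
);
```

Keeping fact gathering separate from entity proposal is what lets a single proposed entity combine information from several pages.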

🔗 Related links

  • H-2693

🔍 What does this change?

  • removes the proposeAndSubmitLink tool from the coordinator agent. Outgoing links from discovered entities to existing entities can now be created via the proposeEntitiesFromFacts tool, which receives the existing entities as possible outgoing-link targets for any discovered entity.
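
As a rough sketch of the idea in the bullet above: existing entities are handed to the proposal step as candidate link targets. Everything here is hypothetical (`proposeEntitiesFromFactsSketch`, the type names, and especially the substring match, which stands in for the judgment the LLM-backed tool would make); it is not the signature or behaviour of the actual proposeEntitiesFromFacts tool.

```typescript
// Hypothetical shapes for illustration only.
type ExistingEntity = { entityId: string; name: string };
type GatheredFact = { subject: string; text: string };
type ProposedLink = { sourceName: string; targetEntityId: string };
type ProposedEntity = {
  name: string;
  facts: string[];
  outgoingLinks: ProposedLink[];
};

// Builds one proposed entity per fact subject. Existing entities are passed in
// as possible targets, so links are created here rather than by a separate tool.
function proposeEntitiesFromFactsSketch(
  facts: GatheredFact[],
  existingEntities: ExistingEntity[],
): ProposedEntity[] {
  const bySubject = new Map<string, ProposedEntity>();
  for (const fact of facts) {
    let proposal = bySubject.get(fact.subject);
    if (!proposal) {
      proposal = { name: fact.subject, facts: [], outgoingLinks: [] };
      bySubject.set(fact.subject, proposal);
    }
    proposal.facts.push(fact.text);

    // Assumption: a plain substring match replaces the model's decision about
    // whether a fact implies a link to an existing entity.
    for (const target of existingEntities) {
      const alreadyLinked = proposal.outgoingLinks.some(
        (link) => link.targetEntityId === target.entityId,
      );
      if (!alreadyLinked && fact.text.includes(target.name)) {
        proposal.outgoingLinks.push({
          sourceName: fact.subject,
          targetEntityId: target.entityId,
        });
      }
    }
  }
  return Array.from(bySubject.values());
}

const sketchProposals = proposeEntitiesFromFactsSketch(
  [
    { subject: "Jane Doe", text: "Jane Doe works at Acme" },
    { subject: "Jane Doe", text: "Jane Doe joined Acme in 2020" },
  ],
  [{ entityId: "entity-1", name: "Acme" }],
);
```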

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

  • [x] does not modify any publishable blocks or libraries, or modifications do not need publishing

📜 Does this require a change to the docs?

The changes in this PR:

  • [x] are internal and do not require a docs change

🕸️ Does this require a change to the Turbo Graph?

The changes in this PR:

  • [x] do not affect the execution graph

⚠️ Known issues

  • the coordinator agent may use different sources to compile entities, but doesn't make a great effort to find all the facts needed to fill all the properties of an entity.
  • for property values which are URLs (e.g. "Linked In URL"), the worker agent doesn't currently directly produce facts which specify what the URL may be, as it creates facts based on the content of the HTML. This needs to be addressed so that, when proposing an entity, the agent isn't guessing the URL from other URLs it may have received (H-2744)
  • the coordinator agent doesn't handle gathering facts for many entities very well. This could be improved by allowing it to create "sub-tasks" and passing those to new instances of the coordinator agent (H-2735)

🐾 Next steps

  • Attach provenance information to properties on proposed entities (H-2743)

🛡 What tests cover this?

Manual testing.

❓ How to test this?

Run a flow with the research entities action, to test out its new capabilities.

📹 Demo

https://github.com/hashintel/hash/assets/42802102/5921f1ba-1e22-435b-9b73-cbf4118c2fa4

benwerner01 avatar May 21 '24 12:05 benwerner01

Codecov Report

Attention: Patch coverage is 0% with 202 lines in your changes missing coverage. Please review.

Project coverage is 0.84%. Comparing base (4c0810c) to head (06d8626). Report is 2052 commits behind head on main.

Files with missing lines Patch % Lines
...vities/flow-activities/research-entities-action.ts 0.00% 69 Missing :warning:
...s-action/infer-facts-from-web-page-worker-agent.ts 0.00% 26 Missing :warning:
...s/research-entities-action/deduplicate-entities.ts 0.00% 22 Missing :warning:
...rch-entities-action/summarize-existing-entities.ts 0.00% 22 Missing :warning:
...ies/research-entities-action/coordinating-agent.ts 0.00% 11 Missing :warning:
...orker-ts/src/activities/shared/get-llm-response.ts 0.00% 8 Missing :warning:
...previously-inferred-facts-system-prompt-message.ts 0.00% 7 Missing :warning:
...-worker-ts/src/activities/get-web-page-activity.ts 0.00% 7 Missing :warning:
...-facts-from-text/get-entity-summaries-from-text.ts 0.00% 6 Missing :warning:
...w-activities/shared/propose-entities-from-facts.ts 0.00% 6 Missing :warning:
... and 6 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #4497       +/-   ##
==========================================
- Coverage   21.52%   0.84%   -20.68%     
==========================================
  Files         451     248      -203     
  Lines       14953    6829     -8124     
  Branches     2216    1363      -853     
==========================================
- Hits         3218      58     -3160     
+ Misses      11694    6759     -4935     
+ Partials       41      12       -29     
Flag Coverage Δ
apps.hash-ai-worker-py ?
apps.hash-ai-worker-ts 1.84% <0.00%> (-0.03%) :arrow_down:
apps.hash-api 0.00% <0.00%> (ø)
backend-integration-tests ?
blockprotocol.type-system ?
deer ?
error-stack ?
local.hash-backend-utils ?
local.hash-isomorphic-utils ?
local.hash-subgraph ?
sarif ?
tests.hash-backend-integration ?
unit-tests ?

codecov[bot] avatar May 21 '24 13:05 codecov[bot]

Benchmark results

@rust/graph-benches – Integrations

scaling_read_entity_complete_one_depth

Function Value Mean Flame graphs
entity_by_id 50 entities $$1.57 \mathrm{s} \pm 6.91 \mathrm{ms}\left({\color{red}469 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 10 entities $$52.1 \mathrm{ms} \pm 177 \mathrm{μs}\left({\color{red}11.1 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 1 entities $$21.7 \mathrm{ms} \pm 104 \mathrm{μs}\left({\color{gray}-0.944 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 25 entities $$74.8 \mathrm{ms} \pm 320 \mathrm{μs}\left({\color{gray}-0.048 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 5 entities $$26.2 \mathrm{ms} \pm 421 \mathrm{μs}\left({\color{gray}1.63 \mathrm{\%}}\right) $$ Flame Graph

representative_read_entity

Function Value Mean Flame graphs
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1 $$17.1 \mathrm{ms} \pm 254 \mathrm{μs}\left({\color{gray}-4.671 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1 $$16.6 \mathrm{ms} \pm 220 \mathrm{μs}\left({\color{red}6.20 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1 $$17.6 \mathrm{ms} \pm 223 \mathrm{μs}\left({\color{gray}-0.819 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1 $$18.5 \mathrm{ms} \pm 221 \mathrm{μs}\left({\color{red}12.7 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1 $$17.0 \mathrm{ms} \pm 211 \mathrm{μs}\left({\color{gray}4.39 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1 $$16.7 \mathrm{ms} \pm 213 \mathrm{μs}\left({\color{gray}0.664 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2 $$17.4 \mathrm{ms} \pm 214 \mathrm{μs}\left({\color{red}9.34 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1 $$18.0 \mathrm{ms} \pm 197 \mathrm{μs}\left({\color{red}5.77 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1 $$18.8 \mathrm{ms} \pm 209 \mathrm{μs}\left({\color{red}12.4 \mathrm{\%}}\right) $$ Flame Graph

representative_read_multiple_entities

Function Value Mean Flame graphs
link_by_source_by_property depths: DT=255, PT=255, ET=255, E=255 $$2.03 \mathrm{s} \pm 10.4 \mathrm{ms}\left({\color{gray}-0.512 \mathrm{\%}}\right) $$ Flame Graph
link_by_source_by_property depths: DT=2, PT=2, ET=2, E=2 $$1.07 \mathrm{s} \pm 4.04 \mathrm{ms}\left({\color{gray}0.502 \mathrm{\%}}\right) $$ Flame Graph
link_by_source_by_property depths: DT=0, PT=2, ET=2, E=2 $$1.08 \mathrm{s} \pm 6.61 \mathrm{ms}\left({\color{gray}1.59 \mathrm{\%}}\right) $$ Flame Graph
link_by_source_by_property depths: DT=0, PT=0, ET=0, E=2 $$102 \mathrm{ms} \pm 622 \mathrm{μs}\left({\color{gray}2.45 \mathrm{\%}}\right) $$ Flame Graph
link_by_source_by_property depths: DT=0, PT=0, ET=2, E=2 $$442 \mathrm{ms} \pm 1.98 \mathrm{ms}\left({\color{gray}4.58 \mathrm{\%}}\right) $$ Flame Graph
link_by_source_by_property depths: DT=0, PT=0, ET=0, E=0 $$64.6 \mathrm{ms} \pm 292 \mathrm{μs}\left({\color{gray}2.01 \mathrm{\%}}\right) $$ Flame Graph
entity_by_property depths: DT=255, PT=255, ET=255, E=255 $$2.95 \mathrm{s} \pm 13.2 \mathrm{ms}\left({\color{gray}1.44 \mathrm{\%}}\right) $$ Flame Graph
entity_by_property depths: DT=2, PT=2, ET=2, E=2 $$1.00 \mathrm{s} \pm 5.66 \mathrm{ms}\left({\color{gray}0.294 \mathrm{\%}}\right) $$ Flame Graph
entity_by_property depths: DT=0, PT=2, ET=2, E=2 $$998 \mathrm{ms} \pm 4.65 \mathrm{ms}\left({\color{gray}-1.650 \mathrm{\%}}\right) $$ Flame Graph
entity_by_property depths: DT=0, PT=0, ET=0, E=2 $$42.3 \mathrm{ms} \pm 248 \mathrm{μs}\left({\color{gray}0.945 \mathrm{\%}}\right) $$ Flame Graph
entity_by_property depths: DT=0, PT=0, ET=2, E=2 $$367 \mathrm{ms} \pm 1.63 \mathrm{ms}\left({\color{gray}-2.189 \mathrm{\%}}\right) $$ Flame Graph
entity_by_property depths: DT=0, PT=0, ET=0, E=0 $$37.6 \mathrm{ms} \pm 159 \mathrm{μs}\left({\color{gray}0.980 \mathrm{\%}}\right) $$ Flame Graph

representative_read_entity_type

Function Value Mean Flame graphs
get_entity_type_by_id Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579 $$1.42 \mathrm{ms} \pm 4.39 \mathrm{μs}\left({\color{gray}-0.634 \mathrm{\%}}\right) $$ Flame Graph

scaling_read_entity_linkless

Function Value Mean Flame graphs
entity_by_id 1000 entities $$3.42 \mathrm{ms} \pm 20.4 \mathrm{μs}\left({\color{gray}0.399 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 100 entities $$2.67 \mathrm{ms} \pm 11.3 \mathrm{μs}\left({\color{gray}2.38 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 10 entities $$2.49 \mathrm{ms} \pm 13.9 \mathrm{μs}\left({\color{gray}0.165 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 10000 entities $$14.0 \mathrm{ms} \pm 147 \mathrm{μs}\left({\color{gray}4.14 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 1 entities $$2.49 \mathrm{ms} \pm 9.79 \mathrm{μs}\left({\color{gray}0.015 \mathrm{\%}}\right) $$ Flame Graph

scaling_read_entity_complete_zero_depth

Function Value Mean Flame graphs
entity_by_id 50 entities $$4.53 \mathrm{ms} \pm 26.6 \mathrm{μs}\left({\color{gray}-0.880 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 10 entities $$2.73 \mathrm{ms} \pm 14.2 \mathrm{μs}\left({\color{gray}-1.246 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 1 entities $$2.53 \mathrm{ms} \pm 11.8 \mathrm{μs}\left({\color{gray}-0.945 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 25 entities $$3.20 \mathrm{ms} \pm 37.0 \mathrm{μs}\left({\color{gray}-0.380 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 5 entities $$2.56 \mathrm{ms} \pm 15.4 \mathrm{μs}\left({\color{gray}-1.192 \mathrm{\%}}\right) $$ Flame Graph

github-actions[bot] avatar May 22 '24 16:05 github-actions[bot]