H-2692: Infer facts from text before proposing entities

Open benwerner01 opened this issue 1 year ago • 2 comments

🌟 What is the purpose of this PR?

This PR modifies how entities are proposed in the research action, and stops making use of the inferEntitiesFromContent action to propose entities. The process of proposing entities in the worker agent is now:

  1. Summarise all relevant entities in the text provided
  2. Infer facts from the text, which have a subject, predicate, and singular object
  3. For each summarised entity, propose the entity and its outgoing links based on the facts which have the entity as their "subject"
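The data flow of the three steps above can be sketched roughly as follows. This is a minimal illustration only: the type names (`EntitySummary`, `Fact`, `ProposedEntity`), their fields, and the function `proposeEntitiesFromFacts` are assumptions made for this sketch rather than the PR's actual definitions, and the LLM calls that produce the summaries and facts are replaced by hand-written inputs.

```typescript
// Hypothetical shapes for illustration; the PR's real types may differ.
interface EntitySummary {
  localId: string;
  name: string;
}

interface Fact {
  subjectLocalId: string; // the entity the fact is about
  predicate: string;
  // A single object: either another summarised entity or a literal value.
  object:
    | { kind: "entity"; localId: string }
    | { kind: "literal"; value: string };
}

interface ProposedEntity {
  localId: string;
  name: string;
  properties: Record<string, string>;
  outgoingLinks: { predicate: string; targetLocalId: string }[];
}

// Step 3: for each summarised entity, build a proposal from the facts
// that have that entity as their subject.
function proposeEntitiesFromFacts(
  summaries: EntitySummary[],
  facts: Fact[],
): ProposedEntity[] {
  const knownIds = new Set(summaries.map((summary) => summary.localId));

  return summaries.map((summary) => {
    const proposed: ProposedEntity = {
      localId: summary.localId,
      name: summary.name,
      properties: {},
      outgoingLinks: [],
    };

    for (const fact of facts) {
      if (fact.subjectLocalId !== summary.localId) {
        continue;
      }
      if (fact.object.kind === "entity") {
        // Only link to entities that were summarised from the same text.
        if (knownIds.has(fact.object.localId)) {
          proposed.outgoingLinks.push({
            predicate: fact.predicate,
            targetLocalId: fact.object.localId,
          });
        }
      } else {
        proposed.properties[fact.predicate] = fact.object.value;
      }
    }

    return proposed;
  });
}

// Example inputs standing in for the output of steps 1 and 2.
const exampleProposals = proposeEntitiesFromFacts(
  [
    { localId: "google", name: "Google" },
    { localId: "youtube", name: "YouTube" },
  ],
  [
    {
      subjectLocalId: "google",
      predicate: "has subsidiary",
      object: { kind: "entity", localId: "youtube" },
    },
    {
      subjectLocalId: "youtube",
      predicate: "industry",
      object: { kind: "literal", value: "video sharing" },
    },
  ],
);
```

In this sketch, facts whose object is not one of the summarised entities are dropped as links, which mirrors the known limitation noted below that proposed entities cannot currently link to existing entities.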

In a follow-up, we aim to reuse the underlying pieces of this process so that entities are no longer proposed while processing a single piece of text. Instead, we will gather all the facts from different sources at the coordinator level, so that entities can be proposed based on information obtained from a variety of sources.

🔗 Related links

  • H-2692

🔍 What does this change?

  • Adds mocks for the Temporal functionality needed to run flow step methods in the Vitest testing library

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

  • [x] does not modify any publishable blocks or libraries, or modifications do not need publishing

📜 Does this require a change to the docs?

The changes in this PR:

  • [x] are internal and do not require a docs change

🕸️ Does this require a change to the Turbo Graph?

The changes in this PR:

  • [x] do not affect the execution graph

⚠️ Known issues

  • This PR breaks the ability of proposed entities to link to existing entities passed to the research action. This will be partially addressed by the remaining work on moving fact gathering to the coordinator level, as we can incorporate existing entities into the required fact deduplication work (H-2693). Ideally we will also make the fact inference methods aware of existing entities (H-2713).
  • We will need to add additional fields to the facts so that provenance information is captured. This is not yet required for this PR: because all properties are derived from a single source, we can determine the provenance data as we did previously.

🐾 Next steps

  • Gather facts at the coordinator level from multiple sources before proposing the entities (H-2693)
  • Add the ability to specify existingEntities when inferring facts, so that these can be directly linked from newly proposed entities (H-2713)
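As a rough sketch of what coordinator-level gathering with deduplication (H-2693) might involve, the snippet below merges facts from several sources and records which sources each fact came from. Everything here is an assumption for illustration: the `SourcedFact` and `MergedFact` shapes, the `sourceUrl` provenance field (the PR notes that provenance fields do not exist on facts yet), and the simple case-insensitive key used to decide that two facts are the same.

```typescript
// Hypothetical shapes; not the PR's actual fact type.
interface SourcedFact {
  subject: string;
  predicate: string;
  object: string;
  sourceUrl: string; // assumed provenance field
}

interface MergedFact {
  subject: string;
  predicate: string;
  object: string;
  sourceUrls: string[];
}

// Merge facts gathered from multiple sources, treating two facts as
// duplicates when subject, predicate and object match after trimming
// and lower-casing. Real deduplication would likely need fuzzier matching.
function mergeFacts(facts: SourcedFact[]): MergedFact[] {
  const byKey = new Map<string, MergedFact>();

  for (const fact of facts) {
    const key = JSON.stringify(
      [fact.subject, fact.predicate, fact.object].map((part) =>
        part.trim().toLowerCase(),
      ),
    );
    const existing = byKey.get(key);
    if (existing) {
      if (!existing.sourceUrls.includes(fact.sourceUrl)) {
        existing.sourceUrls.push(fact.sourceUrl);
      }
    } else {
      byKey.set(key, {
        subject: fact.subject,
        predicate: fact.predicate,
        object: fact.object,
        sourceUrls: [fact.sourceUrl],
      });
    }
  }

  return Array.from(byKey.values());
}

// Two sources stating the same fact collapse into one merged fact.
const mergedExample = mergeFacts([
  { subject: "Google", predicate: "has subsidiary", object: "YouTube", sourceUrl: "https://example.com/a" },
  { subject: "google", predicate: "has subsidiary", object: "YouTube ", sourceUrl: "https://example.com/b" },
  { subject: "Google", predicate: "has subsidiary", object: "Waymo", sourceUrl: "https://example.com/a" },
]);
```

Keeping the list of source URLs on each merged fact is one way the provenance requirement mentioned under known issues could be met once properties are derived from more than one source.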

🛡 What tests cover this?

Manual testing

❓ How to test this?

Try out the existing flows that make use of the research action. I used "Get subsidiary companies of Google" as the prompt, with the Company flow test type, to produce the demoed result.

📹 Demo

image

benwerner01, May 15 '24 18:05

Codecov Report

Attention: Patch coverage is 0% with 217 lines in your changes missing coverage. Please review.

Project coverage is 20.83%. Comparing base (19e6e65) to head (2c18ba4). Report is 2082 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...e-entities-from-facts/propose-entity-from-facts.ts | 0.00% | 66 Missing :warning: |
| ...er-facts-from-text/infer-entity-facts-from-text.ts | 0.00% | 48 Missing :warning: |
| ...w-activities/shared/propose-entities-from-facts.ts | 0.00% | 28 Missing :warning: |
| ...-facts-from-text/get-entity-summaries-from-text.ts | 0.00% | 21 Missing :warning: |
| .../shared/testing-utilities/mock-get-flow-context.ts | 0.00% | 18 Missing :warning: |
| ...es/flow-activities/shared/infer-facts-from-text.ts | 0.00% | 17 Missing :warning: |
| ...red/testing-utilities/get-alice-user-account-id.ts | 0.00% | 8 Missing :warning: |
| ...ction/infer-entities-from-web-page-worker-agent.ts | 0.00% | 6 Missing :warning: |
| ...worker-ts/src/activities/shared/activity-logger.ts | 0.00% | 4 Missing :warning: |
| ...sh-ai-worker-ts/src/activities/shared/stringify.ts | 0.00% | 1 Missing :warning: |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4467      +/-   ##
==========================================
- Coverage   21.10%   20.83%   -0.27%     
==========================================
  Files         449      456       +7     
  Lines       15247    15443     +196     
  Branches     2275     2316      +41     
==========================================
  Hits         3218     3218              
- Misses      11988    12184     +196     
  Partials       41       41              
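
The percentages in the diff above can be reproduced from the hit and line counts. The sketch below assumes that coverage is hits divided by total lines (hits + misses + partials) and that the result is truncated, not rounded, to two decimal places. That assumption is inferred only from the figures in this report and is not a statement about how Codecov computes its numbers; `coveragePercent` is a helper named for this example.

```typescript
// Coverage as a percentage, truncated to two decimal places.
// Truncation (not rounding) is an assumption that happens to match
// the figures in this report.
function coveragePercent(hits: number, lines: number): number {
  return Math.floor((hits / lines) * 10000) / 100;
}

// Base (main): 3218 hits out of 15247 lines.
const baseCoverage = coveragePercent(3218, 15247); // 21.1
// Head (#4467): 3218 hits out of 15443 lines.
const headCoverage = coveragePercent(3218, 15443); // 20.83
// Change in coverage, formatted to two decimals to hide float noise.
const coverageDelta = (headCoverage - baseCoverage).toFixed(2); // "-0.27"

console.log(baseCoverage, headCoverage, coverageDelta);
```

Because the number of hits is unchanged while 196 uncovered lines are added, the drop in project coverage comes entirely from the larger denominator.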
| Flag | Coverage Δ | |
|---|---|---|
| apps.hash-ai-worker-ts | 1.69% <0.00%> (-0.11%) | :arrow_down: |

Flags with carried forward coverage won't be shown.

codecov[bot], May 15 '24 18:05

Benchmark results

@rust/graph-benches – Integrations

scaling_read_entity_complete_one_depth

| Function | Value | Mean |
|---|---|---|
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 5 | $$24.4 \mathrm{ms} \pm 276 \mathrm{μs}\left({\color{gray}0.397 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 50 | $$255 \mathrm{ms} \pm 1.54 \mathrm{ms}\left({\color{gray}-2.188 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 10 | $$45.5 \mathrm{ms} \pm 2.55 \mathrm{ms}\left({\color{red}48.9 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 25 | $$69.6 \mathrm{ms} \pm 485 \mathrm{μs}\left({\color{gray}-3.468 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 1 | $$20.4 \mathrm{ms} \pm 95.4 \mathrm{μs}\left({\color{gray}-0.909 \mathrm{\%}}\right) $$ |

representative_read_entity

| Function | Value | Mean |
|---|---|---|
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1 | $$16.1 \mathrm{ms} \pm 189 \mathrm{μs}\left({\color{gray}-0.448 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1 | $$16.5 \mathrm{ms} \pm 185 \mathrm{μs}\left({\color{gray}-4.006 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1 | $$16.2 \mathrm{ms} \pm 189 \mathrm{μs}\left({\color{gray}1.94 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2 | $$16.7 \mathrm{ms} \pm 187 \mathrm{μs}\left({\color{gray}0.484 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1 | $$17.3 \mathrm{ms} \pm 198 \mathrm{μs}\left({\color{lightgreen}-32.658 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1 | $$16.8 \mathrm{ms} \pm 213 \mathrm{μs}\left({\color{gray}0.506 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1 | $$16.5 \mathrm{ms} \pm 186 \mathrm{μs}\left({\color{gray}1.22 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1 | $$15.9 \mathrm{ms} \pm 157 \mathrm{μs}\left({\color{gray}-0.071 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1 | $$16.7 \mathrm{ms} \pm 169 \mathrm{μs}\left({\color{gray}2.85 \mathrm{\%}}\right) $$ |

representative_read_multiple_entities

| Function | Value | Mean |
|---|---|---|
| link_by_source_by_property | depths: DT=255, PT=255, ET=255, E=255 | $$1.98 \mathrm{s} \pm 8.08 \mathrm{ms}\left({\color{gray}-0.737 \mathrm{\%}}\right) $$ |
| link_by_source_by_property | depths: DT=2, PT=2, ET=2, E=2 | $$1.05 \mathrm{s} \pm 3.57 \mathrm{ms}\left({\color{gray}0.515 \mathrm{\%}}\right) $$ |
| link_by_source_by_property | depths: DT=0, PT=2, ET=2, E=2 | $$1.05 \mathrm{s} \pm 6.96 \mathrm{ms}\left({\color{gray}-0.038 \mathrm{\%}}\right) $$ |
| link_by_source_by_property | depths: DT=0, PT=0, ET=0, E=2 | $$95.7 \mathrm{ms} \pm 559 \mathrm{μs}\left({\color{gray}-0.172 \mathrm{\%}}\right) $$ |
| link_by_source_by_property | depths: DT=0, PT=0, ET=2, E=2 | $$418 \mathrm{ms} \pm 1.31 \mathrm{ms}\left({\color{gray}0.233 \mathrm{\%}}\right) $$ |
| link_by_source_by_property | depths: DT=0, PT=0, ET=0, E=0 | $$60.2 \mathrm{ms} \pm 372 \mathrm{μs}\left({\color{gray}-0.088 \mathrm{\%}}\right) $$ |
| entity_by_property | depths: DT=255, PT=255, ET=255, E=255 | $$2.87 \mathrm{s} \pm 6.72 \mathrm{ms}\left({\color{gray}0.240 \mathrm{\%}}\right) $$ |
| entity_by_property | depths: DT=2, PT=2, ET=2, E=2 | $$974 \mathrm{ms} \pm 4.96 \mathrm{ms}\left({\color{gray}-0.631 \mathrm{\%}}\right) $$ |
| entity_by_property | depths: DT=0, PT=2, ET=2, E=2 | $$965 \mathrm{ms} \pm 3.13 \mathrm{ms}\left({\color{gray}-2.832 \mathrm{\%}}\right) $$ |
| entity_by_property | depths: DT=0, PT=0, ET=0, E=2 | $$39.7 \mathrm{ms} \pm 220 \mathrm{μs}\left({\color{gray}-1.224 \mathrm{\%}}\right) $$ |
| entity_by_property | depths: DT=0, PT=0, ET=2, E=2 | $$355 \mathrm{ms} \pm 1.96 \mathrm{ms}\left({\color{gray}-2.990 \mathrm{\%}}\right) $$ |
| entity_by_property | depths: DT=0, PT=0, ET=0, E=0 | $$35.9 \mathrm{ms} \pm 153 \mathrm{μs}\left({\color{gray}-0.392 \mathrm{\%}}\right) $$ |

representative_read_entity_type

| Function | Value | Mean |
|---|---|---|
| get_entity_type_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579 | $$1.35 \mathrm{ms} \pm 5.32 \mathrm{μs}\left({\color{gray}-1.499 \mathrm{\%}}\right) $$ |

scaling_read_entity_linkless

| Function | Value | Mean |
|---|---|---|
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 10 | $$2.39 \mathrm{ms} \pm 10.9 \mathrm{μs}\left({\color{gray}0.921 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 10000 | $$13.5 \mathrm{ms} \pm 124 \mathrm{μs}\left({\color{gray}0.377 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 100 | $$2.55 \mathrm{ms} \pm 14.6 \mathrm{μs}\left({\color{gray}0.276 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 1000 | $$3.26 \mathrm{ms} \pm 21.4 \mathrm{μs}\left({\color{gray}1.62 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 1 | $$2.39 \mathrm{ms} \pm 7.59 \mathrm{μs}\left({\color{gray}0.279 \mathrm{\%}}\right) $$ |

scaling_read_entity_complete_zero_depth

| Function | Value | Mean |
|---|---|---|
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 5 | $$2.43 \mathrm{ms} \pm 13.2 \mathrm{μs}\left({\color{gray}-1.434 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 50 | $$4.43 \mathrm{ms} \pm 20.5 \mathrm{μs}\left({\color{gray}1.72 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 10 | $$2.63 \mathrm{ms} \pm 18.2 \mathrm{μs}\left({\color{gray}0.041 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 25 | $$3.06 \mathrm{ms} \pm 12.2 \mathrm{μs}\left({\color{gray}-1.401 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 1 | $$2.40 \mathrm{ms} \pm 9.30 \mathrm{μs}\left({\color{gray}-0.121 \mathrm{\%}}\right) $$ |

github-actions[bot], May 15 '24 21:05