H-2692: Infer facts from text before proposing entities
🌟 What is the purpose of this PR?
This PR changes how entities are proposed in the research action: it no longer uses the `inferEntitiesFromContent` action to propose entities. The worker agent now proposes entities as follows:
- Summarise all relevant entities in the provided text
- Infer facts from the text, each with a subject, a predicate, and a single object
- For each summarised entity, propose the entity and its outgoing links based on the facts that have that entity as their subject
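The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the `Fact` and `ProposedEntity` shapes and the `proposeEntitiesFromFacts` helper are hypothetical, and the sketch assumes the summarised entities and inferred facts are already available and that entities are matched by exact name.

```typescript
// Hypothetical shapes for illustration only; not the actual types in hash-ai-worker-ts.
type Fact = {
  subject: string; // name of the entity the fact is about
  predicate: string; // e.g. "is a subsidiary of"
  object: string; // a single object: a literal value or another entity's name
};

type ProposedEntity = {
  name: string;
  claims: { predicate: string; value: string }[];
  outgoingLinks: { predicate: string; targetName: string }[];
};

// For each summarised entity, keep only the facts where it is the subject.
// Objects that name another summarised entity become outgoing links;
// any other object is kept as a property-like claim.
const proposeEntitiesFromFacts = (
  summarisedEntityNames: string[],
  facts: Fact[],
): ProposedEntity[] => {
  const knownNames = new Set(summarisedEntityNames);

  return summarisedEntityNames.map((name) => {
    const factsAboutEntity = facts.filter((fact) => fact.subject === name);

    return {
      name,
      claims: factsAboutEntity
        .filter((fact) => !knownNames.has(fact.object))
        .map(({ predicate, object }) => ({ predicate, value: object })),
      outgoingLinks: factsAboutEntity
        .filter((fact) => knownNames.has(fact.object))
        .map(({ predicate, object }) => ({ predicate, targetName: object })),
    };
  });
};
```

For example, given the summarised entities `Google` and `DeepMind` and the facts "DeepMind is a subsidiary of Google" and "DeepMind was founded in 2010", the `DeepMind` proposal gets one outgoing link to `Google` and one claim with the value `2010`, while `Google` gets neither, since it is not the subject of any fact.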
In a follow-up, we aim to reuse the underlying pieces of this process so that entities are no longer proposed when processing a single piece of text. Instead, we will gather all the facts from different sources at the coordinator level, so that entities can be proposed based on information obtained from a variety of sources.
🔗 Related links
- H-2692
🔍 What does this change?
- adds mocks for the Temporal functionality needed to run flow step methods in the `vitest` testing library
Pre-Merge Checklist 🚀
🚢 Has this modified a publishable library?
This PR:
- [x] does not modify any publishable blocks or libraries, or modifications do not need publishing
📜 Does this require a change to the docs?
The changes in this PR:
- [x] are internal and do not require a docs change
🕸️ Does this require a change to the Turbo Graph?
The changes in this PR:
- [x] do not affect the execution graph
⚠️ Known issues
- This PR breaks the ability of proposed entities to link to existing entities passed to the research action. This will be partially addressed by the remaining work to move fact gathering to the coordinator level, as existing entities can be incorporated into the required fact deduplication work (H-2693). Ideally, we will also make the fact inference methods aware of existing entities (H-2713).
- We will need to add fields to the facts so that provenance information is captured. This is not yet required for this PR: all properties are still derived from a single source, so provenance can be determined as it was previously.
🐾 Next steps
- gather facts at the coordinator level from multiple sources, before proposing the entities (H-2693)
- add the ability to specify `existingEntities` when inferring facts, so that new proposed entities can link directly to them (H-2713)
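Gathering facts at the coordinator level could look roughly like the sketch below, which also carries the provenance fields mentioned under known issues. Everything here is hypothetical (`SourcedFact`, `MergedFact`, `mergeFactsFromSources`), and exact matching on the normalised subject, predicate and object is a deliberately naive stand-in for the deduplication work tracked in H-2693.

```typescript
// Hypothetical shape: a fact inferred from one source, plus the URL it came from.
type SourcedFact = {
  subject: string;
  predicate: string;
  object: string;
  sourceUrl: string;
};

// Hypothetical shape: a deduplicated fact with every source that stated it.
type MergedFact = {
  subject: string;
  predicate: string;
  object: string;
  sourceUrls: string[]; // provenance
};

// Naive normalisation: facts are treated as duplicates only if subject,
// predicate and object match exactly after trimming and lower-casing.
const normalise = (value: string): string => value.trim().toLowerCase();

const mergeFactsFromSources = (factsPerSource: SourcedFact[][]): MergedFact[] => {
  const merged = new Map<string, MergedFact>();

  for (const facts of factsPerSource) {
    for (const fact of facts) {
      const key = JSON.stringify([
        normalise(fact.subject),
        normalise(fact.predicate),
        normalise(fact.object),
      ]);
      const existing = merged.get(key);

      if (existing) {
        // Same fact seen again: record the additional source once.
        if (!existing.sourceUrls.includes(fact.sourceUrl)) {
          existing.sourceUrls.push(fact.sourceUrl);
        }
      } else {
        // First sighting keeps the original casing of the first source.
        merged.set(key, {
          subject: fact.subject,
          predicate: fact.predicate,
          object: fact.object,
          sourceUrls: [fact.sourceUrl],
        });
      }
    }
  }

  // Map preserves insertion order, so facts come back in first-seen order.
  return Array.from(merged.values());
};
```

The merged facts could then be fed into the same per-entity proposal step, so that each proposed entity is built from everything known about it across sources rather than from a single page.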
🛡 What tests cover this?
Manual testing
❓ How to test this?
Try out the existing flows that use the research action. To produce the demoed result, I used "Get subsidiary companies of Google" as the prompt, with the Company flow test type.
📹 Demo
Codecov Report
Attention: Patch coverage is 0% with 217 lines in your changes missing coverage. Please review.
Project coverage is 20.83%. Comparing base (19e6e65) to head (2c18ba4). Report is 2082 commits behind head on main.
Additional details and impacted files
Coverage diff:

| | main | #4467 | +/- |
|---|---|---|---|
| Coverage | 21.10% | 20.83% | -0.27% |
| Files | 449 | 456 | +7 |
| Lines | 15247 | 15443 | +196 |
| Branches | 2275 | 2316 | +41 |
| Hits | 3218 | 3218 | |
| Misses | 11988 | 12184 | +196 |
| Partials | 41 | 41 | |
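
As a cross-check on the figures above: assuming Codecov's usual definition of coverage as hits over all tracked lines (hits + misses + partials), with its default of rounding down to two decimal places, the whole 0.27-point drop is explained by the 196 new lines that are all misses while hits stay at 3218:

$$
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
$$

$$
\text{main: } \frac{3218}{3218 + 11988 + 41} = \frac{3218}{15247} \approx 21.106\% \rightarrow 21.10\%, \qquad \text{\#4467: } \frac{3218}{3218 + 12184 + 41} = \frac{3218}{15443} \approx 20.838\% \rightarrow 20.83\%
$$
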
| Flag | Coverage Δ | |
|---|---|---|
| apps.hash-ai-worker-ts | 1.69% <0.00%> (-0.11%) | :arrow_down: |
Flags with carried forward coverage won't be shown.
Benchmark results
@rust/graph-benches – Integrations
scaling_read_entity_complete_one_depth
| Function | Value | Mean |
|---|---|---|
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 5 | $$24.4 \mathrm{ms} \pm 276 \mathrm{μs}\left({\color{gray}0.397 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 50 | $$255 \mathrm{ms} \pm 1.54 \mathrm{ms}\left({\color{gray}-2.188 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 10 | $$45.5 \mathrm{ms} \pm 2.55 \mathrm{ms}\left({\color{red}48.9 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 25 | $$69.6 \mathrm{ms} \pm 485 \mathrm{μs}\left({\color{gray}-3.468 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 1 | $$20.4 \mathrm{ms} \pm 95.4 \mathrm{μs}\left({\color{gray}-0.909 \mathrm{\%}}\right) $$ |
representative_read_entity
| Function | Value | Mean |
|---|---|---|
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1 | $$16.1 \mathrm{ms} \pm 189 \mathrm{μs}\left({\color{gray}-0.448 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1 | $$16.5 \mathrm{ms} \pm 185 \mathrm{μs}\left({\color{gray}-4.006 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1 | $$16.2 \mathrm{ms} \pm 189 \mathrm{μs}\left({\color{gray}1.94 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2 | $$16.7 \mathrm{ms} \pm 187 \mathrm{μs}\left({\color{gray}0.484 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1 | $$17.3 \mathrm{ms} \pm 198 \mathrm{μs}\left({\color{lightgreen}-32.658 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1 | $$16.8 \mathrm{ms} \pm 213 \mathrm{μs}\left({\color{gray}0.506 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1 | $$16.5 \mathrm{ms} \pm 186 \mathrm{μs}\left({\color{gray}1.22 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1 | $$15.9 \mathrm{ms} \pm 157 \mathrm{μs}\left({\color{gray}-0.071 \mathrm{\%}}\right) $$ |
| entity_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1 | $$16.7 \mathrm{ms} \pm 169 \mathrm{μs}\left({\color{gray}2.85 \mathrm{\%}}\right) $$ |
representative_read_multiple_entities
| Function | Value | Mean |
|---|---|---|
| link_by_source_by_property | depths: DT=255, PT=255, ET=255, E=255 | $$1.98 \mathrm{s} \pm 8.08 \mathrm{ms}\left({\color{gray}-0.737 \mathrm{\%}}\right) $$ |
| link_by_source_by_property | depths: DT=2, PT=2, ET=2, E=2 | $$1.05 \mathrm{s} \pm 3.57 \mathrm{ms}\left({\color{gray}0.515 \mathrm{\%}}\right) $$ |
| link_by_source_by_property | depths: DT=0, PT=2, ET=2, E=2 | $$1.05 \mathrm{s} \pm 6.96 \mathrm{ms}\left({\color{gray}-0.038 \mathrm{\%}}\right) $$ |
| link_by_source_by_property | depths: DT=0, PT=0, ET=0, E=2 | $$95.7 \mathrm{ms} \pm 559 \mathrm{μs}\left({\color{gray}-0.172 \mathrm{\%}}\right) $$ |
| link_by_source_by_property | depths: DT=0, PT=0, ET=2, E=2 | $$418 \mathrm{ms} \pm 1.31 \mathrm{ms}\left({\color{gray}0.233 \mathrm{\%}}\right) $$ |
| link_by_source_by_property | depths: DT=0, PT=0, ET=0, E=0 | $$60.2 \mathrm{ms} \pm 372 \mathrm{μs}\left({\color{gray}-0.088 \mathrm{\%}}\right) $$ |
| entity_by_property | depths: DT=255, PT=255, ET=255, E=255 | $$2.87 \mathrm{s} \pm 6.72 \mathrm{ms}\left({\color{gray}0.240 \mathrm{\%}}\right) $$ |
| entity_by_property | depths: DT=2, PT=2, ET=2, E=2 | $$974 \mathrm{ms} \pm 4.96 \mathrm{ms}\left({\color{gray}-0.631 \mathrm{\%}}\right) $$ |
| entity_by_property | depths: DT=0, PT=2, ET=2, E=2 | $$965 \mathrm{ms} \pm 3.13 \mathrm{ms}\left({\color{gray}-2.832 \mathrm{\%}}\right) $$ |
| entity_by_property | depths: DT=0, PT=0, ET=0, E=2 | $$39.7 \mathrm{ms} \pm 220 \mathrm{μs}\left({\color{gray}-1.224 \mathrm{\%}}\right) $$ |
| entity_by_property | depths: DT=0, PT=0, ET=2, E=2 | $$355 \mathrm{ms} \pm 1.96 \mathrm{ms}\left({\color{gray}-2.990 \mathrm{\%}}\right) $$ |
| entity_by_property | depths: DT=0, PT=0, ET=0, E=0 | $$35.9 \mathrm{ms} \pm 153 \mathrm{μs}\left({\color{gray}-0.392 \mathrm{\%}}\right) $$ |
representative_read_entity_type
| Function | Value | Mean |
|---|---|---|
| get_entity_type_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579 | $$1.35 \mathrm{ms} \pm 5.32 \mathrm{μs}\left({\color{gray}-1.499 \mathrm{\%}}\right) $$ |
scaling_read_entity_linkless
| Function | Value | Mean |
|---|---|---|
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 10 | $$2.39 \mathrm{ms} \pm 10.9 \mathrm{μs}\left({\color{gray}0.921 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 10000 | $$13.5 \mathrm{ms} \pm 124 \mathrm{μs}\left({\color{gray}0.377 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 100 | $$2.55 \mathrm{ms} \pm 14.6 \mathrm{μs}\left({\color{gray}0.276 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 1000 | $$3.26 \mathrm{ms} \pm 21.4 \mathrm{μs}\left({\color{gray}1.62 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 1 | $$2.39 \mathrm{ms} \pm 7.59 \mathrm{μs}\left({\color{gray}0.279 \mathrm{\%}}\right) $$ |
scaling_read_entity_complete_zero_depth
| Function | Value | Mean |
|---|---|---|
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 5 | $$2.43 \mathrm{ms} \pm 13.2 \mathrm{μs}\left({\color{gray}-1.434 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 50 | $$4.43 \mathrm{ms} \pm 20.5 \mathrm{μs}\left({\color{gray}1.72 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 10 | $$2.63 \mathrm{ms} \pm 18.2 \mathrm{μs}\left({\color{gray}0.041 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 25 | $$3.06 \mathrm{ms} \pm 12.2 \mathrm{μs}\left({\color{gray}-1.401 \mathrm{\%}}\right) $$ |
| get_entity_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 1 | $$2.40 \mathrm{ms} \pm 9.30 \mathrm{μs}\left({\color{gray}-0.121 \mathrm{\%}}\right) $$ |