H-2744: Improve fact creation process + other fixes

Open benwerner01 opened this issue 1 year ago • 1 comment

🌟 What is the purpose of this PR?

Previously, facts that supply URL values for properties (e.g. LinkedIn URL, GitHub URL) were being missed when inferring facts from web pages. This PR makes several improvements to how facts are generated from text:

  • switches to gpt-4o in the inferEntityFactsFromText method
  • simplifies the user message in the inferEntityFactsFromText method to more clearly indicate which properties/links the user is looking for (for example, this helped generate facts about a user's LinkedIn URL when inferring facts from a LinkedIn page)
  • infers facts for multiple entities of the same type at the same time (chunked in sets of 5), to reduce the risk of running into the OpenAI rate limit with large numbers of entities
  • addresses an issue where the summary agent was overfitting entities to types
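
The chunking described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the `chunk` helper, the `EntitySummary` and `Fact` shapes, and the `inferFactsForChunk` callback are all hypothetical names, and processing the chunks sequentially is an assumption made here to keep the number of in-flight requests low.

```typescript
// Hypothetical helper: split a list into consecutive groups of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical shapes, for illustration only.
type EntitySummary = { localId: string; name: string };
type Fact = { subjectEntityLocalId: string; text: string };

// `inferFactsForChunk` stands in for one LLM request that covers every
// entity in the chunk (assumed here to be entities of the same type).
async function inferFactsInChunks(
  entities: readonly EntitySummary[],
  inferFactsForChunk: (group: EntitySummary[]) => Promise<Fact[]>,
  chunkSize = 5,
): Promise<Fact[]> {
  const facts: Fact[] = [];
  // One request per chunk, awaited in turn (an assumption of this sketch).
  for (const group of chunk(entities, chunkSize)) {
    facts.push(...(await inferFactsForChunk(group)));
  }
  return facts;
}
```

With 12 entities and the default chunk size, this sketch issues three requests (5, 5 and 2 entities) instead of twelve.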

Other changes made:

  • sanitizes the HTML for LLM consumption when generating the summary of a web page
  • hides the "complete" tool from the coordinator agent until it has submitted proposed entities (sometimes it would call "complete" before proposing and submitting entities)
  • ensures the questions array isn't empty when the requestHumanInput tool is called
  • changes the inferFactsFromWebPage coordinator tool to become inferFactsFromWebPages, to encourage it to infer facts from more than one source
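
Two of the coordinator changes above (gating the "complete" tool and rejecting an empty questions array) can be sketched like this. It is an illustration under assumptions rather than the PR's implementation: the tool name `submitProposedEntities`, the `hasSubmittedProposedEntities` flag, both function names, and the error message are hypothetical. Only the names "complete", "requestHumanInput" and "inferFactsFromWebPages" come from the description.

```typescript
// "submitProposedEntities" is a hypothetical tool name used for illustration.
const allToolNames = [
  "inferFactsFromWebPages",
  "submitProposedEntities",
  "requestHumanInput",
  "complete",
] as const;

type ToolName = (typeof allToolNames)[number];

// Only offer "complete" once proposed entities have been submitted, so the
// agent cannot finish before it has produced anything.
function getAvailableToolNames(params: {
  hasSubmittedProposedEntities: boolean;
}): ToolName[] {
  return allToolNames.filter(
    (name) => name !== "complete" || params.hasSubmittedProposedEntities,
  );
}

// Returns an error message to feed back to the agent, or null if the
// arguments are acceptable.
function validateRequestHumanInputArgs(args: {
  questions: string[];
}): string | null {
  return args.questions.length === 0
    ? "At least one question must be provided."
    : null;
}
```

Filtering the tool list, rather than rejecting a premature "complete" call after the fact, means the model never sees the option until it is valid.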

🔗 Related links

  • H-2744

πŸ” What does this change?

See description.

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

  • [x] does not modify any publishable blocks or libraries, or modifications do not need publishing

📜 Does this require a change to the docs?

The changes in this PR:

  • [x] are internal and do not require a docs change

πŸ•ΈοΈ Does this require a change to the Turbo Graph?

The changes in this PR:

  • [x] do not affect the execution graph

⚠️ Known issues

🐾 Next steps

🛡 What tests cover this?

Manual testing.

❓ How to test this?

Use the UI to try to obtain a Person entity via the research entities flow, and check whether more properties on the test entity type are being filled (including things like LinkedIn URL, GitHub URL, etc.).

You can also run one of the tests added in this PR via:

yarn workspace @apps/hash-ai-worker-ts vitest run research-entities-action.ai.test.ts

📹 Demo

benwerner01 avatar May 23 '24 16:05 benwerner01

Codecov Report

Attention: Patch coverage is 0% with 141 lines in your changes missing coverage. Please review.

Project coverage is 20.73%. Comparing base (dd10a16) to head (9cb2cf4). Report is 1970 commits behind head on main.

Files with missing lines                               Patch %  Lines
...vities/flow-activities/research-entities-action.ts  0.00%    30 Missing ⚠️
...ies/research-entities-action/coordinating-agent.ts  0.00%    24 Missing ⚠️
...es/flow-activities/shared/infer-facts-from-text.ts  0.00%    23 Missing ⚠️
...pps/hash-api/src/seed-data/seed-flow-test-types.ts  0.00%    22 Missing ⚠️
...er-facts-from-text/infer-entity-facts-from-text.ts  0.00%    21 Missing ⚠️
...-worker-ts/src/activities/get-web-page-activity.ts  0.00%    7 Missing ⚠️
...-facts-from-text/get-entity-summaries-from-text.ts  0.00%    4 Missing ⚠️
...ns/006-create-first-custom-data-types.migration.ts  0.00%    3 Missing ⚠️
...s-action/infer-facts-from-web-page-worker-agent.ts  0.00%    2 Missing ⚠️
...ies/flow-activities/get-web-page-summary-action.ts  0.00%    1 Missing ⚠️
... and 4 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4505      +/-   ##
==========================================
- Coverage   20.88%   20.73%   -0.15%     
==========================================
  Files         461      461              
  Lines       15399    15492      +93     
  Branches     2320     2338      +18     
==========================================
- Hits         3216     3213       -3     
- Misses      12142    12238      +96     
  Partials       41       41              
Flag           Coverage Δ
apps.hash-api  0.00% <0.00%> (ø)

codecov[bot] avatar Jun 06 '24 12:06 codecov[bot]