
feat: addition of Gliner and Keybert link extraction components

Open pedrocassalpacheco opened this issue 1 year ago • 2 comments

This pull request introduces two new link extraction components to facilitate the creation of content graphs from text.

Components

1. Gliner Link Extraction

Gliner link extraction takes unstructured text, optionally splits it into paragraphs, and uses a GLiNER model to perform entity recognition, turning the recognized entities into links consumable by the graph vector store.

2. Keybert Link Extraction

Keybert link extraction takes unstructured text, optionally splits it into paragraphs, and uses KeyBERT to extract keywords, turning them into links consumable by the graph vector store.
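To make the shared flow concrete, here is a minimal, standard-library-only sketch of the pipeline both components describe: optionally split the text into paragraphs, run an extractor over each chunk, and turn the results into links. It is not the code in this PR. The `Link` dataclass, `split_paragraphs`, `extract_links`, and `toy_keyword_extractor` are hypothetical names used for illustration only; in the real components a GLiNER model or KeyBERT would take the place of the toy extractor, and the links would be the graph vector store's own link type.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for the graph vector store's link type."""
    kind: str  # category of the link, e.g. "kw" or "entity:person"
    tag: str   # the extracted keyword or entity text


# An extractor maps one chunk of text to (kind, tag) pairs.
Extractor = Callable[[str], Iterable[tuple[str, str]]]


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def extract_links(
    text: str, extractor: Extractor, split: bool = True
) -> list[tuple[str, set[Link]]]:
    """Return (chunk, links) pairs; tags are lowercased so chunks can share links."""
    chunks = split_paragraphs(text) if split else [text]
    return [
        (chunk, {Link(kind, tag.lower()) for kind, tag in extractor(chunk)})
        for chunk in chunks
    ]


def toy_keyword_extractor(chunk: str) -> list[tuple[str, str]]:
    """Toy stand-in for a real model: words of 7+ letters count as keywords."""
    return [("kw", word) for word in re.findall(r"[A-Za-z]{7,}", chunk)]


text = "Langflow builds pipelines.\n\nPipelines connect components."
result = extract_links(text, toy_keyword_extractor)

# Chunks that share a link are the ones a graph store could connect.
shared = result[0][1] & result[1][1]
```

In this sketch both paragraphs end up carrying the same `pipelines` link, which is the kind of overlap a graph vector store can use to connect otherwise separate chunks. Passing `split=False` treats the whole text as a single chunk.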

pedrocassalpacheco avatar Dec 24 '24 00:12 pedrocassalpacheco

This was already discussed in #3866 and #3867. Also, it seems these extractors should extend LCDocumentTransformerComponent.

cbornet avatar Dec 24 '24 08:12 cbornet

@cbornet - by "already discussed," do you mean it has already been done? If so, why wasn't it merged? Not all extractors were built using the same pattern, and using LCDocumentTransformerComponent as a base class requires modifications to the extractor classes, which is currently out of scope. LangChain won't allow further commits to the graph vector store, so the simpler approach of using Component as a base class seemed like a good solution. I am happy to close this out if it is redundant. Cheers ...

PS: @cbornet - I see the objection raised by @ogabrielluiz. If this is truly a problem, we should wait for the link strategy to be redesigned.

pedrocassalpacheco avatar Dec 24 '24 16:12 pedrocassalpacheco

CodSpeed Performance Report

Merging #5416 will degrade performances by 17%

Comparing pedrocassalpacheco:extractors (8087c71) with main (7dce8cd)

Summary

⚡ 2 improvements
❌ 1 regressions
✅ 12 untouched benchmarks

:warning: Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | pedrocassalpacheco:extractors | Change |
| --- | --- | --- | --- |
| test_get_and_cache_all_types_dict | 2 ms | 1 ms | +94.98% |
| test_successful_run_with_input_type_any | 257.4 ms | 310.1 ms | -17% |
| test_successful_run_with_input_type_text | 248.9 ms | 167 ms | +49.03% |

codspeed-hq[bot] avatar Jan 16 '25 18:01 codspeed-hq[bot]

Closing in favor of individual PRs.

pedrocassalpacheco avatar Jan 16 '25 19:01 pedrocassalpacheco