Basic html loader with crawly

Open warnero opened this issue 2 years ago • 5 comments

  • Added Document and DocumentLoader Behaviours
  • Added Crawly DocumentLoader

warnero avatar Oct 20 '23 22:10 warnero

Hey @brainlid I wanted to split up my work into smaller chunks so I can get it in (and others can play with the blocks/revamp/etc.). How does this one look?

warnero avatar Oct 24 '23 22:10 warnero

@brainlid I see this has been sitting for a while. I am planning on doing some data loading from APIs soon, and was wondering if there are plans to integrate this PR or some sort of document in general?

matthusby avatar Aug 24 '24 12:08 matthusby

I think this effort has stalled out. I’m open to new work in this area. What do you need?

brainlid avatar Aug 24 '24 15:08 brainlid

I am not doing anything too fancy, just planning to pull in some Jira tickets and maybe GitHub issues.

My main question is: what do you think of using the Document model that is in this PR? I would like to stick to a standard way of doing document loading, etc. At first glance this seems fine, but I wanted to make sure I wasn't missing something.

matthusby avatar Aug 24 '24 16:08 matthusby

> what do you think of using the Document model that is in this PR

I think the Document model was incomplete. The idea was to base it on the TS/Python LangChain Document idea. I'm not using it personally nor do I have any short-term needs for it. However, I'm open to that approach.

brainlid avatar Aug 25 '24 12:08 brainlid