[Web] Setup header for web scrape task

Open ShihChun-H opened this issue 1 year ago • 14 comments

Issue Description

Current State

  • The web crawler cannot set authentication headers.

Why We Want to Change?

  • With header auth, users can crawl websites that require authentication.
  • It gives VDP more use cases.

Proposed Change

  • Add a "headers": {} input to all tasks in the Web operator
  • Users can set auth and content type

Rules for the Component Hackathon

  • Each issue will only be assigned to one person/team at a time.
  • You can only work on one issue at a time.
  • To express interest in an issue, please comment on it and tag @kuroxx, allowing the Instill AI team to assign it to you.
  • Ensure you address all feedback and suggestions provided by the Instill AI team.
  • If no commits are made within five days, the issue may be reassigned to another contributor.
  • Join our Discord to engage in discussions and seek assistance in the #hackathon channel. For technical queries, you can tag @chuang8511.

Component Contribution Guideline | Documentation | Official Go Tutorial

ShihChun-H avatar Sep 24 '24 10:09 ShihChun-H

Hi @ShihChun-H can you please assign this issue to me?

someshfengde avatar Oct 01 '24 08:10 someshfengde

Hi @someshfengde , sure. The issue has been assigned to you.

ShihChun-H avatar Oct 01 '24 09:10 ShihChun-H

Thank you, I will start.

someshfengde avatar Oct 01 '24 10:10 someshfengde

Hi @ShihChun-H, can you please help me get started on this issue? I've been trying to set up my machine according to contributions.md, but it's not working (it has been pulling Docker images for 50+ minutes).

Also, can you explain in more detail what I have to do?

From the description, I think I have to add headers to schema/ai-tasks.json. Let me know if I'm on the right path.

I've been thinking of adding this:

    "headers": {
      "title": "Request Headers",
      "description": "HTTP headers to include in the request.",
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    }
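A minimal Go sketch of how an input shaped like this schema could be decoded, assuming the task input arrives as JSON. `ScrapeInput`, `parseScrapeInput`, and the `url` field are hypothetical names used only for illustration; they are not taken from the instill-core code base. Decoding into `map[string]string` rejects non-string values, which lines up with `"additionalProperties": {"type": "string"}`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScrapeInput is a hypothetical input struct for a web scrape task.
// Only "headers" mirrors the schema proposed above; "url" is an assumption.
type ScrapeInput struct {
	URL     string            `json:"url"`
	Headers map[string]string `json:"headers,omitempty"`
}

// parseScrapeInput decodes raw task input. A header value that is not a
// JSON string makes json.Unmarshal return an error.
func parseScrapeInput(raw []byte) (ScrapeInput, error) {
	var in ScrapeInput
	if err := json.Unmarshal(raw, &in); err != nil {
		return ScrapeInput{}, fmt.Errorf("invalid scrape input: %w", err)
	}
	return in, nil
}

func main() {
	raw := []byte(`{
		"url": "https://example.com",
		"headers": {
			"Authorization": "Bearer <token>",
			"Content-Type": "application/json"
		}
	}`)

	in, err := parseScrapeInput(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(in.Headers), "headers set for", in.URL)
}
```

Because the field is tagged `omitempty` and the map is left nil when absent, existing inputs without `headers` would keep working under this sketch.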

Thanks :)

someshfengde avatar Oct 01 '24 14:10 someshfengde

Hi @someshfengde, thanks for taking the time on this.

> can you please help me get started for working on this issue. I've been trying to set up my machine according to contributions.md but it's not working out (it's been 50 + mins since pulling images from docker)

There could be several reasons. In my experience, you may need to increase your Docker resources. In Docker Desktop, you can find them here: [image]. Could you please try again?

Sometimes restarting your PC or cleaning up your Docker resources can help as well.

> Also can you explain in more detail what I've to do?

You have to add more parameters to the Web operator's tasks.json. To scrape websites that require additional information, the scraper needs to set request headers to access them. So you can add optional parameters to the scraper tasks, and users can then supply tokens or keys when they scrape specific sites.

I hope that answers all your questions. Please feel free to ask if you have any further questions. Thank you again!
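As a rough illustration of the idea, here is a sketch of applying optional user-supplied headers to an outgoing request using only Go's standard `net/http` package. The thread does not say which HTTP or scraping library the Web operator actually uses, so `newScrapeRequest` is a hypothetical helper, and the local `httptest` server merely stands in for a site that requires auth.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newScrapeRequest builds a GET request and copies the optional,
// user-supplied headers onto it. Hypothetical helper for illustration.
func newScrapeRequest(url string, headers map[string]string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Ranging over a nil map is a no-op, so headers stay optional.
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	return req, nil
}

func main() {
	// Local test server that rejects requests without an Authorization header.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, "<html>protected page</html>")
	}))
	defer srv.Close()

	req, err := newScrapeRequest(srv.URL, map[string]string{
		"Authorization": "Bearer demo-token", // placeholder token
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(resp.StatusCode, string(body))
}
```

`Header.Set` canonicalizes header names, so a user-supplied key such as `content-type` is stored as `Content-Type`.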

chuang8511 avatar Oct 03 '24 10:10 chuang8511

Hi @someshfengde, I'm following up to check on any progress made or any question encountered regarding this issue. Could you please provide an update? Thanks 🙏

ShihChun-H avatar Oct 08 '24 08:10 ShihChun-H

Sorry, I have been busy for the last couple of days. I will continue working on it in a few hours.

someshfengde avatar Oct 08 '24 08:10 someshfengde

Hey @someshfengde how's it going? If you have any PR for this, don't forget to submit it!

kuroxx avatar Oct 15 '24 10:10 kuroxx

Sorry, I got pulled into other tasks. I think this will require a lot of effort from me. Can you please assign it to someone else?

someshfengde avatar Oct 15 '24 10:10 someshfengde

@someshfengde No worries - thank you for letting me know! Good luck with your other tasks

kuroxx avatar Oct 15 '24 10:10 kuroxx

Hi @kuroxx, if you don't mind, can I look into this issue?

Sourabh782 avatar Oct 17 '24 04:10 Sourabh782

Hey @Sourabh782, sounds great! I have assigned it to you 🤝

kuroxx avatar Oct 17 '24 08:10 kuroxx

Hey @Sourabh782 how's it going? Any blockers or progress?

If you have questions or need help, we have a Discord community here: https://discord.gg/sevxWsqpGh

kuroxx avatar Oct 25 '24 09:10 kuroxx

Hey @Sourabh782, not sure if you're still working on this but since it's been 2 weeks now - I will unassign this task. Thanks

kuroxx avatar Oct 29 '24 09:10 kuroxx