
Requirements.txt Updates

Open Josh-XT opened this issue 1 year ago • 2 comments

Problem Description

We're being negatively impacted by module versions being forced by other modules we're using and by other modules introducing breaking changes with function/class name changes.

Proposed Solution

We need to update our requirements.txt to force known working versions with our software.
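For example, a hypothetical pinned requirements.txt (the package names come from this thread, but the version numbers below are placeholders for illustration, not vetted pins):

```
# Pin exact, known-working versions; the numbers here are illustrative only
streamlit==1.22.0
duckduckgo-search==3.0.2
# Alternatively, allow compatible 3.x releases but block a breaking 4.0:
# duckduckgo-search~=3.0
```

Exact `==` pins give reproducible installs; the `~=` compatible-release operator is a middle ground if you still want bug-fix updates.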

Alternatives Considered

Yelling at everyone else who forces versions or makes breaking changes in their modules. Seems like a waste of time, though.

Additional Context

  • gpt4free forcing the Streamlit version has broken things for some people.
  • DuckDuckGo search updating their module had breaking changes with all functions and class names changed which required code changes in order to restore functionality.

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this feature has not been requested yet.
  • [X] I have provided enough information for the maintainers to understand and evaluate this request.

Josh-XT avatar May 25 '23 13:05 Josh-XT

Chill man, you're not in production!

mongolu avatar May 25 '23 14:05 mongolu

Chill man, you're not in production!

Unfortunately, it doesn't make maintenance any easier when things that worked when I tested them yesterday are broken today, haha.

Josh-XT avatar May 25 '23 14:05 Josh-XT

Chill man, you're not in production!

Partly, it's me. I want to run it, and it randomly breaks because of dependencies, not because of non-working code. This costs a lot of time, since each time I have to figure out whether the error is in the source or not. Just trying to find a solution. Happy for guidance :)

localagi avatar May 25 '23 18:05 localagi

Maybe the search and read-text functions shouldn't use some web page's API? Maybe there is a framework or ready-made application that can digest data from web pages without special APIs? That would give you more independence from API providers.

mirek190 avatar May 25 '23 21:05 mirek190

Maybe the search and read-text functions shouldn't use some web page's API? Maybe there is a framework or ready-made application that can digest data from web pages without special APIs? That would give you more independence from API providers.

You kind of have to pick your battles when building stuff like this. There is always the possibility that I could write any of the modules that I'm using better than the people who wrote them, but there isn't the possibility of me having the time to do so. Happy to take suggestions of other modules to use. We have several search modules available to use, DuckDuckGo is just the default for the sake of privacy and it being free without requiring an API key. You can also use Google's official API or Searx, but they're just additional setup for API keys that no one wants to really do.
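One way to limit the blast radius of upstream renames (like the DuckDuckGo one mentioned above) is a thin adapter layer, so only one file has to change when a module's API breaks. A hypothetical sketch, not AGiXT's actual code; the backend is stubbed so it runs offline:

```python
# Hypothetical adapter: the rest of the codebase imports `web_search`,
# never the third-party search module directly.

def _ddg_backend(query: str, max_results: int):
    # Imagine this wraps duckduckgo_search; stubbed here so the sketch
    # runs without network access. Only this function changes when the
    # upstream module renames its functions or classes.
    return [f"result {i} for {query!r}" for i in range(max_results)]

def web_search(query: str, max_results: int = 3):
    """Stable internal API; swap the backend when upstream breaks."""
    return _ddg_backend(query, max_results)

print(web_search("agixt"))
```

With this layout, a breaking upstream release means editing one backend function instead of hunting through every command that performs a search.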

Josh-XT avatar May 25 '23 21:05 Josh-XT

I was thinking something like this one ;)

https://stackoverflow.com/questions/1141136/how-can-i-programmatically-perform-a-search-without-using-an-api

To be independent from APIs.

mirek190 avatar May 25 '23 22:05 mirek190

A lot has changed on the internet since 2009; most websites now have safeguards against you doing these things.

Josh-XT avatar May 25 '23 22:05 Josh-XT

Without an API

With Selenium, you can write code to control a web browser and interact with web pages. 
This allows you to search for text on a web page, or to fill out forms and submit them.

Another way to programmatically perform a search is to use a screen scraping tool. 
Screen scraping tools allow you to extract data from a web page without interacting with the browser. 
This can be useful if you need to extract data from a web page that does not have an API.

Finally, you can also use a regular expression to search for text on a web page. 
Regular expressions are a powerful tool for searching for text that matches a specific pattern.
--------------------------------------------------
Using Selenium:


from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.google.com")

# Search for "home" (Google's search box has name="q", not id="q";
# the old find_element_by_* helpers were removed in Selenium 4)
search_box = driver.find_element(By.NAME, "q")
search_box.send_keys("home")
search_box.submit()

# Get the search results (class names may differ in current markup)
results = driver.find_elements(By.CLASS_NAME, "g")

# Print the search results
for result in results:
    print(result.text)

driver.quit()

--------------------------------------------------
Using a screen scraping tool:


import requests
from bs4 import BeautifulSoup

# The query must be part of the URL; note that Google often blocks
# automated requests that lack a browser-like User-Agent
url = "https://www.google.com/search?q=home"
headers = {"User-Agent": "Mozilla/5.0"}

# Make a request to the web page
response = requests.get(url, headers=headers)

# Parse the response as HTML
soup = BeautifulSoup(response.content, "html.parser")

# Find the search results (class names may differ in current markup)
results = soup.find_all("div", class_="g")

# Print the search results
for result in results:
    print(result.text)

--------------------------------------------------
Using a regular expression:

import re

text = """
This is a text with the word "home" in it.
"""

# Find the word "home" in the text
match = re.search(r"home", text)

# If the match is found, print the word
if match:
    print(match.group())

--------------------------------------------------
Using Puppeteer to search DuckDuckGo:

# Puppeteer itself is a Node.js library; pyppeteer is its Python port,
# and its API is asynchronous
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()

    await page.goto('https://duckduckgo.com')

    # Selectors may need updating to match DuckDuckGo's current markup
    await page.type('#search_form_input_homepage',
                    'how to programmatically perform a search')
    await page.keyboard.press('Enter')
    await page.waitForSelector('.result__snippet')

    results = await page.evaluate(
        "() => [...document.querySelectorAll('.result__snippet')]"
        ".map(e => e.textContent)")

    for result in results:
        print(result)

    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

--------------------------------------------------
Using BeautifulSoup to search Stack Overflow:

import requests
from bs4 import BeautifulSoup

url = 'https://stackoverflow.com/search?q=how+to+programmatically+perform+a+search'

# A browser-like User-Agent helps avoid being blocked
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(response.content, 'html.parser')

# Class names are guesses; inspect the page to confirm current markup
results = soup.find_all('div', class_='result-card')

for result in results:
    link = result.find('a', class_='result-link')
    if link:
        print(link.text, link['href'])

--------------------------------------------------

I found at least 5 tools for it.
Maybe some of them will be helpful ...



mirek190 avatar May 25 '23 23:05 mirek190