gbfs-validator Scraping from Python results of GBFS-validator

Open iaguerri opened this issue 1 year ago • 1 comment

If you are new to the GBFS Validator, please introduce yourself (name and organization/link to GBFS). It’s helpful to know who we're chatting with!

I'm working on a MaaS application. I need to validate the GBFS feeds that public operators provide to me.

What is the issue and why is it an issue?

I'm trying to request the result of a validation from Python (https://gbfs-validator.mobilitydata.org/validator?url=https://gbfs.api.ridedott.com/public/v2/brussels/gbfs.json). I have also tried from Postman.

The problem is that the response is a 200 (OK), but the information cannot be extracted (even by scraping) because the body says "We're sorry but my-project doesn't work properly without Javascript enabled. Please enable to continue".

The code used:

import requests
from bs4 import BeautifulSoup
 
url_validator = "https://gbfs-validator.mobilitydata.org/validator"

# Test JSON feeds
json_main_full_brusels = "https://gbfs.api.ridedott.com/public/v2/brussels/gbfs.json"  # Valid JSON
json_main_nolastupdated_brusels = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasNoLastUpdated.json"  # Invalid JSON (no lastUpdated)
json_main_vehiclyType_nolastupdated = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasVehiclyTypeCorrupted.json"  # Invalid JSON: VehicleTypes feed without lastUpdated
json_main_nofeed_systeminformation = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasNoSysteminformationfeed.json"  # Invalid JSON: no SystemInformation feed
 
params = {
    "url": json_main_nolastupdated_brusels
}
 
# Build the full request URL (validator URL plus the encoded "url" parameter)
url_completa = requests.Request('GET', url_validator, params=params).prepare().url
print("Request URL:", url_completa)
 

# APPROACH 1: read the body of the plain HTTP response
respuesta = requests.get(url_validator, params=params)

if respuesta.status_code == 200:
    datos_respuesta = respuesta.text
    print("Validator response:", datos_respuesta)
else:
    print("Request failed. Status code:", respuesta.status_code)
    print("Response content:", respuesta.text)


# APPROACH 2: parse the same response with BeautifulSoup (this code does not use Selenium)
soup = BeautifulSoup(respuesta.content, 'html.parser')

for div_element in soup.find_all('div', class_='data-v-7c2075bd'):
    # Extract the text content of the div element
    div_text = div_element.get_text(strip=True)

    # Print the value of k
    print("Value of k is:", div_text)

Please describe some potential solutions you have considered (even if they aren’t related to GBFS).

I don't know why the HTML is not fully loaded afterwards, but maybe enabling JavaScript would make it possible to get this information.

Thanks!!

Jan 04 '24 10:01 iaguerri
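
The quoted message reads like the fallback text of a `<noscript>` element, which would mean the report is rendered in the browser by JavaScript and is simply absent from the HTML that `requests` receives. That is an assumption, since the issue does not show the raw markup. A minimal sketch for checking it with the standard library only (the names `NoscriptCollector`, `noscript_messages`, and `looks_like_js_shell` are made up for illustration):

```python
from html.parser import HTMLParser


class NoscriptCollector(HTMLParser):
    """Collects the text found inside <noscript> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while the parser is inside <noscript>
        self.messages = []   # non-empty text chunks seen inside <noscript>

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "noscript" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._depth > 0 and text:
            self.messages.append(text)


def noscript_messages(html):
    """Return the list of text chunks found inside <noscript> elements."""
    parser = NoscriptCollector()
    parser.feed(html)
    parser.close()
    return parser.messages


def looks_like_js_shell(html):
    """Heuristic: True if a <noscript> fallback mentions JavaScript."""
    return any("javascript" in msg.lower() for msg in noscript_messages(html))
```

With the script from the issue, `looks_like_js_shell(respuesta.text)` returning `True` would suggest that parsing the static HTML cannot surface the results, and that either a browser that executes JavaScript or an API is needed.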

Hi @iaguerri, the GBFS Validator is currently deployed on Netlify. Judging by the error message you are getting, Netlify is detecting and blocking what it considers a bot consumer. You can search the Internet for ways to avoid user-agent detection. However, if you want the validation report for specific feeds, I suggest using the undocumented, unstable API endpoint instead. Unfortunately, we do not offer a stable API endpoint yet. The following issue explains how to access the API: https://github.com/MobilityData/gbfs-validator/issues/95. If you would like to follow the development of the stable API, follow this issue: https://github.com/MobilityData/gbfs-validator/issues/129.

Jan 04 '24 14:01 davidgamez
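
For the API route suggested above, a request could be prepared roughly as follows. This is a sketch under assumptions: the endpoint path is not given in this thread, so `API_URL` is a placeholder to be replaced with the address described in issue #95, and both the POST method and the `{"url": ...}` JSON body are guesses about an undocumented, unstable interface that may change. The function names are hypothetical.

```python
import json
import urllib.request

# Placeholder only: replace with the endpoint described in issue #95.
API_URL = "https://example.invalid/validator-api"


def build_validation_request(api_url, feed_url):
    """Build (but do not send) a POST request carrying the feed URL as JSON."""
    # Assumed body shape: a JSON object with a single "url" key.
    payload = json.dumps({"url": feed_url}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_validation_report(api_url, feed_url, timeout=30):
    """Send the request and decode the JSON body of the response."""
    request = build_validation_request(api_url, feed_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping the request construction separate from the network call makes it easy to inspect what would be sent before pointing the code at the real endpoint, for example `build_validation_request(API_URL, "https://gbfs.api.ridedott.com/public/v2/brussels/gbfs.json")`.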
