gbfs-validator
Scraping GBFS Validator results from Python
If you are new to the GBFS Validator, please introduce yourself (name and organization/link to GBFS). It’s helpful to know who we're chatting with!
I'm working on a MaaS application, and I need to validate the GBFS feeds that public operators provide to me.
What is the issue and why is it an issue?
I'm trying to request the result of a validation (https://gbfs-validator.mobilitydata.org/validator?url=https://gbfs.api.ridedott.com/public/v2/brussels/gbfs.json) from Python. I have also tried from Postman.
The problem is that the response is a 200 (OK), but the information cannot be extracted (even with scraping) because the body says "We're sorry but my-project doesn't work properly without Javascript enabled. Please enable to continue"
The code used:
import requests
from bs4 import BeautifulSoup

url_validator = "https://gbfs-validator.mobilitydata.org/validator"

# Test JSONs
json_main_full_brusels = "https://gbfs.api.ridedott.com/public/v2/brussels/gbfs.json"  # Correct JSON
json_main_nolastupdated_brusels = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasNoLastUpdated.json"  # Incorrect JSON (no last_updated)
json_main_vehiclyType_nolastupdated = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasVehiclyTypeCorrupted.json"  # Incorrect JSON: vehicle types feed without last_updated
json_main_nofeed_systeminformation = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasNoSysteminformationfeed.json"  # Incorrect JSON: no system information feed

params = {
    "url": json_main_nolastupdated_brusels
}
url_completa = requests.Request('GET', url_validator, params=params).prepare().url
print("Request URL:", url_completa)

# APPROACH 1: read the response body directly
respuesta = requests.get(url_validator, params=params)
if respuesta.status_code == 200:
    datos_respuesta = respuesta.text
    print("Validator response:", datos_respuesta)
else:
    print("Request failed. Status code:", respuesta.status_code)
    print("Response content:", respuesta.text)

# APPROACH 2: parse the returned HTML with BeautifulSoup
soup = BeautifulSoup(respuesta.content, 'html.parser')
for div_element in soup.find_all('div', class_='data-v-7c2075bd'):
    # Extract the text content of the div element
    div_text = div_element.get_text(strip=True)
    # Print the value of k
    print("Value of k is:", div_text)
Please describe some potential solutions you have considered (even if they aren’t related to GBFS).
I don't know why the HTML content is not loaded afterwards, but maybe with JavaScript enabled it would be possible to get this info.
Thanks!!
Hi @iaguerri, the GBFS Validator is currently deployed on Netlify. Looking at the error message you are getting, Netlify is detecting and blocking the use of a bot consumer. You can search online for ways to avoid user-agent detection. However, if you want the validation report for specific feeds, I suggest using the undocumented, not-yet-stable API endpoint instead. Unfortunately, we are not offering a stable API endpoint yet. The following issue contains information on how to access the API: https://github.com/MobilityData/gbfs-validator/issues/95. If you would like to follow the development of the stable API, follow this issue: https://github.com/MobilityData/gbfs-validator/issues/129.
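As a rough illustration of the API route suggested above, here is a minimal sketch that asks the validator for a report as JSON instead of scraping the web page. The endpoint path and the `{"url": ...}` request body are assumptions and are not confirmed anywhere in this thread, so check issue #95 for the current details before relying on them. The helper names `build_validation_request` and `fetch_validation_report` are hypothetical, and only the Python standard library is used.

```python
import json
import urllib.request

# ASSUMPTION: this endpoint path is not confirmed in this thread. The API is
# undocumented and unstable, so verify the URL against issue #95 first.
ASSUMED_ENDPOINT = "https://gbfs-validator.mobilitydata.org/.netlify/functions/validator"


def build_validation_request(feed_url, endpoint=ASSUMED_ENDPOINT):
    """Build (but do not send) a POST request asking for a report on feed_url.

    ASSUMPTION: the API accepts a JSON body of the form {"url": "<gbfs.json URL>"}.
    """
    payload = json.dumps({"url": feed_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_validation_report(feed_url, endpoint=ASSUMED_ENDPOINT, timeout=60):
    """Send the request and return the decoded JSON body.

    Raises urllib.error.HTTPError / URLError on network or HTTP failures and
    json.JSONDecodeError if the body is not JSON (for example an HTML page).
    """
    request = build_validation_request(feed_url, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_validation_report("https://gbfs.api.ridedott.com/public/v2/brussels/gbfs.json")` would then return whatever JSON the endpoint sends back. The structure of that report is not described in this thread, so inspect it (for example with `json.dumps(report, indent=2)`) before writing code that depends on particular fields.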