biblio.el
Backend for html only search engines
Looking at the API, it seems like it was meant for sites that return XML or JSON... Is there an example of working with CSS selectors directly on the HTML returned by a query, when that's the only option?
Looking at the API, it seems like it was meant for sites that return XML or JSON
Not really; basically each backend is responsible for extracting the data and returning it in structured form.
Is there an example of working with CSS selectors directly on the HTML returned by a query, when that's the only option?
There was https://github.com/cpitclaudel/biblio.el/pull/25/files , but it uses regexp, I think. You'd want to use libxml + some query selector engine (maybe https://github.com/zweifisch/enlive?) or direct recursion. I can help if you have a concrete example.
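To make the "libxml + direct recursion" idea concrete, here is a minimal sketch that uses Emacs's built-in `dom.el` helpers instead of a CSS selector engine. The function name `my/html-result-titles` and the `result` class are hypothetical, chosen only for illustration; a real backend would use whatever markup the target site actually returns, and Emacs must be built with libxml2 support.

```elisp
;; Sketch only: `my/html-result-titles' and the "result" class name are
;; hypothetical and not part of biblio.el.
(require 'dom)
(require 'subr-x) ; for `string-trim' on older Emacs versions

(defun my/html-result-titles (html)
  "Return a list of (TITLE . URL) pairs found in the HTML string.
Each pair comes from the first <a> element inside an element whose
class attribute matches \"result\"."
  (with-temp-buffer
    (insert html)
    ;; Parse the buffer into a DOM tree (requires libxml2 support).
    (let ((dom (libxml-parse-html-region (point-min) (point-max)))
          (results nil))
      ;; `dom-by-class' treats its argument as a regexp matched against
      ;; the whole class attribute and walks the tree recursively.
      (dolist (node (dom-by-class dom "result"))
        (let ((link (car (dom-by-tag node 'a))))
          ;; Skip result nodes that contain no link at all.
          (when link
            (push (cons (string-trim (dom-texts link))
                        (dom-attr link 'href))
                  results))))
      (nreverse results))))
```

A backend would then turn each pair into whatever structured record it needs to return. Note that this only helps when the results are actually present in the HTML the server sends back.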
Hi. Thanks for the reply.
I'm looking at Israel's "Union List" (National Library), which I'm guessing is a hosted Ex Libris Primo site. It seems to be a convoluted and rather slow Angular-based site. No API as far as I can tell.
Here's a sample query (in English):
http://merhav.nli.org.il/primo-explore/search?query=any,contains,postcolonial&tab=default_tab&search_scope=ULI&vid=ULI&lang=en_US&offset=0&fromRedirectFilter=true
Fiddling around I also found this "bare" query form: http://merhav.nli.org.il/primo_library/libweb/webservices/rest/primo-explore/v1/search.do?mode=Advanced&ct=AdvancedSearch
Would the Google Scholar example be easy to adapt in this case?
The API seems to be at http://merhav.nli.org.il/primo_library/libweb/webservices/rest/primo-explore/v1/pnxs , but it apparently requires a cookie.
Would the Google Scholar example be easy to adapt in this case?
I don't think so. This seems to be a dynamic website, so parsing the HTML won't give you anything, since it doesn't contain the results. However, it should be possible to get the JSON that the API returns and the website uses. I would recommend writing to the website's authors at this point.
I asked around. They had a hackathon a few years back to test out an IIIF-based API, but it seems it didn't go anywhere.
https://github.com/OriHoch/hackathon-tasks/issues/1
I see. I think you can ask about the current API though: clearly the website is a JavaScript program that downloads JSON data; you should be able to download that same JSON data from ELisp; you just need to figure out the exact query and headers, and they should be able to help with that, I think.
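As a rough illustration of downloading JSON from ELisp with custom headers, here is a minimal sketch built on the standard `url` and `json` libraries. The function names are hypothetical, and so is the assumption that the endpoint answers a plain GET once the right headers are set; the exact query parameters and cookie are precisely what would need to be worked out with the site's maintainers.

```elisp
;; Sketch only: both function names are hypothetical, and the request
;; shape (plain GET plus extra headers) is an assumption.
(require 'url)
(require 'json)

(defun my/parse-json-response (buffer)
  "Parse the JSON body of the HTTP response stored in BUFFER.
Objects become alists and arrays become lists."
  (with-current-buffer buffer
    (goto-char (point-min))
    ;; The body starts after the first blank line that ends the headers.
    (re-search-forward "\r?\n\r?\n")
    (let ((json-object-type 'alist)
          (json-array-type 'list))
      (json-read))))

(defun my/fetch-json (url &optional headers)
  "GET URL and return its parsed JSON body.
HEADERS is an alist of extra (NAME . VALUE) header strings."
  (let* ((url-request-method "GET")
         (url-request-extra-headers
          (append '(("Accept" . "application/json")) headers))
         (buffer (url-retrieve-synchronously url)))
    (unless buffer
      (error "No response from %s" url))
    (unwind-protect
        (my/parse-json-response buffer)
      (kill-buffer buffer))))
```

A call would look something like `(my/fetch-json URL '(("Cookie" . "name=value")))`, where the cookie name and value are placeholders to be replaced with whatever the site actually expects.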