eurlex
Suggestions for improving elx_download_xml and elx_make_query
Hi Michal,
Thanks for releasing v0.4.0. I updated R and eurlex and I am using it. I recently used `elx_download_xml` and wanted to suggest some improvements:
- Line 28 should likely be `"notice type must be correctly specified" = notice %in% c("tree", "branch", "object")` (this is more of an issue).
- `file = basename(url)` could be `file = paste0(basename(url), ".xml")`.
- With the current settings, when `object` is passed to `notice`, the object expression notice is retrieved (p. 44 of the Cellar documentation); however, this does not contain metadata. I'd suggest dropping the language header and using `?language=` at the end of the URL when `object` is passed (p. 42 of the Cellar documentation), so that the object notice with the object metadata is retrieved.
- `elx_download_xml` could encapsulate a function that returns the XML notice as a string. A user could then decide whether to download the XML notice directly, or to get it as a string and parse it to extract other fields and complement the `make_query` and `run_query` functions.
- About `elx_make_query`: you remember the issue with the 10e6 limit? A workaround/improvement could be to group together multiple items of the same property of a work. E.g. if I pass `include_authors = TRUE`, it could help to use `(group_concat(distinct ?author_; separator=", ") as ?author)` in the `select` statement and `OPTIONAL{?work cdm:work_created_by_agent ?author_.}` in the `where` statement of the SPARQL query. The URI would still be inside the value, but I see cleaning it up afterwards as less of an issue. This would help avoid duplicated works when running queries.
What do you think about these?
All the best
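For the first two points, something along these lines is what I have in mind. This is only a sketch in base R: `notice_file_name()` is a made-up helper, not the actual eurlex source, and the URL is just an example. Named conditions in `stopifnot()` need R 4.0.0 or later, and `paste0()` is used because `paste()` would put a space before the extension by default.

```r
# Hypothetical helper, NOT the eurlex implementation: it combines the
# proposed input check with the proposed file naming.
notice_file_name <- function(url, notice = "tree") {
  stopifnot(
    # The name of each argument becomes the error message (R >= 4.0.0)
    "url must be a single character string" =
      is.character(url) && length(url) == 1,
    "notice type must be correctly specified" =
      notice %in% c("tree", "branch", "object")
  )
  # Append the extension so the downloaded file is recognisable as XML
  paste0(basename(url), ".xml")
}

# Example URL, assumed for illustration only
notice_file_name("http://publications.europa.eu/resource/celex/32014R0001",
                 notice = "object")
# returns "32014R0001.xml"
```

An invalid value such as `notice = "leaf"` stops with the custom message instead of the default one.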
Hi, thanks again for these suggestions. It's much appreciated that you are testing the features and giving feedback.
- fixed in 0.4.1
- fixed in 0.4.1
- fixed in 0.4.1
- I don't recommend doing this, as the notices are usually large and can be a drag on the EUR-Lex servers, so it's better to first download the notice and then work with it. Having said that, you can now access the different notices in R through `elx_fetch_data()`. If you want the data as a string, just wrap the output in `as.character()`.
- This sounds like a good suggestion, but it might require a major rewrite of the `make_query` architecture, which is admittedly pretty clumsy. I don't have time for this right now, but if you create a pull request I can have a look at it.
You can get the provisional fixes for 1-4 by updating to the GitHub version of the package. I haven't tested them properly yet, so feedback would be welcome again.