dataiku-contrib
geocoding_ban/ : French Geocoding with BAN
BAN is a French geocoding API: http://adresse.data.gouv.fr/
It looks like you committed 2 plugins in this PR, @Tristramg. Otherwise, well done, this looks really useful.
Oops, yes, indeed, thanks! It's fixed!

Hi,
Thanks for this contribution! Ever since BAN / BANO appeared, I had wanted to add this kind of feature to DSS.
First of all, do you confirm that you wish us to publish this plugin?
A few things that I noticed:
- Since the plugin dumps the data as returned by Addok, the "latitude" and "longitude" columns are not prefixed by the user-specified prefix, which might be a bit confusing
- If the first "sampling" query fails (API error for example), only the "result_score" column will be written to the output dataset schema. Further queries that succeed will then only have their "latitude" value written, into the result_score column (since it's the first returned column). We could probably just make the whole recipe fail if the sampling query fails, so as to ensure that we have a proper schema
- It would make debugging easier to add a "result_error" (or something like that) column with failure details when result_score = -1 (HTTP code and response text)
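A minimal sketch of that last suggestion, not the plugin's actual code: `build_output_row` is a hypothetical helper, and the "result_error" column name is only the one proposed above. On failure it sets result_score to -1 and keeps the HTTP code and response text:

```python
def build_output_row(status_code, response_text, columns):
    """Build the output columns for one geocoded line.

    Hypothetical helper (an assumption, not part of the plugin):
    `columns` is the dict of columns parsed from a successful response
    and is ignored when the request failed.
    """
    if status_code == 200:
        row = dict(columns)
        row["result_error"] = None  # nothing to report on success
        return row
    # Failure: flag the line with a score of -1 and record why it failed.
    return {
        "result_score": -1,
        "result_error": "HTTP %d: %s" % (status_code, response_text),
    }
```

A failed call such as `build_output_row(503, "Service Unavailable", {})` then carries its own explanation in the output dataset.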
I'll open a review for minor nitpicks, thanks again!
Thank you for the review.
Yes, we would like to publish this plugin. I also asked the people behind Addok and BANO and they are happy to know that there is a plugin.
- I fixed the missing prefix for the coordinates
- There is an optional column for the error code
- I included your suggestions
Now a few questions:
- What would be the best way to abort the job (e.g. for the sampling query)?
- How should I log, to give info and warnings during the process?
- Could it be made a processor instead of a recipe? I found no documentation on how to do it
- How could the plugin appear in the right-hand menu when selecting a database?
Thanks!
- To abort the job, simply raise a Python exception (with a clear error message), this will cause the job to fail
- At the moment, plugins cannot write warnings to the "Warnings" tab; this is something we plan to add. In the meantime, use the "logging" package and call logging.info / logging.warning (no need to configure it, the framework does it)
- Very slow processes like this one (or any process that calls an external API) do not work well as processors, because processors are called very often, so it's better as a recipe. In the coming months, we plan to provide a set of higher-level APIs for custom recipes that only do "1 for 1" enrichment of lines, like most API-calling recipes. This will make it much easier to write this kind of plugin, handling everything related to errors, parallelism, batching, writing the output, ...
- To make the recipe creatable from a dataset's "Actions" menu, add the following to your recipe.json file:
  "selectableFromDataset" : "input",
  This will make it appear in the menu, and the context dataset will be pre-added to the "input" role.
Thank you. I hope I applied all your suggestions, and that I didn’t add some rubbish while doing so.
:up: ;)