ckanapi
Problems with package_create in api 2.5.2
I have a script which seeds CKAN instances with groups, organizations, datasets and resources. After creating resources it downloads the resource dictionary and then uploads a file (I could not find a way to do it in one step). My script works perfectly when run against CKAN 2.4.0 but is failing when I run it against 2.5.2. Was there an API change between versions that affects ckanapi?
In the newer version it appears that the package is not being returned as a dictionary in this line: dataset = ckan.action.package_create
When I attempt to look at dataset with pprint it returns -1.
Otherwise it fails with this error:
Traceback (most recent call last):
File "C:/Users/akoebrick/PycharmProjects/NG911/seed.py", line 160, in
Here is my relevant code:
# Now add the packages
##############################
for package in datasets:
    packageTitle = package['title'] + " - " + countyPlain
    packageName = munge(package['title'])
    #Add county name to packageName to make it unique
    packageName = packageName + "-" + countyName
    #Add the datasets
    try:
        dataset = ckan.action.package_create(name=packageName, title=packageTitle, notes=package['notes'], groups=package['group'],resources=package['resources'], owner_org=orgId)
    except:
        e = sys.exc_info()[0]
        print "Error: %s : " %e
        exc_type, exc_value, exc_traceback = sys.exc_info()
        lines = traceback.format_exception(exc_type, exc_value, exc_traceback)
        print ''.join('!! ' + line for line in lines) # Log it or whatever here
    #Update the resource so it is a file upload rather than a url. Does not seem to be possible in initial loop
    resourceId = dataset['resources'][0]['id']
    fileName = dataset['resources'][0]['name']
    fileName = 'data/' + fileName
    resourceUpdate = ckan.action.resource_update(id=resourceId, upload=open(fileName, "rb"))
The code you've posted is suspicious. Are you really using a bare "except" and letting it fall through to the code that updates the resource?
Yes, yes I am. This is just a utility script run on my desktop which will be disposed of when the ckan instance is seeded. I did not realize that a bare except was bad... I was just trying to get a full trace for debugging. Python is not my primary language.
Anyhow, any idea as to why the ckanapi package_create is behaving differently between versions? Is the -1 that pprint dumps from the ckanapi or is it some general python error when a dict is empty / malformed? I have been trying to find any mention of -1 as a return with no luck.
Thanks for the assistance. So far the ckanapi has been a great help. Much cleaner/faster than manually creating requests.
Also, I just read up a bit on error handling and rewrote it thusly:
for package in datasets:
    packageTitle = package['title'] + " - " + countyPlain
    packageName = munge(package['title'])
    #Add county name to packageName to make it unique
    packageName = packageName + "-" + countyName
    #Add the datasets
    dataset = []
    try:
        dataset = ckan.action.package_create(name=packageName, title=packageTitle, notes=package['notes'], groups=package['group'],resources=package['resources'], owner_org=orgId)
    except (NotAuthorized, NotFound,ValidationError, SearchQueryError, SearchError, CKANAPIError, ServerIncompatibleError) as e:
        print (e)
    #Update the resource so it is a file upload rather than a url. Does not seem to be possible in initial loop
    resourceId = dataset['resources'][0]['id']
    fileName = dataset['resources'][0]['name']
    fileName = 'data/' + fileName
    resourceUpdate = ckan.action.resource_update(id=resourceId, upload=open(fileName, "rb"))
Still no luck accessing the returned dataset object.
The -1 might be related to an old performance hack. Try passing the value returned to dict() and see if it looks like the right value. You're not showing me enough code to figure out what's going on. Are you using LocalCKAN or RemoteCKAN?
Another problem with your code is that after you get an error you will still try to update the resource. You can put your resource-updating code in an else: block after the except to prevent this.
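A minimal sketch of the try / except / else layout described above. To keep it runnable without a CKAN server, the calls are stand-ins: `FakeActionError`, `create_then_upload` and the three demo callables are hypothetical names, not part of ckanapi. In the real script the `try` body would be the `ckan.action.package_create(...)` call and the `else` body the `ckan.action.resource_update(...)` call.

```python
class FakeActionError(Exception):
    """Hypothetical stand-in for ckanapi errors such as ValidationError."""


def create_then_upload(create_dataset, upload_file):
    """Create a dataset; upload its first resource only if creation succeeded."""
    try:
        dataset = create_dataset()
    except FakeActionError as e:
        # Creation failed: report it and skip the upload entirely.
        print("package_create failed: %s" % e)
        return None
    else:
        # Runs only when the try block raised nothing.
        resource = dataset['resources'][0]
        upload_file(resource['id'], 'data/' + resource['name'])
        return dataset


# Demo with stand-in callables instead of a real CKAN server.
uploads = []


def ok_create():
    return {'resources': [{'id': 'abc123', 'name': 'rcl.zip'}]}


def failing_create():
    raise FakeActionError('name already in use')


def record_upload(resource_id, path):
    uploads.append((resource_id, path))


created = create_then_upload(ok_create, record_upload)
skipped = create_then_upload(failing_create, record_upload)
```

With this layout the failed creation never reaches the upload step, so `uploads` ends up holding only the entry for the dataset that was actually created.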
dict(dataset) only creates a new object and discards it. I meant you could use print dict(dataset) if you're seeing a -1 where you think you shouldn't be.
What is the full traceback you're seeing?
Instead of writing this script have you looked at the ckanapi load datasets --upload-resources command? You could write your datasets as a JSON lines file and pass it to that command to create and upload the datasets for you.
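For readers unfamiliar with the JSON lines layout mentioned above, here is a small sketch of writing dataset dictionaries one per line. The records, the field values and the `datasets.jsonl` file name are placeholders chosen for illustration; which fields the load command needs in order to find and upload the resource files is not covered here and should be checked against the ckanapi documentation.

```python
import json
import os
import tempfile

# Hypothetical records; the values are placeholders, not the thread's real data.
datasets = [
    {
        'name': 'street-centerlines-test-county',
        'title': 'Street Centerlines - Test County',
        'owner_org': 'test-county',
        'resources': [
            {'name': 'rcl.zip', 'format': 'application/zip', 'url': 'dummy-value'},
        ],
    },
    {
        'name': 'psap-boundaries-test-county',
        'title': 'PSAP Boundaries - Test County',
        'owner_org': 'test-county',
        'resources': [
            {'name': 'psap.zip', 'format': 'application/zip', 'url': 'dummy-value'},
        ],
    },
]


def write_json_lines(records, path):
    """Write one JSON object per line (the JSON lines layout)."""
    with open(path, 'w') as f:
        for record in records:
            f.write(json.dumps(record, sort_keys=True) + '\n')


def read_json_lines(path):
    """Read the file back, one dictionary per non-empty line."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


# Write to a temporary directory so the sketch leaves the working directory alone.
out_path = os.path.join(tempfile.mkdtemp(), 'datasets.jsonl')
write_json_lines(datasets, out_path)
round_trip = read_json_lines(out_path)
```

The resulting file could then be handed to the `ckanapi load datasets --upload-resources` command; the exact input and connection options are listed in the command's own help output.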
Here is the full script. I have tried posting a few times to get the code to format correctly but no luck... Basic gist is that it gets a list of Minnesota counties and then iterates over them and creates packages. Then it attempts to upload datasets (which is the part that is failing in CKAN 2.5.2).
Sorry about the layout.
###################
import urllib, json, requests, sys, pprint, pyasn1, traceback
from xml.etree import ElementTree
from ckanapi import RemoteCKAN, NotAuthorized, NotFound,ValidationError, SearchQueryError, SearchError, CKANAPIError, ServerIncompatibleError

prefix = 'test-8-'
#maxCount = '1'
#utility function to make valid names
def munge(string):
    result = string.replace(" ","-")
    result = result.lower()
    return result

#Create the list which will hold the dictionaries
datasets = []
datasets.append ({'title':'Street Centerlines',
    'notes': 'Represents the estimated centerline of a real world roadway; includes official addresses and street names that are approved by local addressing authorities. ',
    'group' : [{'name' : 'street-centerline'}],
    'resources' : [{
        'name' : 'rcl.zip',
        'format' : 'application/zip',
        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of street centerlines',
        'url' : 'dummy-value',
        #'upload' : 'open("/NG911_data/rcl.zip", "rb")',
        }]
    })
datasets.append ({'title':'PSAP Boundaries',
    'notes': 'Defines the geographic area of a PSAP that has primary responsibilities for an emergency request. A geographic location can only have one designated primary PSAP. This layer is used by the ECRF to perform a geographic query to determine to which PSAP an emergency request is routed. An emergency request is routed based upon the geographic location of the request, provided by layer. ',
    'group' : [{'name' : 'psap-boundaries'}],
    'resources' : [{
        'name' : 'psap.zip',
        'format' : 'application/zip',
        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of psap boundaries',
        'url' : 'http://resource.not.yet.added.org'
        }]
    })
datasets.append ({'title':'Parcel Boundaries',
    'notes': 'Defines the geographic areas for property parcels. For NG9-1-1 parcels are used as a surrogate dataset for counties without address point data.',
    'group' : [{'name' : 'parcel-boundaries'}],
    'resources' : [{
        'name' : 'parcel.zip',
        'format' : 'application/zip',
        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of law parcel boundaries',
        'url' : 'http://resource.not.yet.added.org'
        }]
    })
datasets.append ({'title':'Law Enforcement Boundaries',
    'notes': 'Defines the geographic area for the primary providers of Law Enforcement services. This layer is used by the ECRF to perform a geographic query to determine which Emergency Service Providers are responsible for providing service to a location. In addition, Emergency Service Boundaries are used by PSAPs to identify the appropriate entities/first responders to dispatch.',
    'group' : [{'name' : 'law-enforcement-bounaries'}],
    'resources' : [{
        'name' : 'law.zip',
        'format' : 'application/zip',
        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of law enforcement boundaries',
        'url' : 'http://resource.not.yet.added.org'
        }]
    })
datasets.append ({'title':'Fire Service Boundaries',
    'notes': 'Defines the geographic area for the primary providers of Fire services. This layer is used by the ECRF to perform a geographic query to determine which Emergency Service Providers are responsible for providing service to a location. In addition, Emergency Service Boundaries are used by PSAPs to identify the appropriate entities/first responders to dispatch.',
    'group' : [{'name' : 'fire-srvice-boundaries'}],
    'resources' : [{
        'name' : 'fire.zip',
        'format' : 'application/zip',
        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of fire service boundaries',
        'url' : 'http://resource.not.yet.added.org'
        }]
    })
datasets.append ({'title':'EMS Boundaries',
    'notes': 'Defines the geographic area for the primary providers of Emergency Medical services. This layer is used by the ECRF to perform a geographic query to determine which Emergency Service Providers are responsible for providing service to a location. In addition, Emergency Service Boundaries are used by PSAPs to identify the appropriate entities/first responders to dispatch.',
    'group' : [{'name' : 'ems-boundaries'}],
    'resources' : [{
        'name' : 'ems.zip',
        'format' : 'application/zip',
        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of EMS boundaries',
        'url' : 'http://resource.not.yet.added.org'
        }]
    })
datasets.append ({'title':'Address Points',
    'notes': 'Represent the location of the site or a structure or the location of access to a site or structure. Site/structure points can also represent landmarks.',
    'group' : [{'name' : 'address-points'}],
    'resources' : [{
        'name' : 'adp.zip',
        'format' : 'application/zip',
        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of address points',
        'url' : 'http://resource.not.yet.added.org'
        }]
    })
#pprint.pprint(datasets)

#dev
#key= "xxxxxxxxxxxxxxx"
#prd
key= "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ua = 'seedingScript/1.0 (+https://gisdata.mn.gov)'
#dev
#ckan = RemoteCKAN('https://devel-ng911.gisdata.mn.gov/', user_agent=ua, apikey=key)

#prd
ckan = RemoteCKAN('https://ng911.gisdata.mn.gov/', user_agent=ua, apikey=key)

#Get county names from CTU database api, which serves out in XML
url = "http://www.mngeo.state.mn.us/CTU/query.pl?select=feature_name,ctu_type&where=ctu_type%20=%20%27County%27"
response = requests.get(url)
tree = ElementTree.fromstring(response.content)

#count is just to stop loop early while testing
count = 0
#prefix is for testing since organizations are hard to delete. This allows us to create "throw away" orga.
#Do the actual loop of organizations
for county in tree.iter('result'):
    #break for testing. No need to do all 87 counties.
    if count == 1:
        break
    count = count+1
    county = county[0].text
    countyPlain = county
    county = prefix + county
    #Create organization name
    countyName = munge(county)
    print "Creating organization for: "+ county
    #print "Name: " + countyName
    try:
        org = ckan.action.organization_create(name=countyName, title=county)
    except:
        e = sys.exc_info()[0]
        print "Error: %s : " %e
        exc_type, exc_value, exc_traceback = sys.exc_info()
        lines = traceback.format_exception(exc_type, exc_value, exc_traceback)
        print ''.join('!! ' + line for line in lines) # Log it or whatever here
    #pprint.pprint(org)
    orgId = org['id']
    ##############################
    # Now add the packages
    ##############################
    for package in datasets:
        packageTitle = package['title'] + " - " + countyPlain
        packageName = munge(package['title'])
        #Add county name to packageName to make it unique
        packageName = packageName + "-" + countyName
        #Add the datasets
        dataset = []
        try:
            dataset = ckan.action.package_create(name=packageName, title=packageTitle, notes=package['notes'], groups=package['group'],resources=package['resources'], owner_org=orgId)
        except (NotAuthorized, NotFound,ValidationError, SearchQueryError, SearchError, CKANAPIError, ServerIncompatibleError) as e:
            print (e)
            print (e.args)
        dict(dataset)
        #Update the resource so it is a file upload rather than a url. Does not seem to be possible in initial loop
        resourceId = dataset['resources'][0]['id']
        fileName = dataset['resources'][0]['name']
        fileName = 'data/' + fileName
        resourceUpdate = ckan.action.resource_update(id=resourceId, upload=open(fileName, "rb"))
dataset does not seem to be a dict. When I try: print dict(dataset)
I get: TypeError: 'int' object is not iterable
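The error above is what Python raises whenever `dict()` is handed a plain integer, so it confirms that the value is an int but says nothing about where it came from. A short sketch of an explicit type check that gives a more readable failure follows; `check_action_result` is a hypothetical helper name, not something ckanapi provides.

```python
def check_action_result(result):
    """Return result if it is a dict; otherwise raise with a readable message."""
    if not isinstance(result, dict):
        raise TypeError(
            "expected a dict from the action call, got %s: %r"
            % (type(result).__name__, result))
    return result


# dict() on an integer fails because an int cannot be iterated over.
try:
    dict(-1)
except TypeError as e:
    dict_error = str(e)

# The explicit check reports both the type and the offending value.
try:
    check_action_result(-1)
except TypeError as e:
    check_error = str(e)

# A real dictionary passes through unchanged.
passed = check_action_result({'resources': []})
```

Wrapping the return value of `package_create` in a check like this would stop the script at the point where the unexpected value first appears, rather than later at the `dataset['resources']` lookup.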
I did try the File Uploads as part of resource_create but it failed to create a file. This was the example I was using:
mysite.action.resource_create(
    package_id='my-dataset-with-files',
    url='dummy-value',  # ignored but required by CKAN<=2.5.x
    upload=open('/path/to/file/to/upload.csv', 'rb'))
I am going to write a stripped down script that does not do any looping- just creates a single package / resource / upload. Perhaps that will be easier to debug than the code-salad I posted to github yesterday.
From: Ian Ward [mailto:[email protected]] Sent: Monday, August 15, 2016 3:43 PM To: ckan/ckanapi [email protected] Cc: Koebrick, Andrew (MNIT) [email protected]; Author [email protected] Subject: Re: [ckan/ckanapi] Problems with package_create in api 2.5.2 (#94)
To make the code a little clearer, I created a cleaner test script that gets rid of most of the looping and hard codes some data, rather than getting it from an external service. It better illustrates how ckanapi is behaving differently when run against CKAN 2.5.2 and 2.4.0:
import sys, pprint, traceback
from ckanapi import RemoteCKAN, NotAuthorized, NotFound,ValidationError, SearchQueryError, SearchError, CKANAPIError, ServerIncompatibleError

ua = 'seedingScript/1.0 (+https://gisdata.mn.gov)'

def munge(string):
    result = string.replace(" ","-")
    result = result.lower()
    return result

#Create the list which will hold the dictionaries
datasets = []
datasets.append ({'title':'Test title',
    'notes': 'Test description. ',
    'group' : [{'name' : 'street-centerline'}],
    'resources' : [{
        'name' : 'rcl.zip',
        'format' : 'application/zip',
        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of street centerlines',
        'url' : 'dummy-value',
        #'upload' : 'open("/NG911_data/rcl.zip", "rb")',
        }]
    })

#dev: this works, it is a CKAN 2.4.0 server
key= "XXXXXXXXXXXXXXX"
ckan = RemoteCKAN('https://devel-ng911.gisdata.mn.gov/', user_agent=ua, apikey=key)
#prd: this fails, it is a CKAN 2.5.2 server
#key= "XXXXXXXXXXXXXXXXX"
#ckan = RemoteCKAN('https://ng911.gisdata.mn.gov/', user_agent=ua, apikey=key)

county='Test County'
#Create organization name
countyName = munge(county)
try:
    org = ckan.action.organization_create(name=countyName, title=county)
except (NotAuthorized, NotFound,ValidationError, SearchQueryError, SearchError, CKANAPIError, ServerIncompatibleError) as e:
    print (e)
    print (e.args)
orgId = org['id']

# add the packages
for package in datasets:
    packageTitle = package['title'] + " - " + county
    packageName = munge(package['title'])
    # add county name to packageName to make it unique
    packageName = packageName + "-" + countyName
    # add the datasets
    dataset = []
    try:
        dataset = ckan.action.package_create(name=packageName, title=packageTitle, notes=package['notes'], groups=package['group'],resources=package['resources'], owner_org=orgId)
    except (NotAuthorized, NotFound,ValidationError, SearchQueryError, SearchError, CKANAPIError, ServerIncompatibleError) as e:
        print (e)
        print (e.args)
    #dataset is not a dictionary in CKAN api 2.5.2, rather it appears to be an int with the value of -1.
    # however it is a dictionary in CKAN api 2.4.0
    print dict(dataset)
    # update the resource so it is a file upload rather than a url. Does not seem to be possible in initial loop
    # This is the section that is failing since dataset is not a dict.
    resourceId = dataset['resources'][0]['id']
    fileName = dataset['resources'][0]['name']
    fileName = 'data/' + fileName
    resourceUpdate = ckan.action.resource_update(id=resourceId, upload=open(fileName, "rb"))
I've just tried your script against ckan release-v2.5.2 commit id 919201e886fa95c8f26b9d970cb9ef7bbb916ca9 and it's working for me. What's the exact version of ckan you're running? Your script creates the dataset, uploads the file and prints out the dataset created.
Also, which version of ckanapi?
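One way to answer the server half of that question from a script is CKAN's `status_show` action, which reports the running version under the `ckan_version` key. The sketch below assumes that key is present and uses a stub in place of a real `RemoteCKAN` object so it runs without a server; `server_version`, `_StubAction` and `_StubCKAN` are hypothetical names introduced only for this illustration.

```python
def server_version(ckan):
    """Ask a CKAN client object for the server version via status_show."""
    status = ckan.action.status_show()
    # Assumption: the status dictionary carries a 'ckan_version' entry.
    return status.get('ckan_version')


class _StubAction(object):
    """Stand-in for ckan.action; returns a canned status dictionary."""

    def status_show(self):
        return {'ckan_version': '2.5.2', 'site_url': 'https://example.invalid'}


class _StubCKAN(object):
    """Stand-in for a RemoteCKAN instance."""

    action = _StubAction()


version = server_version(_StubCKAN())
print("CKAN server version: %s" % version)
```

Against a real site the same function would be called with the `RemoteCKAN` instance from the script. The installed client version can be read separately with `pip show ckanapi`.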
Ckanapi 3.6 (as installed by pip) Ckan-2.5.2, the tagged version (https://github.com/ckan/ckan/releases/tag/ckan-2.5.2). Not sure what commit number this gets…
I went ahead and seeded my portal without updating the resources to have .zip files attached and will just have the local government units override my dummy URL with the .zip files later. Or I can try to programmatically update it later if the api gets sorted out. But the time pressure is off. I was rushing against a deadline of noon today, since we have team members in the field training staff at some of our northern counties which border your fine country.
If you want to try to do any testing on our actual server, the URL is: https://ng911.gisdata.mn.gov/ We will be shutting firewalls around it as real data comes in, but for now it is open. I would be happy to sysadmin you. Unfortunately our dev server which will match the prd version-for-version is not yet provisioned (hence making sure I use “test” prefixes when creating organizations).
It would not surprise me at all if there is some simple failure at my end. Although I am just not seeing it.
Thanks for taking the time to look into this.
Andrew
From: Ian Ward [mailto:[email protected]] Sent: Tuesday, August 16, 2016 10:49 AM To: ckan/ckanapi [email protected] Cc: Koebrick, Andrew (MNIT) [email protected]; Author [email protected] Subject: Re: [ckan/ckanapi] Problems with package_create in api 2.5.2 (#94)