Problems with package_create in api 2.5.2

Open exlibris opened this issue 9 years ago • 12 comments

I have a script which seeds CKAN instances with groups, organizations, datasets and resources. After creating the resources it downloads the resource dictionary and then uploads a file (I could not find a way to do it in one step). My script works perfectly when run against CKAN 2.4.0 but fails when I run it against 2.5.2. Was there an API change between versions that affects ckanapi?

In the newer version it appears that the package is not being returned as a dictionary in this line: dataset = ckan.action.package_create

When I attempt to look at dataset with pprint it returns -1.

Otherwise it fails with this error:

Traceback (most recent call last):
  File "C:/Users/akoebrick/PycharmProjects/NG911/seed.py", line 160, in <module>
    resourceId = dataset['resources'][0]['id']
TypeError: 'int' object has no attribute '__getitem__'

Here is my relevant code:

# Now add the packages
##############################
for package in datasets:
    packageTitle = package['title'] + " - " + countyPlain
    packageName = munge(package['title'])

    #Add county name to packageName to make it unique
    packageName = packageName + "-" + countyName
    #Add the datasets
    try:
        dataset = ckan.action.package_create(name=packageName, title=packageTitle, notes=package['notes'], groups=package['group'],resources=package['resources'], owner_org=orgId)
    except:
        e = sys.exc_info()[0]
        print "Error: %s : " %e
        exc_type, exc_value, exc_traceback = sys.exc_info()
        lines = traceback.format_exception(exc_type, exc_value, exc_traceback)
        print ''.join('!! ' + line for line in lines)  # Log it or whatever here


    #Update the resource so it is a file upload rather than a url.  Does not seem to be possible in initial loop
    resourceId = dataset['resources'][0]['id']
    fileName = dataset['resources'][0]['name']
    fileName = 'data/' + fileName
    resourceUpdate = ckan.action.resource_update(id=resourceId, upload=open(fileName, "rb"))

exlibris avatar Aug 15 '16 18:08 exlibris

The code you've posted is suspicious. Are you really using a bare "except" and letting it fall through to the code that updates the resource?

wardi avatar Aug 15 '16 19:08 wardi

Yes, yes I am. This is just a utility script run on my desktop which will be disposed of when the CKAN instance is seeded. I did not realize that a bare except was bad... I was just trying to get a full trace for debugging. Python is not my primary language.

Anyhow, any idea as to why the ckanapi package_create is behaving differently between versions? Is the -1 that pprint dumps from the ckanapi or is it some general python error when a dict is empty / malformed? I have been trying to find any mention of -1 as a return with no luck.

Thanks for the assistance. So far the ckanapi has been a great help. Much cleaner/faster than manually creating requests.
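
On the pprint question: in standard Python, pprint only formats the object it is given and has no special -1 output for an empty or malformed dict. A minimal check, with sample values made up for illustration, suggests that a printed -1 is the value the call actually returned:

```python
import pprint

# pformat returns the same text that pprint.pprint would print.
empty_dict_text = pprint.pformat({})
minus_one_text = pprint.pformat(-1)

print(empty_dict_text)  # how an empty dict is displayed
print(minus_one_text)   # how the int -1 is displayed

# The two displays differ, so seeing -1 means the object is the int -1,
# not a dict that pprint failed to render.
looks_different = empty_dict_text != minus_one_text
print(looks_different)
```

This only shows what pprint does; it does not explain why the server-side call would produce -1 in the first place.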

exlibris avatar Aug 15 '16 20:08 exlibris

Also, I just read up a bit on error handling and rewrote it thusly:

for package in datasets:
    packageTitle = package['title'] + " - " + countyPlain
    packageName = munge(package['title'])

    #Add county name to packageName to make it unique
    packageName = packageName + "-" + countyName
    #Add the datasets
    dataset = []
    try:
        dataset = ckan.action.package_create(name=packageName, title=packageTitle, notes=package['notes'], groups=package['group'],resources=package['resources'], owner_org=orgId)
    except (NotAuthorized, NotFound,ValidationError, SearchQueryError, SearchError, CKANAPIError, ServerIncompatibleError) as e:
        print (e)



    #Update the resource so it is a file upload rather than a url.  Does not seem to be possible in initial loop
    resourceId = dataset['resources'][0]['id']
    fileName = dataset['resources'][0]['name']
    fileName = 'data/' + fileName
    resourceUpdate = ckan.action.resource_update(id=resourceId, upload=open(fileName, "rb"))

Still no luck accessing the returned dataset object.

exlibris avatar Aug 15 '16 20:08 exlibris

The -1 might be related to an old performance hack. Try passing the returned value to dict() and see if it looks like the right value. You're not showing me enough code to figure out what's going on. Are you using LocalCKAN or RemoteCKAN?

Another problem with your code is that after you get an error you will still try to update the resource. You can put your resource-updating code in an else: block after the except to prevent this.
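
A minimal sketch of the try/except/else layout described above. `FakeAPIError` and `create_dataset` are hypothetical stand-ins for the ckanapi exception classes and for `ckan.action.package_create`, so the example runs without a server:

```python
class FakeAPIError(Exception):
    """Hypothetical stand-in for the ckanapi exception classes."""


def create_dataset(name, fail=False):
    """Hypothetical stand-in for ckan.action.package_create."""
    if fail:
        raise FakeAPIError("could not create %s" % name)
    return {"name": name, "resources": [{"id": "res-1", "name": "rcl.zip"}]}


def seed(name, fail=False):
    """Create a dataset and read its first resource id only on success."""
    resource_id = None
    try:
        dataset = create_dataset(name, fail=fail)
    except FakeAPIError as e:
        # Report the failure; the else block below is skipped.
        print(e)
    else:
        # Runs only when the try block raised no exception.
        resource_id = dataset["resources"][0]["id"]
    return resource_id


print(seed("street-centerlines-test"))
print(seed("street-centerlines-test", fail=True))
```

With this layout a failed create is reported once and the resource update is never attempted against a missing dataset.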

wardi avatar Aug 15 '16 20:08 wardi

dict(dataset) only creates a new object and discards it. I meant you could use print dict(dataset) if you're seeing a -1 where you think you shouldn't be.

What is the full traceback you're seeing?

Instead of writing this script have you looked at the ckanapi load datasets --upload-resources command? You could write your datasets as a JSON lines file and pass it to that command to create and upload the datasets for you.
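
As a rough illustration of the JSON lines idea, the sketch below serializes dataset dicts one per line. The record values and the `to_jsonl` helper are assumptions made for this example, and how `--upload-resources` locates the files to upload is not covered in this thread, so check the ckanapi documentation before relying on it:

```python
import json

# Sample record modelled on the dicts in the seeding script; values are illustrative.
datasets = [
    {
        "name": "street-centerlines-test-county",
        "title": "Street Centerlines - Test County",
        "owner_org": "test-county",
        "resources": [
            {"name": "rcl.zip", "format": "application/zip", "url": "dummy-value"},
        ],
    },
]


def to_jsonl(records):
    """Serialize records as JSON lines: one JSON object per line."""
    return "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)


jsonl_text = to_jsonl(datasets)
print(jsonl_text)

# To save it for the command line tool (file name is arbitrary):
# with open("datasets.jsonl", "w") as f:
#     f.write(jsonl_text)
```

The saved file could then be passed to the command mentioned above, probably along the lines of `ckanapi load datasets -I datasets.jsonl --upload-resources -r <site URL> -a <API key>`; run `ckanapi -h` to confirm the exact options for your version.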

wardi avatar Aug 15 '16 20:08 wardi

Here is the full script. I have tried posting a few times to get the code to format correctly but no luck... The basic gist is that it gets a list of Minnesota counties, iterates over them, and creates packages. Then it attempts to upload datasets (which is the part that is failing in CKAN 2.5.2).

Sorry about the layout.

###################
import urllib, json, requests, sys, pprint, pyasn1, traceback
from xml.etree import ElementTree
from ckanapi import RemoteCKAN, NotAuthorized, NotFound,ValidationError, SearchQueryError, SearchError, CKANAPIError, ServerIncompatibleError

prefix = 'test-8-'
#maxCount = '1'
#utility function to make valid names
def munge(string):
    result = string.replace(" ","-")
    result = result.lower()
    return result

#Create the list which will hold the dictionaries
datasets = []
datasets.append ({'title':'Street Centerlines',
                  'notes': 'Represents the estimated centerline of a real world roadway; includes official addresses and street names that are approved by local addressing authorities. ',
                  'group' : [{'name' : 'street-centerline'}],
                  'resources' : [{
                        'name' : 'rcl.zip',
                        'format' : 'application/zip',
                        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of street centerlines',
                        'url' : 'dummy-value',
                        #'upload' : 'open("/NG911_data/rcl.zip", "rb")',
                        }]
                    })
datasets.append ({'title':'PSAP Boundaries',
                  'notes': 'Defines the geographic area of a PSAP that has primary responsibilities for an emergency request. A geographic location can only have one designated primary PSAP. This layer is used by the ECRF to perform a geographic query to determine to which PSAP an emergency request is routed. An emergency request is routed based upon the geographic location of the request, provided by layer. ',
                  'group' : [{'name' : 'psap-boundaries'}],
                  'resources' : [{
                        'name' : 'psap.zip',
                        'format' : 'application/zip',
                        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of psap boundaries',
                        'url' : 'http://resource.not.yet.added.org'
                        }]
                    })
datasets.append ({'title':'Parcel Boundaries',
                  'notes': 'Defines the geographic areas for property parcels. For NG9-1-1 parcels are used as a surrogate dataset for counties without address point data.',
                  'group' : [{'name' : 'parcel-boundaries'}],
                  'resources' : [{
                        'name' : 'parcel.zip',
                        'format' : 'application/zip',
                        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of law parcel boundaries',
                        'url' : 'http://resource.not.yet.added.org'
                        }]
                    })
datasets.append ({'title':'Law Enforcement Boundaries',
                  'notes': 'Defines the geographic area for the primary providers of Law Enforcement services. This layer is used by the ECRF to perform a geographic query to determine which Emergency Service Providers are responsible for providing service to a location. In addition, Emergency Service Boundaries are used by PSAPs to identify the appropriate entities/first responders to dispatch.',
                  'group' : [{'name' : 'law-enforcement-bounaries'}],
                  'resources' : [{
                        'name' : 'law.zip',
                        'format' : 'application/zip',
                        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of law enforcement boundaries',
                        'url' : 'http://resource.not.yet.added.org'
                        }]
                    })
datasets.append ({'title':'Fire Service Boundaries',
                  'notes': 'Defines the geographic area for the primary providers of Fire services. This layer is used by the ECRF to perform a geographic query to determine which Emergency Service Providers are responsible for providing service to a location. In addition, Emergency Service Boundaries are used by PSAPs to identify the appropriate entities/first responders to dispatch.',
                  'group' : [{'name' : 'fire-srvice-boundaries'}],
                  'resources' : [{
                        'name' : 'fire.zip',
                        'format' : 'application/zip',
                        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of fire service boundaries',
                        'url' : 'http://resource.not.yet.added.org'
                        }]
                    })
datasets.append ({'title':'EMS Boundaries',
                  'notes': 'Defines the geographic area for the primary providers of Emergency Medical services. This layer is used by the ECRF to perform a geographic query to determine which Emergency Service Providers are responsible for providing service to a location. In addition, Emergency Service Boundaries are used by PSAPs to identify the appropriate entities/first responders to dispatch.',
                  'group' : [{'name' : 'ems-boundaries'}],
                  'resources' : [{
                        'name' : 'ems.zip',
                        'format' : 'application/zip',
                        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of EMS boundaries',
                        'url' : 'http://resource.not.yet.added.org'
                        }]
                    })
datasets.append ({'title':'Address Points',
                  'notes': 'Represent the location of the site or a structure or the location of access to a site or structure. Site/structure points can also represent landmarks.',
                  'group' : [{'name' : 'address-points'}],
                  'resources' : [{
                        'name' : 'adp.zip',
                        'format' : 'application/zip',
                        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of address points',
                        'url' : 'http://resource.not.yet.added.org'
                        }]
                    })
#pprint.pprint(datasets)

#dev
#key= "xxxxxxxxxxxxxxx"
#prd
key= "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ua = 'seedingScript/1.0 (+https://gisdata.mn.gov)'
#dev
#ckan = RemoteCKAN('https://devel-ng911.gisdata.mn.gov/', user_agent=ua, apikey=key)

#prd
ckan = RemoteCKAN('https://ng911.gisdata.mn.gov/', user_agent=ua, apikey=key)

#Get county names from CTU database api, which serves out in XML
url = "http://www.mngeo.state.mn.us/CTU/query.pl?select=feature_name,ctu_type&where=ctu_type%20=%20%27County%27"
response = requests.get(url)
tree = ElementTree.fromstring(response.content)

#count is just to stop loop early while testing
count = 0
#prefix is for testing since organizaitons are hard to delete. This allows us to create "throw away" orga.

#Do the actual loop of organizations
for county in tree.iter('result'):
    #break for testing. No need to do all 87 counites.

    if count == 1:
        break
    count = count+1

    county = county[0].text
    countyPlain = county
    county = prefix + county


    #Create organization name
    countyName = munge(county)

    print "Creating organization for: "+ county
    #print "Name: " + countyName
    try:
        org = ckan.action.organization_create(name=countyName, title=county)
    except:
        e = sys.exc_info()[0]
        print "Error: %s : " %e
        exc_type, exc_value, exc_traceback = sys.exc_info()
        lines = traceback.format_exception(exc_type, exc_value, exc_traceback)
        print ''.join('!! ' + line for line in lines)  # Log it or whatever here

    #pprint.pprint(org)
    orgId = org['id']

    ##############################
    # Now add the packages
    ##############################
    for package in datasets:
        packageTitle = package['title'] + " - " + countyPlain
        packageName = munge(package['title'])

        #Add county name to packageName to make it unique
        packageName = packageName + "-" + countyName
        #Add the datasets
        dataset = []
        try:
            dataset = ckan.action.package_create(name=packageName, title=packageTitle, notes=package['notes'], groups=package['group'],resources=package['resources'], owner_org=orgId)
        except (NotAuthorized, NotFound,ValidationError, SearchQueryError, SearchError, CKANAPIError, ServerIncompatibleError) as e:
            print (e)
            print (e.args)

        dict(dataset)

        #Update the resource so it is a file upload rather than a url.  Does not seem to be possible in initial loop
        resourceId = dataset['resources'][0]['id']
        fileName = dataset['resources'][0]['name']
        fileName = 'data/' + fileName
        resourceUpdate = ckan.action.resource_update(id=resourceId, upload=open(fileName, "rb"))

exlibris avatar Aug 15 '16 20:08 exlibris

dataset does not seem to be a dict. When I try: print dict(dataset)

I get: TypeError: 'int' object is not iterable
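
That TypeError is what dict() raises when handed any int, so it fits with dataset holding the int -1 rather than a dict. A small sketch that checks the type before indexing; `describe_result` is a hypothetical helper written for this example, not part of ckanapi:

```python
def describe_result(result):
    """Return a short description of what an API call handed back."""
    if isinstance(result, dict):
        return "dict with keys: %s" % sorted(result)
    return "unexpected %s: %r" % (type(result).__name__, result)


print(describe_result({"id": "abc", "resources": []}))
print(describe_result(-1))

# dict() cannot build a mapping from an int, which is the error reported above.
try:
    dict(-1)
    raised = False
except TypeError as e:
    raised = True
    print("TypeError: %s" % e)
```

A check like this before `dataset['resources'][0]['id']` would turn the confusing indexing failure into a message that shows exactly what came back.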

exlibris avatar Aug 15 '16 20:08 exlibris

I did try the File Uploads approach as part of resource_create but it failed to create a file. This was the example I was using:

mysite.action.resource_create(
    package_id='my-dataset-with-files',
    url='dummy-value',  # ignored but required by CKAN<=2.5.x
    upload=open('/path/to/file/to/upload.csv', 'rb'))

I am going to write a stripped down script that does not do any looping- just creates a single package / resource / upload. Perhaps that will be easier to debug than the code-salad I posted to github yesterday.

exlibris avatar Aug 16 '16 13:08 exlibris

To make the code a little clearer, I created a cleaner test script that gets rid of most of the looping and hard-codes some data rather than getting it from an external service. It better illustrates how ckanapi behaves differently when run against CKAN 2.5.2 and 2.4.0:

import sys, pprint, traceback
from ckanapi import RemoteCKAN, NotAuthorized, NotFound,ValidationError, SearchQueryError, SearchError, CKANAPIError, ServerIncompatibleError
ua = 'seedingScript/1.0 (+https://gisdata.mn.gov)'

def munge(string):
    result = string.replace(" ","-")
    result = result.lower()
    return result

#Create the list which will hold the dictionaries
datasets = []

datasets.append ({'title':'Test title',
                  'notes': 'Test description. ',
                  'group' : [{'name' : 'street-centerline'}],
                  'resources' : [{
                        'name' : 'rcl.zip',
                        'format' : 'application/zip',
                        'description' : 'A zip file containing either a shapefile of a single layer or a file geodatabase containg a single feature class of street centerlines',
                        'url' : 'dummy-value',
                        #'upload' : 'open("/NG911_data/rcl.zip", "rb")',
                        }]
                    })
#dev: this works, it is a CKAN 2.4.0 server
key= "XXXXXXXXXXXXXXX"
ckan = RemoteCKAN('https://devel-ng911.gisdata.mn.gov/', user_agent=ua, apikey=key)
#prd: this fails, it is a CKAN 2.5.2 server
#key= "XXXXXXXXXXXXXXXXX"
#ckan = RemoteCKAN('https://ng911.gisdata.mn.gov/', user_agent=ua, apikey=key)
county='Test County'

#Create organization name
countyName = munge(county)
try:
    org = ckan.action.organization_create(name=countyName, title=county)
except (NotAuthorized, NotFound,ValidationError, SearchQueryError, SearchError, CKANAPIError, ServerIncompatibleError) as e:
    print (e)
    print (e.args)
orgId = org['id']

# add the packages
for package in datasets:
    packageTitle = package['title'] + " - " + county
    packageName = munge(package['title'])
    # add county name to packageName to make it unique
    packageName = packageName + "-" + countyName
    # add the datasets
    dataset = []
    try:
        dataset = ckan.action.package_create(name=packageName, title=packageTitle, notes=package['notes'], groups=package['group'],resources=package['resources'], owner_org=orgId)
    except (NotAuthorized, NotFound,ValidationError, SearchQueryError, SearchError, CKANAPIError, ServerIncompatibleError) as e:
        print (e)
        print (e.args)

    #dataset is not a dictionary in CKAN api  2.5.2, rather it appears to be an int with the value of -1.
    # however it is a dictionary in CKAN api 2.4.0
    print dict(dataset)

# update the resource so it is a file upload rather than a url.  Does not seem to be possible in initial loop
# This is the section that is failing since dataset is not a dict.
resourceId = dataset['resources'][0]['id']
fileName = dataset['resources'][0]['name']
fileName = 'data/' + fileName
resourceUpdate = ckan.action.resource_update(id=resourceId, upload=open(fileName, "rb"))

exlibris avatar Aug 16 '16 14:08 exlibris

I've just tried your script against ckan release-v2.5.2 commit id 919201e886fa95c8f26b9d970cb9ef7bbb916ca9 and it's working for me. What's the exact version of ckan you're running? Your script creates the dataset, uploads the file and prints out the dataset created.

wardi avatar Aug 16 '16 15:08 wardi

Also, which version of ckanapi?
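
One way to read the server version through the API is CKAN's `status_show` action, which to my knowledge includes a `ckan_version` field. The sketch below uses a fake client (`FakeCKAN` and `FakeActions` are hypothetical, with a made-up response) so it runs without a server; with a real `RemoteCKAN` object the same `server_version` function should apply:

```python
class FakeActions(object):
    """Hypothetical stand-in for ckan.action so the sketch runs offline."""

    def status_show(self):
        # Made-up response shaped like a status_show result.
        return {"ckan_version": "2.5.2", "extensions": []}


class FakeCKAN(object):
    """Hypothetical stand-in for a RemoteCKAN instance."""
    action = FakeActions()


def server_version(ckan):
    """Ask the server which CKAN version it reports."""
    status = ckan.action.status_show()
    return status.get("ckan_version", "unknown")


print(server_version(FakeCKAN()))
```

The installed ckanapi version can be checked separately with `pip show ckanapi`.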

wardi avatar Aug 16 '16 16:08 wardi

ckanapi 3.6 (as installed by pip). CKAN 2.5.2, the tagged version (https://github.com/ckan/ckan/releases/tag/ckan-2.5.2). Not sure which commit this corresponds to…

I went ahead and seeded my portal without updating the resources to have .zip files attached, and will just have the local government units override my dummy URL with the .zip files later. Or I can try to update it programmatically later if the API gets sorted out. But the time pressure is off. I was rushing against a deadline of noon today, since we have team members in the field training staff at some of our northern counties which border your fine country.

If you want to try to do any testing on our actual server, the URL is: https://ng911.gisdata.mn.gov/ We will be shutting firewalls around it as real data comes in, but for now it is open. I would be happy to make you a sysadmin. Unfortunately our dev server, which will match the prd version-for-version, is not yet provisioned (hence making sure I use “test” prefixes when creating organizations).

It would not surprise me at all if there is some simple failure at my end. Although I am just not seeing it.

Thanks for taking the time to look into this.

Andrew

exlibris avatar Aug 16 '16 17:08 exlibris