ckanapi

Fix some errors when executing dumps

Open pdelboca opened this issue 2 years ago • 2 comments

Hello!

I'm trying to do a dump of an instance but the package is throwing some errors. This PR is to fix whatever is appearing.

Problems when logging errors

TODO: See #209

KeyError: 'format'

Traceback (most recent call last):
  File "/home/pdelboca/Repos/ckanapi/.venv/bin/ckanapi", line 33, in <module>
    sys.exit(load_entry_point('ckanapi', 'console_scripts', 'ckanapi')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdelboca/Repos/ckanapi/ckanapi/cli/main.py", line 156, in main
    return dump_things(ckan, thing[0], arguments)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdelboca/Repos/ckanapi/ckanapi/cli/dump.py", line 110, in dump_things
    create_datapackage(record, datapackages_path, stderr, apikey)
  File "/home/pdelboca/Repos/ckanapi/ckanapi/datapackage.py", line 67, in create_datapackage
    filename = resource_filename(dres)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdelboca/Repos/ckanapi/ckanapi/datapackage.py", line 87, in resource_filename
    ext = slugify.slugify(dres['format'])
                          ~~~~^^^^^^^^^^
KeyError: 'format'
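The traceback above shows `dres['format']` raising because a resource dict has no `format` key. A minimal sketch of a defensive lookup follows. It is not the actual patch: `resource_extension` is a hypothetical helper, and the small `slugify` below is a stand-in for the slugify library that `datapackage.py` calls.

```python
import re


def slugify(text):
    """Stand-in slugifier (assumption): lowercase, runs of other chars become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def resource_extension(dres, default=""):
    """Return a slugified extension for a resource dict.

    Hypothetical helper: uses .get() so that a resource without a
    'format' key, or with format set to None, falls back to `default`
    instead of raising KeyError.
    """
    fmt = dres.get("format") or default
    return slugify(fmt)


print(resource_extension({"format": "CSV"}))   # resource with a format
print(resource_extension({"name": "no-fmt"}))  # resource without one
```

Whether an empty extension is the right fallback, or whether the extension should instead be derived from the resource URL, is a design choice left open here.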

pdelboca, Nov 21 '23 17:11

@wardi have you ever used ckanapi to dump a portal? I'm trying to dump https://datos.gob.ar/ but it is extremely slow, and it also gets "blocked" after 250 datasets. (Blocked = it doesn't write any output and makes no progress; nothing happens.)

I'm trying to do: ckanapi dump datasets --all --datapackages=./output_directory/ -r https://datos.gob.ar

pdelboca, Nov 22 '23 11:11

@pdelboca we use it daily to create a history of our metadata for ~30k datasets. It's possible you're being throttled on the server side. dump datasets makes a separate package_show query for every dataset. You could try search datasets instead, which paginates over package_search and so makes fewer requests.

At the moment it's possible to resume an interrupted load, but not an interrupted dump. Maybe that's needed if you are being throttled.

wardi, Nov 24 '23 19:11
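To illustrate the suggestion of paginating over package_search rather than calling package_show once per dataset, here is a rough sketch. It assumes only the standard `rows` and `start` parameters of package_search and the `count` and `results` keys of its response. `iter_datasets` is a hypothetical helper, not part of ckanapi, and the page size of 100 is an arbitrary choice.

```python
def iter_datasets(search, rows=100):
    """Yield dataset dicts by paging through a package_search-style callable.

    `search` is any callable accepting `rows` and `start` keyword
    arguments and returning a dict with 'count' and 'results' keys.
    """
    start = 0
    while True:
        page = search(rows=rows, start=start)
        results = page["results"]
        if not results:
            break  # server returned an empty page: nothing left
        yield from results
        start += len(results)
        if start >= page["count"]:
            break  # all reported datasets have been yielded


# Possible usage against a live portal (not executed here):
#
#   from ckanapi import RemoteCKAN
#   ckan = RemoteCKAN("https://datos.gob.ar")
#   for dataset in iter_datasets(ckan.action.package_search):
#       print(dataset["name"])
```

With 100 rows per page, 250 datasets take three requests instead of 250, which may help if the server throttles by request count.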