CKAN master branch now generating test errors
Seems we have python worker are not putting the 'Express Load completed' log and worker pid crashes in Master.
More investigation required. This does not affect 2.11.x
2025-05-30 20:37:17,657 INFO [ckan.config.environment] Loading templates from /srv/app/src/ckan/ckan/templates
ckanext/xloader/tests/test_action.py ......... [ 8%]
ckanext/xloader/tests/test_jobs.py FFFFFFF...F [ 18%]
ckanext/xloader/tests/test_loader.py ..ss...............s....ss......... [ 50%]
. [ 51%]
ckanext/xloader/tests/test_parser.py .... [ 55%]
ckanext/xloader/tests/test_plugin.py ........... [ 65%]
ckanext/xloader/tests/test_utils.py .................................... [ 99%]
. [100%]
=================================== FAILURES ===================================
_______________ TestXLoaderJobs.test_xloader_data_into_datastore _______________
self = <ckanext.xloader.tests.test_jobs.TestXLoaderJobs object at 0x7f91831ba1d0>
cli = <ckan.tests.helpers.CKANCliRunner object at 0x7f91831b7df0>
data = {'api_key': '***': 'ca4295f6-2222-4c7f-91f0-1afbec467b57', ...}, 'result_url': 'http://localhost:5000/api/3/action/xloader_hook'}
def test_xloader_data_into_datastore(self, cli, data):
self.enqueue(jobs.xloader_data_into_datastore, [data])
with mock.patch("ckanext.xloader.jobs.get_response", get_response):
stdout = cli.invoke(ckan, ["jobs", "worker", "--burst"]).output
assert "File hash: d44fa65eda3675e11710682fdb5f1648" in stdout
assert "Fields: [{'id': 'x', 'type': 'text', 'strip_extra_white': True}, {'id': 'y', 'type': 'text', 'strip_extra_white': True}]" in stdout
assert "Copying to database..." in stdout
assert "Creating search index..." in stdout
> assert "Express Load completed" in stdout
Name Stmts Miss Cover
----------------------------------------------------------
ckanext/xloader/__init__.py 0 0 100%
ckanext/xloader/action.py 151 50 67%
ckanext/xloader/auth.py 10 2 80%
ckanext/xloader/cli.py 40 23 42%
ckanext/xloader/command.py 91 78 14%
ckanext/xloader/db.py 135 49 64%
ckanext/xloader/helpers.py 58 50 14%
ckanext/xloader/interfaces.py 6 2 67%
ckanext/xloader/job_exceptions.py 18 6 67%
ckanext/xloader/jobs.py 354 132 63%
ckanext/xloader/loader.py 350 48 86%
ckanext/xloader/parser.py 59 1 98%
ckanext/xloader/plugin.py 143 38 73%
ckanext/xloader/schema.py 18 0 100%
ckanext/xloader/tests/__init__.py 0 0 100%
ckanext/xloader/tests/ckan_setup.py 21 21 0%
ckanext/xloader/tests/fixtures.py 64 49 23%
ckanext/xloader/tests/test_action.py 64 2 97%
ckanext/xloader/tests/test_jobs.py 146 26 82%
ckanext/xloader/tests/test_loader.py 394 37 91%
ckanext/xloader/tests/test_parser.py 26 0 100%
ckanext/xloader/tests/test_plugin.py 65 3 95%
ckanext/xloader/tests/test_utils.py 21 0 100%
ckanext/xloader/utils.py 182 56 69%
ckanext/xloader/views.py 39 29 26%
----------------------------------------------------------
TOTAL 2455 702 71%
=========================== short test summary info ============================
FAILED ckanext/xloader/tests/test_jobs.py::TestXLoaderJobs::test_xloader_data_into_datastore - AssertionError: assert 'Express Load completed' in '2025-05-30 20:37:24,749 INFO [ckan.cli] Using configuration file /__w/ckanext-xloader/ckanext-xloader/test.ini\n2025...025-05-30 20:37:25,731 INFO [ckan.lib.jobs] Worker rq:worker:2a0de08c151741e0a9f8771e93e138e0 (PID 173) has stopped\n'
FAILED ckanext/xloader/tests/test_jobs.py::TestXLoaderJobs::test_download_resource_data_with_ckanext_xloader_site_url - AssertionError: assert 'Express Load completed' in '2025-05-30 20:37:26,350 INFO [ckan.cli] Using configuration file /__w/ckanext-xloader/ckanext-xloader/test.ini\n2025...025-05-30 20:37:27,281 INFO [ckan.lib.jobs] Worker rq:worker:8e9103ea35874bccbd6f72f2f6f40848 (PID 173) has stopped\n'
FAILED ckanext/xloader/tests/test_jobs.py::TestXLoaderJobs::test_download_resource_data_with_ckan_site_url - AssertionError: assert 'Express Load completed' in '2025-05-30 20:37:27,930 INFO [ckan.cli] Using configuration file /__w/ckanext-xloader/ckanext-xloader/test.ini\n2025...025-05-30 20:37:28,850 INFO [ckan.lib.jobs] Worker rq:worker:a0550f4414b94b1fa9d473edfda1ae45 (PID 173) has stopped\n'
FAILED ckanext/xloader/tests/test_jobs.py::TestXLoaderJobs::test_download_resource_data_with_different_original_url - AssertionError: assert 'Express Load completed' in '2025-05-30 20:37:29,468 INFO [ckan.cli] Using configuration file /__w/ckanext-xloader/ckanext-xloader/test.ini\n2025...025-05-30 20:37:30,389 INFO [ckan.lib.jobs] Worker rq:worker:10bde992909d4e96b5506ddf642cdfa0 (PID 173) has stopped\n'
FAILED ckanext/xloader/tests/test_jobs.py::TestXLoaderJobs::test_callback_xloader_hook_with_ckanext_xloader_site_url - AssertionError: assert 'Express Load completed' in '2025-05-30 20:37:31,032 INFO [ckan.cli] Using configuration file /__w/ckanext-xloader/ckanext-xloader/test.ini\n2025...025-05-30 20:37:31,979 INFO [ckan.lib.jobs] Worker rq:worker:5f959ef0ea1e4b8d9d13f4f83596cdf5 (PID 173) has stopped\n'
FAILED ckanext/xloader/tests/test_jobs.py::TestXLoaderJobs::test_callback_xloader_hook_with_ckan_site_url - AssertionError: assert 'Express Load completed' in '2025-05-30 20:37:32,596 INFO [ckan.cli] Using configuration file /__w/ckanext-xloader/ckanext-xloader/test.ini\n2025...025-05-30 20:37:33,521 INFO [ckan.lib.jobs] Worker rq:worker:1b0ded9f2f614b9983583288e647e846 (PID 173) has stopped\n'
FAILED ckanext/xloader/tests/test_jobs.py::TestXLoaderJobs::test_xloader_ignore_hash - AssertionError: assert 'Express Load completed' in '2025-05-30 20:37:34,215 INFO [ckan.cli] Using configuration file /__w/ckanext-xloader/ckanext-xloader/test.ini\n2025...025-05-30 20:37:35,146 INFO [ckan.lib.jobs] Worker rq:worker:09dc8eebeece4bb6899dfef7bea46f26 (PID 173) has stopped\n'
FAILED ckanext/xloader/tests/test_jobs.py::TestSetResourceMetadata::test_simple - AssertionError: both default and custom schema data must be passed
============== 8 failed, 95 passed, 5 skipped in 65.20s (0:01:05) ==============
Compiling here some actual issues I found while working on other stuff. Just adding them as reference for when support for CKAN 2.12 is actually worked on:
- SQLAlchemy 2.0 compatibility issues. CKAN 2.12 uses sqlalchemy 2.0 and there is code that needs to be updated. Here's a non-compehensive diff of some changes needed, there are more for sure:
diff --git a/ckanext/xloader/db.py b/ckanext/xloader/db.py
index 6bb25d9..dcfb8b9 100644
--- a/ckanext/xloader/db.py
+++ b/ckanext/xloader/db.py
@@ -119,9 +119,16 @@ def get_job(job_id):
if not result:
return None
+ try:
+ # SQLAlchemcy < 2.0
+ fields = list(result.keys())
+ except AttributeError:
+ # SQLAlchemy >= 2.0
+ fields = result._mapping.keys()
+
# Turn the result into a dictionary representation of the job.
result_dict = {}
- for field in list(result.keys()):
+ for field in fields:
value = getattr(result, field)
if value is None:
result_dict[field] = value
@@ -452,6 +459,8 @@ def _get_metadata(job_id):
METADATA_TABLE.c.job_id == job_id)).fetchall()
metadata = {}
for row in results:
+ # Compatibility with SQLAlchemy >= 2.0
+ row = dict(zip(row._fields, row))
value = row['value']
if row['type'] == 'json':
value = json.loads(value)
diff --git a/ckanext/xloader/loader.py b/ckanext/xloader/loader.py
index 241ff9d..49ae093 100644
--- a/ckanext/xloader/loader.py
+++ b/ckanext/xloader/loader.py
@@ -677,7 +677,7 @@ def _enable_fulltext_trigger(connection, resource_id):
def _get_rows_count_of_resource(connection, table):
count_query = ''' SELECT count(_id) from {table} '''.format(table=table)
- results = connection.execute(count_query)
+ results = connection.execute(sa.text(count_query))
rows_count = int(results.first()[0])
return rows_count
- Changes in dataset indexing. After https://github.com/ckan/ckan/pull/8396 the internals of sending a dataset to index have changed a bit
2025-10-02 17:20:48,306 ERROR [ckanext.xloader.jobs] xloader error: both default and custom schema data must be passed, Traceback (most recent call last):
File "/home/adria/dev/projects/ckan-py310/ckanext-xloader/ckanext/xloader/jobs.py", line 139, in xloader_data_into_datastore
xloader_data_into_datastore_(input, job_dict, logger)
File "/home/adria/dev/projects/ckan-py310/ckanext-xloader/ckanext/xloader/jobs.py", line 321, in xloader_data_into_datastore_
direct_load(allow_type_guessing=True)
File "/home/adria/dev/projects/ckan-py310/ckanext-xloader/ckanext/xloader/jobs.py", line 269, in direct_load
set_datastore_active(data, resource, logger)
File "/home/adria/dev/projects/ckan-py310/ckanext-xloader/ckanext/xloader/jobs.py", line 507, in set_datastore_active
set_resource_metadata(update_dict=data)
File "/home/adria/dev/projects/ckan-py310/ckanext-xloader/ckanext/xloader/utils.py", line 275, in set_resource_metadata
psi.index_package(solr_data_dict)
File "/home/adria/dev/projects/ckan-py310/ckan/ckan/lib/search/index.py", line 115, in index_package
assert 'with_custom_schema' in pkg_dict, (
AssertionError: both default and custom schema data must be passed
``
3. I can't reproduce it now but I got an exception with a `ValidationError` referring to a `user_id` missing field, which I traced back to activities requiring a `user` in the `context` after https://github.com/ckan/ckan/pull/7627. Maybe it was something on muy env, I'll add more details if it pops up again.
I've been working on this at https://github.com/qld-gov-au/ckanext-xloader/compare/develop...qld-gov-au:ckanext-xloader:github-262-click-logging but it's not ready yet.
Great news @ThrawnCA
Another one:
Resource delete returns an error when xloader enabled. UI returns a 404, logs say:
2025-10-03 10:54:51,675 ERROR [ckan.logic] Resource e1e5705c-7b9c-4417-93e1-b32a83114121 exists but it is not found in the package it should belong to.
I think is related to the xloader catching the IDomainNotification emitted when the resource deletion is committed:
/home/adria/dev/projects/ckan-py310/ckan/ckan/model/meta.py(69)ckan_before_commit()
68 dome = DomainObjectModificationExtension()
---> 69 dome.before_commit(session)
70
/home/adria/dev/projects/ckan-py310/ckan/ckan/model/modification.py(23)before_commit()
22 def before_commit(self, session: Any):
---> 23 self.notify_observers(session, self.notify)
24
/home/adria/dev/projects/ckan-py310/ckan/ckan/model/modification.py(43)notify_observers()
42 if isinstance(obj, model.Resource):
---> 43 method(obj, model.DomainObjectOperation.changed)
44 if getattr(obj, 'url_changed', False):
/home/adria/dev/projects/ckan-py310/ckan/ckan/model/modification.py(68)notify()
66 for observer in plugins.PluginImplementations(
67 plugins.IDomainObjectModification):
---> 68 observer.notify(entity, operation)
/home/adria/dev/projects/ckan-py310/ckanext-xloader/ckanext/xloader/plugin.py(105)notify()
104 }
--> 105 resource_dict = toolkit.get_action("resource_show")(
106 context,
/home/adria/dev/projects/ckan-py310/ckan/ckan/logic/__init__.py(581)wrapped()
580 # check_access(action_name, context, data_dict=None)
--> 581 result = _action(context, data_dict, **kw)
582 try:
> /home/adria/dev/projects/ckan-py310/ckan/ckan/logic/action/get.py(1072)resource_show()
1071 import ipdb; ipdb.set_trace()
-> 1072 log.error('Resource %s exists but it is not found in the package it should belong to.', id)
1073 raise NotFound(_('Resource was not found.'))
``