ckanext-xloader icon indicating copy to clipboard operation
ckanext-xloader copied to clipboard

CKAN master branch now generating test errors

Open duttonw opened this issue 4 months ago • 3 comments

Seems we have python worker are not putting the 'Express Load completed' log and worker pid crashes in Master.

More investigation required. This does not affect 2.11.x

2025-05-30 20:37:17,657 INFO  [ckan.config.environment] Loading templates from /srv/app/src/ckan/ckan/templates

ckanext/xloader/tests/test_action.py .........                           [  8%]
ckanext/xloader/tests/test_jobs.py FFFFFFF...F                           [ 18%]
ckanext/xloader/tests/test_loader.py ..ss...............s....ss......... [ 50%]
.                                                                        [ 51%]
ckanext/xloader/tests/test_parser.py ....                                [ 55%]
ckanext/xloader/tests/test_plugin.py ...........                         [ 65%]
ckanext/xloader/tests/test_utils.py .................................... [ 99%]
.                                                                        [100%]

=================================== FAILURES ===================================
_______________ TestXLoaderJobs.test_xloader_data_into_datastore _______________

self = <ckanext.xloader.tests.test_jobs.TestXLoaderJobs object at 0x7f91831ba1d0>
cli = <ckan.tests.helpers.CKANCliRunner object at 0x7f91831b7df0>
data = {'api_key': '***': 'ca4295f6-2222-4c7f-91f0-1afbec467b57', ...}, 'result_url': 'http://localhost:5000/api/3/action/xloader_hook'}

    def test_xloader_data_into_datastore(self, cli, data):
        self.enqueue(jobs.xloader_data_into_datastore, [data])
        with mock.patch("ckanext.xloader.jobs.get_response", get_response):
            stdout = cli.invoke(ckan, ["jobs", "worker", "--burst"]).output
            assert "File hash: d44fa65eda3675e11710682fdb5f1648" in stdout
            assert "Fields: [{'id': 'x', 'type': 'text', 'strip_extra_white': True}, {'id': 'y', 'type': 'text', 'strip_extra_white': True}]" in stdout
            assert "Copying to database..." in stdout
            assert "Creating search index..." in stdout
>           assert "Express Load completed" in stdout
Name                                   Stmts   Miss  Cover
----------------------------------------------------------
ckanext/xloader/__init__.py                0      0   100%
ckanext/xloader/action.py                151     50    67%
ckanext/xloader/auth.py                   10      2    80%
ckanext/xloader/cli.py                    40     23    42%
ckanext/xloader/command.py                91     78    14%
ckanext/xloader/db.py                    135     49    64%
ckanext/xloader/helpers.py                58     50    14%
ckanext/xloader/interfaces.py              6      2    67%
ckanext/xloader/job_exceptions.py         18      6    67%
ckanext/xloader/jobs.py                  354    132    63%
ckanext/xloader/loader.py                350     48    86%
ckanext/xloader/parser.py                 59      1    98%
ckanext/xloader/plugin.py                143     38    73%
ckanext/xloader/schema.py                 18      0   100%
ckanext/xloader/tests/__init__.py          0      0   100%
ckanext/xloader/tests/ckan_setup.py       21     21     0%
ckanext/xloader/tests/fixtures.py         64     49    23%
ckanext/xloader/tests/test_action.py      64      2    97%
ckanext/xloader/tests/test_jobs.py       146     26    82%
ckanext/xloader/tests/test_loader.py     394     37    91%
ckanext/xloader/tests/test_parser.py      26      0   100%
ckanext/xloader/tests/test_plugin.py      65      3    95%
ckanext/xloader/tests/test_utils.py       21      0   100%
ckanext/xloader/utils.py                 182     56    69%
ckanext/xloader/views.py                  39     29    26%
----------------------------------------------------------
TOTAL                                   2455    702    71%
=========================== short test summary info ============================
FAILED ckanext/xloader/tests/test_jobs.py::TestXLoaderJobs::test_xloader_data_into_datastore - AssertionError: assert 'Express Load completed' in '2025-05-30 20:37:24,749 INFO  [ckan.cli] Using configuration file /__w/ckanext-xloader/ckanext-xloader/test.ini\n2025...025-05-30 20:37:25,731 INFO  [ckan.lib.jobs] Worker rq:worker:2a0de08c151741e0a9f8771e93e138e0 (PID 173) has stopped\n'
FAILED ckanext/xloader/tests/test_jobs.py::TestXLoaderJobs::test_download_resource_data_with_ckanext_xloader_site_url - AssertionError: assert 'Express Load completed' in '2025-05-30 20:37:26,350 INFO  [ckan.cli] Using configuration file /__w/ckanext-xloader/ckanext-xloader/test.ini\n2025...025-05-30 20:37:27,281 INFO  [ckan.lib.jobs] Worker rq:worker:8e9103ea35874bccbd6f72f2f6f40848 (PID 173) has stopped\n'
FAILED ckanext/xloader/tests/test_jobs.py::TestXLoaderJobs::test_download_resource_data_with_ckan_site_url - AssertionError: assert 'Express Load completed' in '2025-05-30 20:37:27,930 INFO  [ckan.cli] Using configuration file /__w/ckanext-xloader/ckanext-xloader/test.ini\n2025...025-05-30 20:37:28,850 INFO  [ckan.lib.jobs] Worker rq:worker:a0550f4414b94b1fa9d473edfda1ae45 (PID 173) has stopped\n'
FAILED ckanext/xloader/tests/test_jobs.py::TestXLoaderJobs::test_download_resource_data_with_different_original_url - AssertionError: assert 'Express Load completed' in '2025-05-30 20:37:29,468 INFO  [ckan.cli] Using configuration file /__w/ckanext-xloader/ckanext-xloader/test.ini\n2025...025-05-30 20:37:30,389 INFO  [ckan.lib.jobs] Worker rq:worker:10bde992909d4e96b5506ddf642cdfa0 (PID 173) has stopped\n'
FAILED ckanext/xloader/tests/test_jobs.py::TestXLoaderJobs::test_callback_xloader_hook_with_ckanext_xloader_site_url - AssertionError: assert 'Express Load completed' in '2025-05-30 20:37:31,032 INFO  [ckan.cli] Using configuration file /__w/ckanext-xloader/ckanext-xloader/test.ini\n2025...025-05-30 20:37:31,979 INFO  [ckan.lib.jobs] Worker rq:worker:5f959ef0ea1e4b8d9d13f4f83596cdf5 (PID 173) has stopped\n'
FAILED ckanext/xloader/tests/test_jobs.py::TestXLoaderJobs::test_callback_xloader_hook_with_ckan_site_url - AssertionError: assert 'Express Load completed' in '2025-05-30 20:37:32,596 INFO  [ckan.cli] Using configuration file /__w/ckanext-xloader/ckanext-xloader/test.ini\n2025...025-05-30 20:37:33,521 INFO  [ckan.lib.jobs] Worker rq:worker:1b0ded9f2f614b9983583288e647e846 (PID 173) has stopped\n'
FAILED ckanext/xloader/tests/test_jobs.py::TestXLoaderJobs::test_xloader_ignore_hash - AssertionError: assert 'Express Load completed' in '2025-05-30 20:37:34,215 INFO  [ckan.cli] Using configuration file /__w/ckanext-xloader/ckanext-xloader/test.ini\n2025...025-05-30 20:37:35,146 INFO  [ckan.lib.jobs] Worker rq:worker:09dc8eebeece4bb6899dfef7bea46f26 (PID 173) has stopped\n'
FAILED ckanext/xloader/tests/test_jobs.py::TestSetResourceMetadata::test_simple - AssertionError: both default and custom schema data must be passed
============== 8 failed, 95 passed, 5 skipped in 65.20s (0:01:05) ==============

duttonw avatar Aug 10 '25 07:08 duttonw

Compiling here some actual issues I found while working on other stuff. Just adding them as reference for when support for CKAN 2.12 is actually worked on:

  1. SQLAlchemy 2.0 compatibility issues. CKAN 2.12 uses sqlalchemy 2.0 and there is code that needs to be updated. Here's a non-compehensive diff of some changes needed, there are more for sure:
diff --git a/ckanext/xloader/db.py b/ckanext/xloader/db.py
index 6bb25d9..dcfb8b9 100644
--- a/ckanext/xloader/db.py
+++ b/ckanext/xloader/db.py
@@ -119,9 +119,16 @@ def get_job(job_id):
     if not result:
         return None

+    try:
+        # SQLAlchemcy < 2.0
+        fields = list(result.keys())
+    except AttributeError:
+        # SQLAlchemy >= 2.0
+        fields = result._mapping.keys()
+
     # Turn the result into a dictionary representation of the job.
     result_dict = {}
-    for field in list(result.keys()):
+    for field in fields:
         value = getattr(result, field)
         if value is None:
             result_dict[field] = value
@@ -452,6 +459,8 @@ def _get_metadata(job_id):
                 METADATA_TABLE.c.job_id == job_id)).fetchall()
     metadata = {}
     for row in results:
+        # Compatibility with SQLAlchemy >= 2.0
+        row = dict(zip(row._fields, row))
         value = row['value']
         if row['type'] == 'json':
             value = json.loads(value)
diff --git a/ckanext/xloader/loader.py b/ckanext/xloader/loader.py
index 241ff9d..49ae093 100644
--- a/ckanext/xloader/loader.py
+++ b/ckanext/xloader/loader.py
@@ -677,7 +677,7 @@ def _enable_fulltext_trigger(connection, resource_id):

 def _get_rows_count_of_resource(connection, table):
     count_query = ''' SELECT count(_id) from {table} '''.format(table=table)
-    results = connection.execute(count_query)
+    results = connection.execute(sa.text(count_query))
     rows_count = int(results.first()[0])
     return rows_count
  1. Changes in dataset indexing. After https://github.com/ckan/ckan/pull/8396 the internals of sending a dataset to index have changed a bit
2025-10-02 17:20:48,306 ERROR [ckanext.xloader.jobs] xloader error: both default and custom schema data must be passed, Traceback (most recent call last):
  File "/home/adria/dev/projects/ckan-py310/ckanext-xloader/ckanext/xloader/jobs.py", line 139, in xloader_data_into_datastore
    xloader_data_into_datastore_(input, job_dict, logger)
  File "/home/adria/dev/projects/ckan-py310/ckanext-xloader/ckanext/xloader/jobs.py", line 321, in xloader_data_into_datastore_
    direct_load(allow_type_guessing=True)
  File "/home/adria/dev/projects/ckan-py310/ckanext-xloader/ckanext/xloader/jobs.py", line 269, in direct_load
    set_datastore_active(data, resource, logger)
  File "/home/adria/dev/projects/ckan-py310/ckanext-xloader/ckanext/xloader/jobs.py", line 507, in set_datastore_active
    set_resource_metadata(update_dict=data)
  File "/home/adria/dev/projects/ckan-py310/ckanext-xloader/ckanext/xloader/utils.py", line 275, in set_resource_metadata
    psi.index_package(solr_data_dict)
  File "/home/adria/dev/projects/ckan-py310/ckan/ckan/lib/search/index.py", line 115, in index_package
    assert 'with_custom_schema' in pkg_dict, (
AssertionError: both default and custom schema data must be passed
``

3. I can't reproduce it now but I got an exception with a `ValidationError` referring to a `user_id` missing field, which I traced back to activities requiring a `user` in the `context` after https://github.com/ckan/ckan/pull/7627. Maybe it was something on muy env, I'll add more details if it pops up again.

amercader avatar Oct 02 '25 15:10 amercader

I've been working on this at https://github.com/qld-gov-au/ckanext-xloader/compare/develop...qld-gov-au:ckanext-xloader:github-262-click-logging but it's not ready yet.

ThrawnCA avatar Oct 02 '25 20:10 ThrawnCA

Great news @ThrawnCA

Another one:

Resource delete returns an error when xloader enabled. UI returns a 404, logs say:

2025-10-03 10:54:51,675 ERROR [ckan.logic] Resource e1e5705c-7b9c-4417-93e1-b32a83114121 exists but it is not found in the package it should belong to.

I think is related to the xloader catching the IDomainNotification emitted when the resource deletion is committed:

  /home/adria/dev/projects/ckan-py310/ckan/ckan/model/meta.py(69)ckan_before_commit()
     68     dome = DomainObjectModificationExtension()
---> 69     dome.before_commit(session)
     70

  /home/adria/dev/projects/ckan-py310/ckan/ckan/model/modification.py(23)before_commit()
     22     def before_commit(self, session: Any):
---> 23         self.notify_observers(session, self.notify)
     24

  /home/adria/dev/projects/ckan-py310/ckan/ckan/model/modification.py(43)notify_observers()
     42             if isinstance(obj, model.Resource):
---> 43                 method(obj, model.DomainObjectOperation.changed)
     44             if getattr(obj, 'url_changed', False):

  /home/adria/dev/projects/ckan-py310/ckan/ckan/model/modification.py(68)notify()
     66         for observer in plugins.PluginImplementations(
     67                 plugins.IDomainObjectModification):
---> 68             observer.notify(entity, operation)

  /home/adria/dev/projects/ckan-py310/ckanext-xloader/ckanext/xloader/plugin.py(105)notify()
    104         }
--> 105         resource_dict = toolkit.get_action("resource_show")(
    106             context,

  /home/adria/dev/projects/ckan-py310/ckan/ckan/logic/__init__.py(581)wrapped()
    580                 # check_access(action_name, context, data_dict=None)
--> 581                 result = _action(context, data_dict, **kw)
    582                 try:

> /home/adria/dev/projects/ckan-py310/ckan/ckan/logic/action/get.py(1072)resource_show()
   1071         import ipdb; ipdb.set_trace()
-> 1072         log.error('Resource %s exists but it is not found in the package it should belong to.', id)
   1073         raise NotFound(_('Resource was not found.'))
``

amercader avatar Oct 03 '25 08:10 amercader