CumulusCI
Mapping files with steps that are not 1-1 with SObjects are unreliable for extraction
Describe the bug
As much as possible, mapping files should be reusable rather than purpose-specific. Mapping files can have multiple steps associated with the same SObject because different steps can load different subsets. There are at least two bugs if you try to use such a mapping file for data extractions.
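To make the failure mode concrete, here is a minimal sketch of such a mapping file. The step names, fields, and record type values are hypothetical; the point is that both steps target Account and, with no explicit `table` key, would resolve to the same local table on extraction.

```yaml
# Hypothetical mapping: two steps load different subsets of Account.
# Neither specifies `table`, so both infer the same table name,
# which is where extraction runs into trouble.
Insert Business Accounts:
  sf_object: Account
  fields:
    - Name
  filters:
    - "RecordType.DeveloperName = 'Business'"
Insert Household Accounts:
  sf_object: Account
  fields:
    - Name
  filters:
    - "RecordType.DeveloperName = 'HH_Account'"
```

This shape is perfectly reasonable for loads, where each step inserts its own subset, but the extract path tries to create the backing table once per step.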
Reproduction steps
Steps to reproduce the behavior:
- Download this file: data_recipe_with_rd2_mapping.yml.txt
- Run this command:
  cci task run extract_dataset -o mapping data_recipe_with_rd2_mapping.yml --org qa
Context
CumulusCI version: 3.18.0 (/Users/pprescod/.local/bin/cci)
Python version: 3.8.5 (/Users/pprescod/.local/pipx/venvs/cumulusci/bin/python)
You have the latest version of CumulusCI.
Traceback
File "/Users/pprescod/.local/pipx/venvs/cumulusci/lib/python3.8/site-packages/cumulusci/tasks/bulkdata/extract.py", line 80, in _run_task
self._init_db()
File "/Users/pprescod/.local/pipx/venvs/cumulusci/lib/python3.8/site-packages/cumulusci/tasks/bulkdata/extract.py", line 103, in _init_db
self._create_tables()
File "/Users/pprescod/.local/pipx/venvs/cumulusci/lib/python3.8/site-packages/cumulusci/tasks/bulkdata/extract.py", line 301, in _create_tables
self._create_table(mapping)
File "/Users/pprescod/.local/pipx/venvs/cumulusci/lib/python3.8/site-packages/cumulusci/tasks/bulkdata/extract.py", line 310, in _create_table
t = create_table(mapping, self.metadata)
File "/Users/pprescod/.local/pipx/venvs/cumulusci/lib/python3.8/site-packages/cumulusci/tasks/bulkdata/utils.py", line 109, in create_table
t = Table(mapping["table"], metadata, *fields)
File "
Additional information
Do we have a good reason to fix this as such, or can we consider either (a) being more aggressive in deprecating the legacy `filters` and `record_type` clauses, or (b) adding validation rules to ensure the mapping file is constructed safely?
It does work if:
- The steps have different table names (`table` is now inferred if not specified).
- The steps all specify a `record_type` (in this case CCI will over-extract for the second Account step because `filters` aren't applied on extract).
- The mapping just uses `RecordTypeId` instead of filtering with the legacy features.
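The first workaround can be sketched as follows. This is an illustrative fragment, not taken from the attached recipe; the table and record type names are assumptions:

```yaml
# Hypothetical workaround: give each step a distinct `table` so
# extraction creates two separate local tables instead of colliding.
Insert Business Accounts:
  sf_object: Account
  table: accounts_business
  record_type: Business
  fields:
    - Name
Insert Household Accounts:
  sf_object: Account
  table: accounts_household
  record_type: HH_Account
  fields:
    - Name
```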
I'm open to various solutions, but the code to do the validation may not be significantly less than the code to generate a table with all of the columns that all of the steps need. That said, I don't feel strongly about how we deal with it, as long as our messaging is better than a traceback.
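For a sense of scale, a minimal sketch of option (b) might look like the following. This is not CumulusCI's actual validation code; the function name, the shape of the parsed mapping dict, and the table-inference rule are all assumptions for illustration:

```python
# Hypothetical validation pass: reject mapping files in which two
# extraction steps would resolve to the same local table name.
# `mapping` mirrors parsed YAML: {step_name: {"sf_object": ..., "table": ...}}

def validate_unique_tables(mapping: dict) -> None:
    seen = {}  # resolved table name -> step that claimed it
    for step_name, step in mapping.items():
        # Assumed inference rule: fall back to a name derived from
        # sf_object when `table` is not specified explicitly.
        table = step.get("table") or step["sf_object"].lower()
        if table in seen:
            raise ValueError(
                f"Steps {seen[table]!r} and {step_name!r} both map to "
                f"table {table!r}; give each step a distinct 'table' "
                "before extracting."
            )
        seen[table] = step_name
```

Even this toy version shows the trade-off: the check itself is short, but a real implementation would still need to understand table-name inference, which is most of what the column-merging fix would need too.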
The whole idea of using a "mapping file" which is really an imperative set of steps for both import and export may not be the right abstraction. Exports should be easily parallelized (if we decide to do that) whereas loads need more serialization.
I'm curious whether you know of cases where the customer needs to control the export order, or whether the system is free to extract in any order as long as links are fixed up by the end.
I completely agree that extraction steps are/should be idempotent and parallelizable.
There are no possible side effects to changing the extraction order that I'm aware of, although there are multiple relevant governor limits that could bite us on parallelization.
Generally speaking, I really don't like breaking the 1:1 connection between tables and sObjects - I find it harder to reason about and think that in most cases it's being used as a patch around other weaknesses in the tooling that I'd rather we fix, like we did with record type support. But I'm open to being shown I'm wrong and there's real use cases there.