extract_dataset sql error
Describe the bug
After a while of not using CCI, I tried running the extract_dataset task (cci task run extract_dataset -o mapping datasets/mapping.yml -o sql_path datasets/test.sql --org cci_holver_prod), only for it to fail with the error:
Table 'Account' is already defined for this MetaData instance. Specify 'extend_existing=True' to redefine options and columns on an existing Table object.
I have deleted the previously generated .sql files, but it seems like they are persisted in some MetaData instance, which I have no idea how to clear.
I have seen a similar issue on the CCI GitHub, but in that case the reporter was using a SQLite DB instead of a script and was able to drop the conflicting tables.
Any idea how to clear my MetaData instance?
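For context, the message comes from SQLAlchemy, which the task uses to build its intermediate tables. A minimal sketch (illustration only, not CumulusCI's actual code) that reproduces the same error when one table name is registered twice on a single MetaData instance:

```python
# Minimal SQLAlchemy sketch (not CumulusCI code) showing how the error arises
# when two mapping steps try to register the same table name on one MetaData.
from sqlalchemy import Column, MetaData, Table, Unicode

metadata = MetaData()

# First step registers the "Account" table.
Table("Account", metadata, Column("Name", Unicode(255)))

# A second step reusing the same table name raises:
#   sqlalchemy.exc.InvalidRequestError: Table 'Account' is already defined for
#   this MetaData instance. Specify 'extend_existing=True' to redefine options
#   and columns on an existing Table object.
Table("Account", metadata, Column("LastName", Unicode(255)))
```

If the MetaData object is rebuilt in memory on every run, the collision likely comes from two steps in the same mapping resolving to the same table name, rather than from leftover state on disk.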
Reproduction steps
- Create mapping.yml file.
- Run cci task run extract_dataset -o mapping datasets/mapping.yml -o sql_path datasets/test.sql --org cci_holver_prod
Your CumulusCI and Python versions
CumulusCI version: 3.64.0
Python version: 3.9.0
Operating System
Windows 10
Windows environment
No response
CumulusCI installation method
No response
Error Gist
https://gist.github.com/ajanicko/9c2e430bb673436dd0366fb42c88c962
Additional information
No response
Well, I think the problem is with my mapping.yml file. It seems you can't have two different insert steps for the same sf_object, for example PersonAccount and BusinessAccount, but I think I was able to do that before.
Anyway, I raised a bug for a similar issue before (https://github.com/SFDO-Tooling/CumulusCI/issues/3349). Basically, if I try to load_dataset a .sql script where both the BusinessAccounts and the PersonAccounts reside in the same SQL table, which should be possible according to the docs ("CumulusCI supports extracting and loading person account data. In your dataset definition, map person account fields like LastName, PersonBirthdate, or CustomContactField__pc to Account steps (i.e. where sf_object equals Account)."), I get an error stating:
"Error: Error on record with id 1: INVALID_FIELD_FOR_INSERT_UPDATE:Account: bad field names on insert/update call: ForceAllowAML__pc, IsValid__pc, VisibleForBrokers__pc:ForceAllowAML__pc IsValid__pc
VisibleForBrokers__pc --"
I would like to end this wild ride of a post by saying this: I managed to split BusinessAccounts and PersonAccounts by skipping the automatic RecordTypeId feature and using the deprecated record_type feature, which does not create the record type mapping tables. My final mapping.yml file looks like this:
Insert InstitutionAccount:
    sf_object: Account
    table: InstitutionAccount
    fields:
        - Name
    record_type: Institution
Insert ClientFOAccount:
    sf_object: Account
    table: ClientFOAccount
    fields:
        - LastName
    record_type: PersonAccount
In theory, it should work fine to extract and load Person Accounts as part of the overall Account step (i.e., no filtering). If you're finding that's not the case, feel free to open another issue with a quick repro so we can triage it.
The solution you found does avoid the issue where two steps have the same auto-generated table name. We should handle that more gracefully. Tracking that as W-11833439.
Well, I feel like there are 3 very similar issues with Accounts/PersonAccounts.
The first and main issue is: you generate a mapping, and this mapping will have one Account table containing both non-PA fields and PA fields. After that you extract data based on this mapping, which results in an SQL script containing Account data in one table with all fields (_pc and non-_pc). The issue starts when you try to load this data, because loading fails when it tries to push _pc fields onto non-PA Accounts. If this is fixed by somehow ignoring custom _pc fields when loading data, then the subsequent issues won't matter.
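For illustration, a minimal sketch of that "ignore custom _pc fields" idea (this is not the actual load_dataset implementation; the record shape and field names are assumptions):

```python
# Hypothetical sketch: drop custom person-account fields (*__pc) from records
# that are not person accounts before handing them to the insert call.
def strip_person_account_fields(record: dict) -> dict:
    """Remove *__pc fields from a non-person-account record so the insert
    doesn't fail with INVALID_FIELD_FOR_INSERT_UPDATE."""
    if record.get("IsPersonAccount"):  # assumed flag on the extracted row
        return record
    return {name: value for name, value in record.items() if not name.endswith("__pc")}


business_account = {
    "Name": "Acme Corp",
    "IsPersonAccount": False,
    "ForceAllowAML__pc": "true",  # the kind of field that triggers the error above
}
print(strip_person_account_fields(business_account))
# -> {'Name': 'Acme Corp', 'IsPersonAccount': False}
```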
The second issue is: if you do try to split the mapping for Accounts into 2 insert steps, which lets you split the _pc and non-_pc fields as well, you get the issue I outlined in https://github.com/SFDO-Tooling/CumulusCI/issues/3349, where you end up with 2 _rt_mapping tables with the same name, which breaks the SQL script.
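For illustration, a sketch of one way that clash could be avoided if the _rt_mapping table name were derived from the step's table name instead of from sf_object (hypothetical naming; not how CumulusCI currently names these tables):

```python
# Hypothetical sketch: key the record type mapping table on the step's table
# name, so two steps that both target sf_object Account get distinct tables.
def rt_mapping_table_name(step: dict) -> str:
    base = step.get("table") or step["sf_object"]
    return f"{base}_rt_mapping"


print(rt_mapping_table_name({"sf_object": "Account", "table": "InstitutionAccount"}))
# -> InstitutionAccount_rt_mapping
print(rt_mapping_table_name({"sf_object": "Account", "table": "ClientFOAccount"}))
# -> ClientFOAccount_rt_mapping
```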
The third issue is similar to the second, and I am not entirely sure what I did to get it (it might be due to the table keyword): in some cases, during the extract_dataset task, the Python script recognizes that the _rt_mapping table for a given sf_object already exists and, instead of creating a new _rt_mapping table for that sf_object, it tries to add the same record types into the existing table, which breaks the SQL script again due to UNIQUE constraints.
Now, I am no Python developer, but I have been going through the scripts and trying to understand them, maybe even to fix the issue for myself, before I found the record_type feature, which skips the creation of _rt_mapping tables completely. I figure the easiest fix would be for the third issue: don't try to insert records into the _rt_mapping table if they are already there. But even the first (main) issue should be fixable in the load_dataset task by ignoring all _pc fields when inserting any non-PA Account record into SF.
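As a sketch of that "skip rows that already exist" idea (illustrative only; the _rt_mapping schema shown here is an assumption, not the real one):

```python
# Illustrative sqlite3 sketch: skip duplicate record type rows instead of
# failing on the UNIQUE constraint. Table and column names are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account_rt_mapping (record_type_id TEXT UNIQUE, developer_name TEXT)"
)

rows = [
    ("012000000000001AAA", "Institution"),
    ("012000000000001AAA", "Institution"),  # duplicate from a second Account step
]
# INSERT OR IGNORE silently skips rows that would violate the UNIQUE constraint.
conn.executemany("INSERT OR IGNORE INTO Account_rt_mapping VALUES (?, ?)", rows)

print(conn.execute("SELECT * FROM Account_rt_mapping").fetchall())
# -> [('012000000000001AAA', 'Institution')]
```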
If this is fixed by somehow ignoring custom _pc fields when loading data, then all the subsequent issues won't matter.
That is the way this feature is intended to work. I will add a second bug to our backlog to investigate why this is not working as designed. Thank you for your patience as we improve this tool - I really appreciate the detailed feedback.
Tracking this bug as W-11848870