CumulusCI icon indicating copy to clipboard operation
CumulusCI copied to clipboard

extract_dataset sql error

Open ajanicko opened this issue 3 years ago • 7 comments

Describe the bug

After a while of not using CCI, I have tried running task extract_dataset (cci task run extract_dataset -o mapping datasets/mapping.yml -o sql_path datasets/test.sql --org cci_holver_prod) for it fail on Error:

Table 'Account' is already defined for this MetaData instance. Specify 'extend_existing=True' to redefine options and columns on an existing Table object.

I have deleted the previously generated .sql files, but it seems like they are persisted in some MetaData instance, which I have no idea how to clear.

I have seen a similar issue on CCI github, but there the guy was using a SQLite DB instead of a script and he was able to drop the conflicting tables.

Any idea how to clear my MetaData instance?

Reproduction steps

  1. Create mapping.yml file.
  2. Run cci task run extract_dataset -o mapping datasets/mapping.yml -o sql_path datasets/test.sql --org cci_holver_prod

Your CumulusCI and Python versions

CumulusCI version: 3.64.0 Python version: 3.9.0

Operating System

W10

Windows environment

No response

CumulusCI installation method

No response

Error Gist

https://gist.github.com/ajanicko/9c2e430bb673436dd0366fb42c88c962

Additional information

No response

ajanicko avatar Sep 29 '22 11:09 ajanicko

Well, I think the problem is with my mapping.yml file. Seems like you can't have different inserts for the same sf_object, like for example PersonAccount and BusinessAccount, but I think I was able to do that before.

ajanicko avatar Sep 29 '22 11:09 ajanicko

Anyway, I raised a bug for similar issue before https://github.com/SFDO-Tooling/CumulusCI/issues/3349 ,but basically If I try to load_dataset a .sql script where both the BusinessAccounts and PersonAccounts reside in the same SQL table, which should be possible according to doc: "CumulusCI supports extracting and loading person account data. In your dataset definition, map person account fields like LastName, PersonBirthdate, or CustomContactField__pc to Account steps (i.e. where sf_object equals Account)."

I will get an error stating: "Error: Error on record with id 1: INVALID_FIELD_FOR_INSERT_UPDATE:Account: bad field names on insert/update call: ForceAllowAML__pc, IsValid__pc, VisibleForBrokers__pc:ForceAllowAML__pc IsValid__pc
VisibleForBrokers__pc --"

ajanicko avatar Sep 29 '22 13:09 ajanicko

I would like to end this wild ride of a post by saying this. I managed to split BusinessAccounts and PersonAccounts by skipping the automatic RecordTypeId feature and using the deprecated record_type feature, which does not create the mapping tables. My final mapping.yml file looks like this:

Insert InstitutionAccount:
  sf_object: Account
  table: InstitutionAccount
  fields:
  - Name
  record_type: Institution
Insert ClientFOAccount:
  sf_object: Account
  table: ClientFOAccount
  fields:
  - LastName
  record_type: PersonAccount

ajanicko avatar Sep 29 '22 14:09 ajanicko

In theory, it should work fine to extract and load Person Accounts as part of the overall Account step (i.e., no filtering). If you're finding that's not the case, feel free to open another issue with a quick repro so we can triage it.

The solution you found does avoid the issue where two steps have the same auto-generated table name. We should handle that more gracefully. Tracking that as W-11833439.

davidmreed avatar Sep 30 '22 04:09 davidmreed

Well I feel like there are 3 very similar issues with Accounts/PersonAccounts.

The first and the main issue is: You generate a mapping, this mapping will have one Account table containing both non PA fields and PA fields. After that you extract data based on this mapping, which will result in an SQL script containing Account data in one table with all fields (_pc and non _pc). Now the issue starts when you try to load this data, as it will fail when it tries to load _pc fields into non PA Accounts. If this is fixed by somehow ignoring custom _pc fields when loading data, then all the subsequent issues won't matter.

The second issue is: If you do try to split the mapping for Accounts into 2 Inserts, which will allow you to split _pc and non _pc fields as well, you will get the issue I outlined in https://github.com/SFDO-Tooling/CumulusCI/issues/3349 in which you will get 2 _rt_mapping tables with the same name, that breaks the SQL script.

The third issue is similar to second and I am not entirely sure what I did to get it (it might be due to table keyword): In some cases, during the extract_dataset task, the python script will recognize, that the _rt_mapping table for given sf_object already exists and instead of creating a new _rt_mapping table for the same sf_object, it will try to add the same RTs into the same table, which borks the SQL script again due to UNIQUE constraints.

Now I am no python developer, but I have been going through the scripts and trying to understand them, maybe even fixing the issue for myself, before I found the record_type feature, which skips the creation of _rt_mapping tables completely and I figure the easiest fix would be to fix the third issue, by not trying to insert records into the _rt_mapping table if they are already there. But inherently even the first (main) issue should be fixable in the load_dataset task by ignoring all _pc fields when inserting any non PA Account record into SF.

ajanicko avatar Sep 30 '22 12:09 ajanicko

If this is fixed by somehow ignoring custom _pc fields when loading data, then all the subsequent issues won't matter.

That is the way this feature is intended to work. I will add a second bug to our backlog to investigate why this is not working as designed. Thank you for your patience as we improve this tool - I really appreciate the detailed feedback.

davidmreed avatar Oct 03 '22 23:10 davidmreed

Tracking this bug as W-11848870

davidmreed avatar Oct 03 '22 23:10 davidmreed