Error when trying to enable PKChunking
Issue description
Hi, I'm running into an issue trying to enable PKChunking on a query I'm making to Salesforce. I'm following your answer on SO here, but I keep getting the error message Error in if (grepl("/services/data/v[0-9]{2}.[0-9]{1}/jobs/ingest", url)) { : the condition has length > 1.
Do you have any idea what might be going wrong?
I've included the query, verbose output and session info below. I'm new to Salesforce and your package, so please let me know what other information would help you troubleshoot this issue.
Query
campaign_member_query_test <- glue(
"
SELECT Campaign.Business_Unit_Affiliation__c,
Campaign.Product__c,
CM_Contact_CSID__c,
Contact_Has_Test_Flag__c,
Event_Registration__c,
Event_Registration_Date__c,
Event_Amount_Raised__c,
RC_Registrant_Status__c,
RC_RegistrationDate__c
FROM CampaignMember
"
)
campaign_members <- sf_query(soql = campaign_member_query_test,
object_name = "CampaignMember",
api_type = "Bulk 1.0",
PKChunkingHeader = list(`Sforce-Enable-PKChunking` = TRUE),
interval_seconds = 10,
max_attempts = 400, verbose = TRUE)
Verbose output
--HTTP Request----------------
POST https://ncca-ltd.my.salesforce.com/services/async/54.0/job
--Headers---------------------
Accept: application/xml; Content-Type: application/xml; Sforce-Enable-PKChunking: TRUE; X-SFDC-Session: 00D28000001KHkz!AQsAQJy6fp88C4qo6pARBwqxJTBXESEt1fI4TX7p.lWSM2kB_QjOMb_TsK.sOVrAFy1byzSsMbJXk3A7qHCsOeoopqORRwca
--Body------------------------
<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
<operation>query</operation>
<object>CampaignMember</object>
<concurrencyMode>Parallel</concurrencyMode>
<contentType>CSV</contentType>
</jobInfo>
--HTTP Request----------------
POST https://ncca-ltd.my.salesforce.com/services/async/54.0/job/7502x00000EnbMjAAJ/batch
--Headers---------------------
Accept: application/json, text/xml, application/xml, */*; Content-Type: text/csv; charset=UTF-8; X-SFDC-Session: 00D28000001KHkz!AQsAQJy6fp88C4qo6pARBwqxJTBXESEt1fI4TX7p.lWSM2kB_QjOMb_TsK.sOVrAFy1byzSsMbJXk3A7qHCsOeoopqORRwca
--Body------------------------
Uploaded TXT file: C:\Users\Charith\AppData\Local\Temp\RtmpuGaHBg\file5e7c3a9e62bd
Attempt #1
--HTTP Request----------------
GET https://ncca-ltd.my.salesforce.com/services/async/54.0/job/7502x00000EnbMjAAJ/batch/7512x00000JkKEaAAN
--Headers---------------------
Accept: application/json, text/xml, application/xml, */*; Content-Type: ; X-SFDC-Session: 00D28000001KHkz!AQsAQJy6fp88C4qo6pARBwqxJTBXESEt1fI4TX7p.lWSM2kB_QjOMb_TsK.sOVrAFy1byzSsMbJXk3A7qHCsOeoopqORRwca
--HTTP Request----------------
GET https://ncca-ltd.my.salesforce.com/services/async/54.0/job/7502x00000EnbMjAAJ/batch
--Headers---------------------
Accept: application/json, text/xml, application/xml, */*; Content-Type: ; X-SFDC-Session: 00D28000001KHkz!AQsAQJy6fp88C4qo6pARBwqxJTBXESEt1fI4TX7p.lWSM2kB_QjOMb_TsK.sOVrAFy1byzSsMbJXk3A7qHCsOeoopqORRwca
Attempt #1
Error in if (grepl("/services/data/v[0-9]{2}.[0-9]{1}/jobs/ingest", url)) { :
the condition has length > 1
Session Info
─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.2.1 (2022-06-23 ucrt)
os Windows 10 x64 (build 22000)
system x86_64, mingw32
ui RStudio
language (EN)
collate English_Australia.utf8
ctype English_Australia.utf8
tz Australia/Sydney
date 2022-09-13
rstudio 2022.07.1+554 Spotted Wakerobin (desktop)
pandoc 2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
afp.data * 0.1.0 2022-09-07 [1] local
anytime 0.3.9 2020-08-27 [1] CRAN (R 4.2.1)
askpass 1.1 2019-01-13 [1] CRAN (R 4.2.1)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.1)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.2.0)
bit 4.0.4 2020-08-04 [1] CRAN (R 4.2.1)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.2.1)
cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.1)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.2.1)
cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.1)
crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.1)
curl 4.3.2 2021-06-23 [1] CRAN (R 4.2.1)
data.table 1.14.2 2021-09-27 [1] CRAN (R 4.2.1)
DBI 1.1.3 2022-06-18 [1] CRAN (R 4.2.1)
devtools 2.4.3 2021-11-30 [1] CRAN (R 4.2.1)
digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.1)
dplyr * 1.0.9 2022-04-28 [1] CRAN (R 4.2.1)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.1)
evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.1)
fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.1)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.1)
fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.1)
generics 0.1.2 2022-01-31 [1] CRAN (R 4.2.1)
glue * 1.6.2 2022-02-24 [1] CRAN (R 4.2.1)
hms 1.1.1 2021-09-26 [1] CRAN (R 4.2.1)
htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.2.1)
httpuv 1.6.5 2022-01-05 [1] CRAN (R 4.2.1)
httr 1.4.3 2022-05-04 [1] CRAN (R 4.2.1)
janitor 2.1.0 2021-01-05 [1] CRAN (R 4.2.1)
jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.2.1)
knitr 1.39 2022-04-26 [1] CRAN (R 4.2.1)
later 1.3.0 2021-08-18 [1] CRAN (R 4.2.1)
lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.1)
lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.2.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.1)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.1)
mime 0.12 2021-09-28 [1] CRAN (R 4.2.0)
openssl 2.0.2 2022-05-24 [1] CRAN (R 4.2.1)
pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.1)
pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.2.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.1)
pkgload 1.3.0 2022-06-27 [1] CRAN (R 4.2.1)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.2.1)
processx 3.6.1 2022-06-17 [1] CRAN (R 4.2.1)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.2.1)
ps 1.7.1 2022-06-18 [1] CRAN (R 4.2.1)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.2.1)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.1)
Rcpp 1.0.8.3 2022-03-17 [1] CRAN (R 4.2.1)
readr 2.1.2 2022-01-30 [1] CRAN (R 4.2.1)
remotes 2.4.2 2021-11-30 [1] CRAN (R 4.2.1)
rlang 1.0.3 2022-06-27 [1] CRAN (R 4.2.1)
rlist 0.4.6.2 2021-09-03 [1] CRAN (R 4.2.1)
rmarkdown 2.14 2022-04-25 [1] CRAN (R 4.2.1)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.1)
salesforcer * 1.0.1 2022-03-01 [1] CRAN (R 4.2.1)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
shiny 1.7.1 2021-10-02 [1] CRAN (R 4.2.1)
snakecase 0.11.0 2019-05-25 [1] CRAN (R 4.2.1)
stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.1)
tibble 3.1.7 2022-05-03 [1] CRAN (R 4.2.1)
tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.1)
tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.1)
usethis 2.1.6 2022-05-25 [1] CRAN (R 4.2.1)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.1)
vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.1)
vroom 1.5.7 2021-11-30 [1] CRAN (R 4.2.1)
withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.1)
xfun 0.31 2022-05-10 [1] CRAN (R 4.2.1)
XML 3.99-0.10 2022-06-09 [1] CRAN (R 4.2.0)
xml2 1.3.3 2021-11-30 [1] CRAN (R 4.2.1)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.2.1)
yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0)
zip 2.2.0 2021-05-31 [1] CRAN (R 4.2.1)
As an aside, when I run that query without the line PKChunkingHeader = list(`Sforce-Enable-PKChunking` = TRUE) in the call to sf_query, I get the following error message:
Error:
! Column name `result` must not be duplicated.
Use .name_repair to specify repair.
Caused by error in `repaired_names()`:
! Names must be unique.
✖ These names are duplicated:
* "result" at locations 1 and 2.
Not sure if that helps or confuses... 😆
Perhaps I should also mention (although it may be more appropriate to submit as a separate issue?) that when I try using the Bulk 2.0 API I get the following error message:
Error in rbindlist(l = l, fill = fill, idcol = idcol, ...) :
Class attribute on column 7 of item 2 does not match with column 9 of item 1.
I actually get this error fairly often when using the Bulk 2.0 option with sf_query, which is why I was using Bulk 1.0 in the first place.
Not sure how to debug this...
Thanks @charithlw for flagging these issues! They seem to be three distinct but slightly related issues, so I'll address each in the order you mentioned them.
Issue 1: Error in if (grepl("/services/data/...
PKChunked queries are a little unique in that they create one bulk batch with the state "NotProcessed", which carries the chunking instructions and needs to be ignored, and then each chunk is created as a separate batch. The existing code wasn't handling those batches properly and was sending more than one at a time when polling for results; hence the error about the condition (a vector of more than one URL) having length > 1. I've updated the underlying functions behind sf_query() to fix the issue in this commit. It would be great if you could install the package from the GitHub dev branch and let me know whether it fixes the issue you were having:
# install.packages("remotes")
remotes::install_github("StevenMMortimer/salesforcer", ref = "dev")
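To illustrate the shape of the fix, here's a minimal, self-contained sketch (not the actual package internals; the batch IDs and URL are made up) of filtering out the "NotProcessed" instruction batch and then polling each remaining batch one at a time, so the condition passed to if() is always length 1:

```r
# Hypothetical batch-info table, shaped like what the Bulk 1.0 batch endpoint
# returns. The first batch only carries the chunking instructions.
batch_info <- data.frame(
  id    = c("7512x0000000001", "7512x0000000002", "7512x0000000003"),
  state = c("NotProcessed", "Completed", "Completed"),
  stringsAsFactors = FALSE
)

# Drop the instruction batch; only real chunks have results to fetch
real_batches <- batch_info[batch_info$state != "NotProcessed", ]

# Poll each batch individually: grepl() on a single URL returns a
# length-1 logical, so if() no longer sees a condition of length > 1
for (batch_id in real_batches$id) {
  url <- paste0("https://example.my.salesforce.com/services/async/54.0/job/",
                "7502x0000000000/batch/", batch_id, "/result")
  if (grepl("/services/async/", url)) {
    message("Would poll results for batch ", batch_id)
  }
}
```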
Issue 2: Error: ! Column name `result` must not be duplicated.
It's not obvious which function is generating the error. My guess is one of the as_tibble() calls, because those trigger a check for column name issues. I've updated all references to as_tibble() in the package code to pass the argument .name_repair = "unique", so every instance now specifies the same name-repair method. I'm not sure it will fix your issue, but it's probably a good fix to make regardless.
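As a minimal illustration (a standalone sketch, not the package's actual code path), here's how as_tibble() behaves when two columns are both named result: the default name check errors with the message you saw, while .name_repair = "unique" deduplicates the names instead:

```r
library(tibble)

# A list whose two elements would both become columns named "result"
x <- list(1:3, 4:6)
names(x) <- c("result", "result")

# as_tibble(x) alone would error:
#   "Column name `result` must not be duplicated."
# With a repair method specified, the names are made unique instead:
fixed <- as_tibble(x, .name_repair = "unique")
names(fixed)
#> [1] "result...1" "result...2"
```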
Issue 3: Error in rbindlist...
It looks like the {data.table} package is working on ways to silence/ignore errors when binding lists with mismatched column attributes into a single data.frame (Rdatatable/data.table#5446). My guess is that one batch of the query returns values that get coalesced into a date column, while another batch has all NULLs for that column and it comes back as a boolean (which can happen). That said, I can't really dig into your specific case, and we have to wait until {data.table} sorts things out. In the meantime, have you tried running the query with Bulk 2.0 and specifying the argument guess_types = FALSE? That casts every column as character, and you can convert the types after everything has been put together. It could be one workaround, but you'd have to try it and see.
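The workaround can be sketched like this. The sf_query() call is just your original one with guess_types = FALSE added (shown commented out since it needs a live Salesforce session); the data.frame below is a stand-in for the all-character result, and readr::type_convert() restores the natural column types afterwards:

```r
library(readr)

# With guess_types = FALSE every column comes back as character, e.g.:
# campaign_members <- sf_query(soql = campaign_member_query_test,
#                              object_name = "CampaignMember",
#                              api_type = "Bulk 2.0",
#                              guess_types = FALSE)

# Simulated all-character result standing in for the query output
campaign_members <- data.frame(
  Event_Registration_Date__c = c("2022-01-15", NA),
  Event_Amount_Raised__c     = c("150.25", "75.00"),
  stringsAsFactors = FALSE
)

# Re-guess the column types once all batches are bound together
campaign_members <- type_convert(campaign_members)
str(campaign_members)
```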
Hi @StevenMMortimer, I can confirm that Issue 3 was solved with the workaround you suggested, i.e. using guess_types = FALSE. So happy about that! I'll test out the other two and get back to you shortly. :)