Error when trying to enable PKChunking
Issue description
Hi, I'm running into an issue trying to enable PKChunking on a query I'm making to Salesforce. I'm following your answer on SO here, but I keep getting the error message Error in if (grepl("/services/data/v[0-9]{2}.[0-9]{1}/jobs/ingest", url)) { : the condition has length > 1.
Do you have any idea what might be going wrong?
I've included the query, verbose output and session info below. I'm new to Salesforce and your package, so please let me know what other information would help you troubleshoot this issue.
Query
campaign_member_query_test <- glue(
"
SELECT Campaign.Business_Unit_Affiliation__c,
Campaign.Product__c,
CM_Contact_CSID__c,
Contact_Has_Test_Flag__c,
Event_Registration__c,
Event_Registration_Date__c,
Event_Amount_Raised__c,
RC_Registrant_Status__c,
RC_RegistrationDate__c
FROM CampaignMember
"
)
campaign_members <- sf_query(soql = campaign_member_query_test,
object_name = "CampaignMember",
api_type = "Bulk 1.0",
PKChunkingHeader = list(`Sforce-Enable-PKChunking` = TRUE),
interval_seconds = 10,
max_attempts = 400, verbose = TRUE)
Verbose output
--HTTP Request----------------
POST https://ncca-ltd.my.salesforce.com/services/async/54.0/job
--Headers---------------------
Accept: application/xml; Content-Type: application/xml; Sforce-Enable-PKChunking: TRUE; X-SFDC-Session: 00D28000001KHkz!AQsAQJy6fp88C4qo6pARBwqxJTBXESEt1fI4TX7p.lWSM2kB_QjOMb_TsK.sOVrAFy1byzSsMbJXk3A7qHCsOeoopqORRwca
--Body------------------------
<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
<operation>query</operation>
<object>CampaignMember</object>
<concurrencyMode>Parallel</concurrencyMode>
<contentType>CSV</contentType>
</jobInfo>
--HTTP Request----------------
POST https://ncca-ltd.my.salesforce.com/services/async/54.0/job/7502x00000EnbMjAAJ/batch
--Headers---------------------
Accept: application/json, text/xml, application/xml, */*; Content-Type: text/csv; charset=UTF-8; X-SFDC-Session: 00D28000001KHkz!AQsAQJy6fp88C4qo6pARBwqxJTBXESEt1fI4TX7p.lWSM2kB_QjOMb_TsK.sOVrAFy1byzSsMbJXk3A7qHCsOeoopqORRwca
--Body------------------------
Uploaded TXT file: C:\Users\Charith\AppData\Local\Temp\RtmpuGaHBg\file5e7c3a9e62bd
Attempt #1
--HTTP Request----------------
GET https://ncca-ltd.my.salesforce.com/services/async/54.0/job/7502x00000EnbMjAAJ/batch/7512x00000JkKEaAAN
--Headers---------------------
Accept: application/json, text/xml, application/xml, */*; Content-Type: ; X-SFDC-Session: 00D28000001KHkz!AQsAQJy6fp88C4qo6pARBwqxJTBXESEt1fI4TX7p.lWSM2kB_QjOMb_TsK.sOVrAFy1byzSsMbJXk3A7qHCsOeoopqORRwca
--HTTP Request----------------
GET https://ncca-ltd.my.salesforce.com/services/async/54.0/job/7502x00000EnbMjAAJ/batch
--Headers---------------------
Accept: application/json, text/xml, application/xml, */*; Content-Type: ; X-SFDC-Session: 00D28000001KHkz!AQsAQJy6fp88C4qo6pARBwqxJTBXESEt1fI4TX7p.lWSM2kB_QjOMb_TsK.sOVrAFy1byzSsMbJXk3A7qHCsOeoopqORRwca
Attempt #1
Error in if (grepl("/services/data/v[0-9]{2}.[0-9]{1}/jobs/ingest", url)) { :
the condition has length > 1
Session Info
─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.2.1 (2022-06-23 ucrt)
os Windows 10 x64 (build 22000)
system x86_64, mingw32
ui RStudio
language (EN)
collate English_Australia.utf8
ctype English_Australia.utf8
tz Australia/Sydney
date 2022-09-13
rstudio 2022.07.1+554 Spotted Wakerobin (desktop)
pandoc 2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
afp.data * 0.1.0 2022-09-07 [1] local
anytime 0.3.9 2020-08-27 [1] CRAN (R 4.2.1)
askpass 1.1 2019-01-13 [1] CRAN (R 4.2.1)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.1)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.2.0)
bit 4.0.4 2020-08-04 [1] CRAN (R 4.2.1)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.2.1)
cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.1)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.2.1)
cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.1)
crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.1)
curl 4.3.2 2021-06-23 [1] CRAN (R 4.2.1)
data.table 1.14.2 2021-09-27 [1] CRAN (R 4.2.1)
DBI 1.1.3 2022-06-18 [1] CRAN (R 4.2.1)
devtools 2.4.3 2021-11-30 [1] CRAN (R 4.2.1)
digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.1)
dplyr * 1.0.9 2022-04-28 [1] CRAN (R 4.2.1)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.1)
evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.1)
fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.1)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.1)
fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.1)
generics 0.1.2 2022-01-31 [1] CRAN (R 4.2.1)
glue * 1.6.2 2022-02-24 [1] CRAN (R 4.2.1)
hms 1.1.1 2021-09-26 [1] CRAN (R 4.2.1)
htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.2.1)
httpuv 1.6.5 2022-01-05 [1] CRAN (R 4.2.1)
httr 1.4.3 2022-05-04 [1] CRAN (R 4.2.1)
janitor 2.1.0 2021-01-05 [1] CRAN (R 4.2.1)
jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.2.1)
knitr 1.39 2022-04-26 [1] CRAN (R 4.2.1)
later 1.3.0 2021-08-18 [1] CRAN (R 4.2.1)
lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.1)
lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.2.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.1)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.1)
mime 0.12 2021-09-28 [1] CRAN (R 4.2.0)
openssl 2.0.2 2022-05-24 [1] CRAN (R 4.2.1)
pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.1)
pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.2.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.1)
pkgload 1.3.0 2022-06-27 [1] CRAN (R 4.2.1)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.2.1)
processx 3.6.1 2022-06-17 [1] CRAN (R 4.2.1)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.2.1)
ps 1.7.1 2022-06-18 [1] CRAN (R 4.2.1)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.2.1)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.1)
Rcpp 1.0.8.3 2022-03-17 [1] CRAN (R 4.2.1)
readr 2.1.2 2022-01-30 [1] CRAN (R 4.2.1)
remotes 2.4.2 2021-11-30 [1] CRAN (R 4.2.1)
rlang 1.0.3 2022-06-27 [1] CRAN (R 4.2.1)
rlist 0.4.6.2 2021-09-03 [1] CRAN (R 4.2.1)
rmarkdown 2.14 2022-04-25 [1] CRAN (R 4.2.1)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.1)
salesforcer * 1.0.1 2022-03-01 [1] CRAN (R 4.2.1)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
shiny 1.7.1 2021-10-02 [1] CRAN (R 4.2.1)
snakecase 0.11.0 2019-05-25 [1] CRAN (R 4.2.1)
stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.1)
tibble 3.1.7 2022-05-03 [1] CRAN (R 4.2.1)
tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.1)
tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.1)
usethis 2.1.6 2022-05-25 [1] CRAN (R 4.2.1)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.1)
vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.1)
vroom 1.5.7 2021-11-30 [1] CRAN (R 4.2.1)
withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.1)
xfun 0.31 2022-05-10 [1] CRAN (R 4.2.1)
XML 3.99-0.10 2022-06-09 [1] CRAN (R 4.2.0)
xml2 1.3.3 2021-11-30 [1] CRAN (R 4.2.1)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.2.1)
yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0)
zip 2.2.0 2021-05-31 [1] CRAN (R 4.2.1)
As an aside, when I run that query without the line PKChunkingHeader = list(`Sforce-Enable-PKChunking` = TRUE) in the call to sf_query, I get the following error message:
Error:
! Column name `result` must not be duplicated.
Use .name_repair to specify repair.
Caused by error in `repaired_names()`:
! Names must be unique.
✖ These names are duplicated:
* "result" at locations 1 and 2.
Not sure if that helps or confuses... 😆
Perhaps I should also mention (although it may be more appropriate to submit as a separate issue?) that when I try using the Bulk 2.0 API I get the following error message:
Error in rbindlist(l = l, fill = fill, idcol = idcol, ...) :
Class attribute on column 7 of item 2 does not match with column 9 of item 1.
I actually get this error fairly often when using the Bulk 2.0 option with sf_query, which is why I was using Bulk 1.0 in the first place.
Not sure how to debug this...
Thanks @charithlw for flagging these issues! They seem to be three distinct but slightly related issues, so I'll address each in the order you mentioned them.
Issue 1: Error in if (grepl("/services/data/...
PKChunked queries are a little unique in that they create one bulk batch with the state "NotProcessed", which carries the chunking instructions and needs to be ignored, and then each chunk is created as a separate batch. The existing code wasn't handling those batches properly and was sending more than one at a time when polling for results; hence the error about the condition (a vector of more than one URL) having length > 1. I've updated the underlying functions behind sf_query() to fix the issue in this commit. It would be great if you could install the package from the GitHub dev branch and let me know whether it fixes the issue you were having:
# install.packages("remotes")
remotes::install_github("StevenMMortimer/salesforcer", ref = "dev")
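To illustrate the shape of the fix, here's a minimal, self-contained sketch (not the actual package internals; the batch IDs and URL are made up) of filtering out the "NotProcessed" instruction batch and then polling each remaining batch one at a time, so the condition passed to if() is always length 1:

```r
# Hypothetical batch-info table, shaped like what the Bulk 1.0 batch endpoint
# returns. The first batch only carries the chunking instructions.
batch_info <- data.frame(
  id    = c("7512x0000000001", "7512x0000000002", "7512x0000000003"),
  state = c("NotProcessed", "Completed", "Completed"),
  stringsAsFactors = FALSE
)

# Drop the instruction batch; only real chunks have results to fetch
real_batches <- batch_info[batch_info$state != "NotProcessed", ]

# Poll each batch individually: grepl() on a single URL returns a
# length-1 logical, so if() no longer sees a condition of length > 1
for (batch_id in real_batches$id) {
  url <- paste0("https://example.my.salesforce.com/services/async/54.0/job/",
                "7502x0000000000/batch/", batch_id, "/result")
  if (grepl("/services/async/", url)) {
    message("Would poll results for batch ", batch_id)
  }
}
```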
Issue 2: Error: ! Column name `result` must not be duplicated.
It's not obvious which function is generating the error. My guess is one of the as_tibble() calls, because those trigger a check for column name issues. I've updated all references to as_tibble() in the package code to pass the argument .name_repair = "unique", so every instance now specifies the same name-repair method. I'm not sure it will fix your issue, but it's probably a good fix to make regardless.
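As a minimal illustration (a standalone sketch, not the package's actual code path), here's how as_tibble() behaves when two columns are both named result: the default name check errors with the message you saw, while .name_repair = "unique" deduplicates the names instead:

```r
library(tibble)

# A list whose two elements would both become columns named "result"
x <- list(1:3, 4:6)
names(x) <- c("result", "result")

# as_tibble(x) alone would error:
#   "Column name `result` must not be duplicated."
# With a repair method specified, the names are made unique instead:
fixed <- as_tibble(x, .name_repair = "unique")
names(fixed)
#> [1] "result...1" "result...2"
```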
Issue 3: Error in rbindlist...
It looks like the {data.table} package is working on ways to silence/ignore errors when binding lists with mismatched column attributes into a single data.frame (Rdatatable/data.table#5446). My guess is that one batch of the query returns values that get coalesced into a date column, while another batch has all NULLs for that column and it comes back as a boolean (which can happen). That said, I can't really dig into your specific case, and we have to wait until {data.table} sorts things out. In the meantime, have you tried running the query with Bulk 2.0 and specifying the argument guess_types = FALSE? That casts every column as character, and you can convert the types after everything has been put together. It could be one workaround, but you'd have to try it and see.
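The workaround can be sketched like this. The sf_query() call is just your original one with guess_types = FALSE added (shown commented out since it needs a live Salesforce session); the data.frame below is a stand-in for the all-character result, and readr::type_convert() restores the natural column types afterwards:

```r
library(readr)

# With guess_types = FALSE every column comes back as character, e.g.:
# campaign_members <- sf_query(soql = campaign_member_query_test,
#                              object_name = "CampaignMember",
#                              api_type = "Bulk 2.0",
#                              guess_types = FALSE)

# Simulated all-character result standing in for the query output
campaign_members <- data.frame(
  Event_Registration_Date__c = c("2022-01-15", NA),
  Event_Amount_Raised__c     = c("150.25", "75.00"),
  stringsAsFactors = FALSE
)

# Re-guess the column types once all batches are bound together
campaign_members <- type_convert(campaign_members)
str(campaign_members)
```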
Hi @StevenMMortimer, I can confirm that Issue 3 was solved with the workaround you suggested, i.e. using guess_types = FALSE. So happy about that! I'll test out the other two and get back to you shortly. :)