RForcecom icon indicating copy to clipboard operation
RForcecom copied to clipboard

Feature Request: Bulk insert / update of records

Open 3vivekb opened this issue 9 years ago • 15 comments

So right now within the R Code we can update one record at a time. But if we run a for loop and attempt to update many records, we can run into the 1000-15000 request limit. Also it's kinda slow.

Can somebody implement the bulk insert / update?

3vivekb avatar Feb 02 '16 00:02 3vivekb

The Bulk API should have been included in the package starting at v0.8.0. If you type ?RForcecom and review the package index samples, you should see the following examples of how to use the Bulk API:

## BULK INSERT
 # create a sample data.frame of 1000 records
 n <- 1000
 data <- data.frame(Name=paste('New Record:', 1:n),
                    stringsAsFactors=FALSE)

 # run an insert job into the Account object
 job_info <- rforcecom.createBulkJob(session,
                                     operation='insert',
                                     object='Account')

 # split into batch sizes of 500 (2 batches for our 1000 row sample dataset)
 batches_info <- rforcecom.createBulkBatch(session,
                                           jobId=job_info$id,
                                           data,
                                           multiBatch = TRUE,
                                           batchSize=500)

 # check on status of each batch
 batches_status <- lapply(batches_info,
                          FUN=function(x){
                           rforcecom.checkBatchStatus(session,
                                                      jobId=x$jobId,
                                                      batchId=x$id)
                                                      })
 # get details on each batch
 batches_detail <- lapply(batches_info,
                          FUN=function(x){
                           rforcecom.getBatchDetails(session,
                                                     jobId=x$jobId,
                                                     batchId=x$id)
                                                     })
 # close the job
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)


 ## BULK DELETE THE PRIOR INSERT

 # format the data
 batch_details_together <- plyr::ldply(batches_detail)
 delete_ids <- data.frame(id=batch_details_together[,"Id"],
                          stringsAsFactors=FALSE)

 job_info <- rforcecom.createBulkJob(session, operation='delete', object='Account')
 batches_info <- rforcecom.createBulkBatch(session,
                                           jobId=job_info$id,
                                           data=delete_ids)
 # check on status of each batch
 batches_status <- lapply(batches_info,
                          FUN=function(x){
                           rforcecom.checkBatchStatus(session,
                                                      jobId=x$jobId,
                                                      batchId=x$id)
                                                      })
 # get details on each batch
 batches_detail <- lapply(batches_info,
                          FUN=function(x){
                           rforcecom.getBatchDetails(session,
                                                     jobId=x$jobId,
                                                     batchId=x$id)
                                                     })
 # close the job
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)


 ## BULK QUERY

 query <- "SELECT Id, Name FROM Account LIMIT 10"
 job_info <- rforcecom.createBulkJob(session, operation='query', object='Account')
 batch_query_info <- rforcecom.submitBulkQuery(session,
                                               jobId=job_info$id,
                                               query=query)

 batch_query_status <- rforcecom.checkBatchStatus(session,
                                                  jobId=batch_query_info$jobId,
                                                  batchId=batch_query_info$id)

 batch_query_details <- rforcecom.getBatchDetails(session,
                                                  jobId=batch_query_info$jobId,
                                                  batchId=batch_query_info$id)

 batch_query_recordset <- rforcecom.getBulkQueryResult(session,
                                                       jobId=batch_query_info$jobId,
                                                       batchId=batch_query_info$id,
                                                       resultId=batch_query_details$result)
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)


 ## BULK INSERT ATTACHMENTS

 # prepare your .zip file and request.txt manifest before calling these functions
 file <- 'request.zip'
 job_info <- rforcecom.createBulkJob(session, operation='insert', object='Attachment')
 batch_attachment_info <- rforcecom.insertBulkAttachments(session,
                                                          jobId=job_info$id,
                                                          file=file)
 batch_attachment_status <- rforcecom.checkBatchStatus(session,
                                                       jobId=batch_attachment_info$jobId,
                                                       batchId=batch_attachment_info$id)
 batch_attachment_details <- rforcecom.getBatchDetails(session,
                                                       jobId=batch_attachment_info$jobId,
                                                       batchId=batch_attachment_info$id)
 # close the job
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)


## End(Not run)

StevenMMortimer avatar Feb 02 '16 04:02 StevenMMortimer

How external ID should be set when creating bulk update/upsert? I.e. upset requires externalID to be set, but doesn't allow to set it:

job_info <- rforcecom.createBulkJob(session, operation="upsert", object="Contact") Error in rforcecom.createBulkJob(session, operation = "upsert", object = "Contact") : InvalidJob: External ID was blank for Contact. An External ID must be specified for upsert. job_info <- rforcecom.createBulkJob(session, operation="upsert", externalIdFieldName = "Member_ID__c", object="Contact") Error in rforcecom.createBulkJob(session, operation = "upsert", externalIdFieldName = "Member_ID__c", : unused argument (externalIdFieldName = "Member_ID__c")

ilcca avatar Feb 11 '16 12:02 ilcca

@ilcca Thanks for bringing up the issue. I've fixed and submitted a PR here: https://github.com/hiratake55/RForcecom/pull/35

The feature hasn't been merged yet, so if you really want to use it then you'll need to re-install RForcecom with the following command in R: devtools::install_github('ReportMort/RForcecom', ref='metadata-api')

Here is an example of how to use:

job_info <- rforcecom.createBulkJob(session,  operation='upsert', 
                                    externalIdFieldName='My_External_Id__c', 
                                    object='Account')

my_data <- data.frame(My_External_Id__c=c('11111','22222', '99999'),  
                      Name=c('Updated_Name1', 'Updated_Name2', 'Upserted_Record'), 
                      stringsAsFactors=FALSE)

batches_info <- rforcecom.createBulkBatch(session, 
                                          jobId=job_info$id, 
                                          data=my_data)

# gather the batches result as you would with other operations (insert/update/delete)

StevenMMortimer avatar Feb 11 '16 16:02 StevenMMortimer

Steve, thanks for fixing this so quick! I got your package and the upsert works well now. However, to use update should there be external id applied as well? And what is the key id now when updating?

ilcca avatar Feb 16 '16 17:02 ilcca

Updating does not require an external id. Simply include a column entitled "Id" in your data, set the bulk job to "update" and Salesforce will update the corresponding records based on that Id field

StevenMMortimer avatar Feb 16 '16 17:02 StevenMMortimer

So if I want to update rows at Contact, is the "Id" here the same one retrieved with SQL: "Select Id from Contact"? Hence to update rows I must first get the list of Ids to update. Am I right?

My idea was to use an External id to update rows at Contact.

ilcca avatar Feb 23 '16 17:02 ilcca

Yes. "Id" here means the unique ID generated by Salesforce that corresponds to the record.

If you want to update based on External Id, then this might be a good strategy:

  1. Run SOQL: "Select Id, My_External_Id__c from Contact"
  2. In R, join your update data with this SOQL recordset on "My_External_Id__c", so that the "Id" column gets transferred into your update data.
  3. Ensure no duplicates got created during your join.
  4. Select the columns you want to send over to Salesforce (making sure you include the "Id" column)
  5. Create update bulk job and send over the data.frame.

StevenMMortimer avatar Feb 23 '16 18:02 StevenMMortimer

Thanks for the strategy, it makes sense.

However, when operating with bigger data then fetching all Ids from a Salesforce table can require lots of memory within R and traffic between Salesforce and R as well. That's why I'm wondering if there is a chance to make the sql query to include only needed Ids i.e. "Select Id, My_External_Id__c from Contact WHERE My_External_Id__c in LIST" Here LIST means a list of External Id to be updated.

ilcca avatar Feb 24 '16 09:02 ilcca

@ReportMort Looking over this thread, is it possible to do two things: A) update fields of information on a record and simultaneously B)attach a text file to that record?

Btibert3 avatar Jun 22 '17 13:06 Btibert3

@Btibert3 You'll have to do it in 2 passes, you cannot do it simultaneously because bulk jobs can only target one object at a time. Attachments must go to the Attachment object with a specially formatted zip file to upload and I think you're hinting that the records you want to update are in a different object (e.g. Account), so you'll have to create a second bulk job to do that.

StevenMMortimer avatar Jun 22 '17 14:06 StevenMMortimer

@ReportMort Interesting, thanks for the quick reply. I am not as well versed in the API, but yes, we have a custom object that I want to add both facts to as well as attach the file. Because the record on the custom object would already exist prior to updating the fields and adding the file, does that matter. The part I am somewhat lost on is how I associate the attachment with the record of interest.

Btibert3 avatar Jun 22 '17 19:06 Btibert3

You'll need to create a zip file containing your attachments and a manifest file formatted as CSV that associates each document inside your zip file to a specific record id in Salesforce. You can see the example R code at the bottom of the package help, just type ?R`Forcecom-package` in your R console.

Here is the example code copy/pasted from there:

 ## BULK INSERT ATTACHMENTS
 
 # prepare your .zip file and request.txt manifest before calling these functions
 file <- 'request.zip'
 job_info <- rforcecom.createBulkJob(session, operation='insert', object='Attachment')
 batch_attachment_info <- rforcecom.insertBulkAttachments(session, 
                                                          jobId=job_info$id, 
                                                          file=file)
 batch_attachment_status <- rforcecom.checkBatchStatus(session, 
                                                       jobId=batch_attachment_info$jobId, 
                                                       batchId=batch_attachment_info$id)
 batch_attachment_details <- rforcecom.getBatchDetails(session, 
                                                       jobId=batch_attachment_info$jobId, 
                                                       batchId=batch_attachment_info$id)
 # close the job
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)

Here is the reference page on how to create your zip file of attachments with the appropriate manifest: https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/binary_intro.htm. Your zip file is kept locally and you should create a CSV request.txt file since that is the default format used by the RForcecom package when inserting into the Attachment object.

StevenMMortimer avatar Jun 22 '17 20:06 StevenMMortimer

You are the man, thanks for your help! Ok, seeing that, how am I am able to assign that attachment to a specific record on my custom object?

Btibert3 avatar Jun 22 '17 20:06 Btibert3

Your manifest.txt file will contain the Ids. The Ids are always unique, even across objects, so don't worry, the attachments will end up going to the right records on your custom object as long as you've specified their individual Ids properly.

I suggest, going through and updating all the records first, that way you know the Ids and can put those in your manifest.txt file.

StevenMMortimer avatar Jun 22 '17 21:06 StevenMMortimer

Ahhhh! Let me take a look. Much appreciated.

Btibert3 avatar Jun 22 '17 21:06 Btibert3