
[Enhancement]: Bulk Insert via REST

severussundar opened this issue Oct 16 '23 · 2 comments

Abstract

Support the insertion of multiple items into a single table in a single POST request.

Additional Details on Request Header and Body

To be Added

### Tasks
- [ ] https://github.com/Azure/data-api-builder/issues/94
- [ ] https://github.com/Azure/data-api-builder/issues/93
- [ ] https://github.com/Azure/data-api-builder/issues/92

severussundar · Oct 16 '23 10:10

Moving this to milestone 1.1 as this is planned post-GA.

severussundar · Nov 16 '23 14:11

What is it?

We're discussing inserting 20,000 rows or more in one operation. Given that Data API builder is often the sole interaction point with the database, this capability significantly enhances its value.

  • Input validation would not be applied to bulk-inserted rows.

SQL’s Bulk Insert feature facilitates the efficient loading of large amounts of data from a file into a database table. This command is ideal for importing data from external files, such as CSV or text files, into a SQL Server database, making it well-suited for large-scale data migrations, ETL processes, or importing data into staging tables.

BULK INSERT dbo.YourTable
FROM 'C:\path\to\yourfile.csv'
WITH (
    FIELDTERMINATOR = ',',  -- column values are separated by commas
    ROWTERMINATOR = '\n',   -- each row ends with a newline
    FIRSTROW = 2            -- skip the header row in the file
);

In this case, we would need to receive both the file (or its content) and header/metadata, including information such as batch size, row terminator, and whether the first row is the header.

Operation Syntax

For example, the following request demonstrates how this could work (a sketch mapping these parameters to SQL Server's BULK INSERT follows the parameter list below):

POST https://localhost/api/books?$bulk=true&$bulk-header=false&$bulk-batch=10000&$bulk-terminator=','&$bulk-type=csv

URL Parameters:

  • $bulk=true: Activates bulk operation mode for the request. This indicates that the incoming data will be processed in bulk, optimizing performance for large datasets.

  • $bulk-header=false: Indicates that the data being sent does not include column headers.

    • If this were set to true, the server would expect the first row to contain the column names, similar to how CSV files often work.
  • $bulk-batch=10000: Specifies the size of each batch that will be processed. In this case, 10,000 records will be processed at once.

    • Batching is important for improving performance and ensuring the server can handle large volumes of data without running into memory constraints.
  • $bulk-terminator=',': Defines the field delimiter used in the data. In this case, fields are separated by commas (,), which is typical for CSV (Comma-Separated Values) files.

  • $bulk-type=csv: Specifies the format of the data being sent.

    • csv indicates that the data being uploaded is in CSV format, which is a widely used format for bulk data imports. Other possible values could include JSON, XML, etc., depending on the system's capabilities.
    • The server will use this to correctly interpret the structure and format of the incoming data.
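
For a SQL Server backend, these parameters could map fairly directly onto BULK INSERT options. The sketch below assumes the request body has first been staged to a file the database engine can read; how Data API builder would do that staging is an open implementation detail, and dbo.books is an illustrative table name.

-- Illustrative mapping of the URL parameters above (SQL Server):
--   $bulk-terminator=','  ->  FIELDTERMINATOR = ','
--   $bulk-header=false    ->  FIRSTROW = 1 (2 when a header row is present)
--   $bulk-batch=10000     ->  BATCHSIZE = 10000
BULK INSERT dbo.books
FROM 'C:\staging\request-body.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 1,
    BATCHSIZE = 10000
);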

Body:

The body of this request contains the bulk data to be inserted into the database. If the $bulk-type is set to csv, the body would contain CSV-formatted data.

Example CSV Content in Request Body:

123, "The Great Gatsby", "F. Scott Fitzgerald", 1925
124, "To Kill a Mockingbird", "Harper Lee", 1960
125, "1984", "George Orwell", 1949

Example CSV Content in Request Body (with headers):

id,title,author,published_year
123,"The Great Gatsby","F. Scott Fitzgerald",1925
124,"To Kill a Mockingbird","Harper Lee",1960
125,"1984","George Orwell",1949
126,"Moby Dick","Herman Melville",1851

Both PostgreSQL and MySQL offer similar features.
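
For reference, rough equivalents on those engines might look like this (file paths and the books table name are illustrative):

-- PostgreSQL: COPY loads a server-side file (or STDIN) into a table.
COPY books FROM '/var/lib/postgresql/books.csv' WITH (FORMAT csv, HEADER true);

-- MySQL: LOAD DATA INFILE performs the same kind of bulk load.
LOAD DATA INFILE '/var/lib/mysql-files/books.csv'
INTO TABLE books
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;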

Example Configuration

{
  ...
  "entities": {
    "<entity-name>": {
      ...
      "permissions": [
        {
          ...
          "actions": [
            "create", 
            "read", 
            "update", 
            "delete", 
            "execute",
            "bulk-insert" // new
          ]
        }
      ]
    }
  }
}

Questions

  1. Should bulk insert be a separate permission next to create? Yes. The new bulk-insert permission must be explicitly granted because bulk inserts bypass certain constraints (such as input validation), making it critical to control access and prevent unauthorized users from performing this operation.
  2. Is it okay to only have InsertMany? Yes. This is the only valid bulk operation.
  3. Is it okay to only have REST support bulk operations? Yes. Bulk insert is not part of the official GraphQL specification, which focuses on querying and mutating data through resolvers and does not include operations like bulk inserts or batch data uploads as standard functionality. We could circle back to this later.

Related Issues to Close

  • #1806
  • #1805
  • #1804
  • #1783
  • #93
  • #92

JerryNixon · Sep 06 '24 19:09