php-crud-api icon indicating copy to clipboard operation
php-crud-api copied to clipboard

Insert into join tables through a view

Open lcuis opened this issue 6 years ago • 23 comments

Solution to insert in two linked tables through one request

I found a solution to insert data in two joined tables in simple FK situation through a transaction with one request on php-crud-api but this required me to do some changes to the GenericDB and RecordController classes.

Limitations

Here are the limitations I know of:

  1. This relies on the creation of a simple view where the two tables are joined in a simple syntax.
  2. The two tables can't have columns with the same name except for their ids.
  3. This probably only works for MySQL/MariaDB.
  4. It requires the two tables to be in the same database.

Drawbacks of separate requests to insert in linked tables

I needed such a solution because creating records in two linked tables through two requests leads to the following drawbacks IMHO:

  1. I believe this is slower and consumes more resources.
  2. It gives more risks for orphan records.
  3. It gives more opportunities for hackers to create a mess in the two separate tables.

Here are the changes I made to php-crud-api for the solution to work

In RecordController.create(), I had to replace:

            if ($this->service->getType($table) != 'table') {
                return $this->responder->error(ErrorCode::OPERATION_NOT_SUPPORTED, __FUNCTION__);
            }

By this:

            if ($this->service->getType($table) != 'table'&&$this->service->getType($table) != 'view') {
                return $this->responder->error(ErrorCode::OPERATION_NOT_SUPPORTED, __FUNCTION__);
            }

I added those two functions to the GenericDB class:

        public function insertInView(ReflectedTable $table, array $columnValues)
        {
            $sql="select `information_schema`.`VIEWS`.`VIEW_DEFINITION` from `INFORMATION_SCHEMA`.`VIEWS` where table_name = '".$table->getName()."';";
            $stmt = $this->query($sql, []);
            $viewSql=$stmt->fetchColumn(0);
            $matches = array();
            $tablesAndCols=array();
            if(preg_match('/(?<=select )(.*)(?= from )/i', $viewSql, $matches)) {
                $selstr=$matches[0];
                $matches=array();
                if (preg_match_all('/(?<=^|,)([^,]*)(?= as)/i', $selstr, $matches)) {
                    foreach($matches[0] as $m){
                        $tempMatches=array();
                        if(preg_match('/(?<=^|.)([^.]*).([^.]*)(?=$)/i',$m,$tempMatches)){
                            $tab=str_replace("`","",$tempMatches[1]);
                            $col=str_replace("`","",$tempMatches[2]);
                            $tablesAndCols[$tab][]=$col;
                        }
                    }
                }
            }
            if(count($tablesAndCols)!=2){
                return(false);
            }
            $sql="select table_name,column_name,referenced_table_name,referenced_column_name from INFORMATION_SCHEMA.KEY_COLUMN_USAGE where (lower(referenced_table_name) = lower('".array_keys($tablesAndCols)[0]."') and lower(table_name) = lower('".array_keys($tablesAndCols)[1]."')) or (lower(referenced_table_name) = lower('".array_keys($tablesAndCols)[1]."') and lower(table_name) = lower('".array_keys($tablesAndCols)[0]."'))";
            $stmt = $this->query($sql, []);
            $foreignKeyData=$stmt->fetch();
            $primaryTable=$foreignKeyData["referenced_table_name"];
            $foreignTable=$foreignKeyData["table_name"];
            $foreignColumn=$foreignKeyData["column_name"];
            $insertPrimarySql=$this->getInsertSql($primaryTable,$tablesAndCols,$columnValues,$table);
            $insertForeignSql=$this->getInsertSql($foreignTable,$tablesAndCols,$columnValues,$table,$foreignColumn);
            try{
                if($this->pdo->beginTransaction()){
                    $stmt = $this->query($insertPrimarySql, []);
                    $stmt->fetch();
                    $stmt = $this->query($insertForeignSql, []);
                    $stmt->fetch();
                    $this->pdo->commit();
                }
                else{
                    return(false);
                }
            }catch (Exception $e) {
                $result=$this->pdo->rollBack();
                return(false);
            }
            return(true);
        }
        
        public function getInsertSql($tableName,$tablesAndCols,$columnValues,$table,$foreignTableColumn="")
        {
            $stringTypes=["varchar","char","binary","varbinary","blob","text","enum","set"];
            $insertSql="insert into ".$tableName." (";
            $insertSqlTail=") values (";
            if(strlen($foreignTableColumn)>0){
                $insertSql.=$foreignTableColumn;
                $insertSqlTail.="LAST_INSERT_ID()";
                if(count($tablesAndCols[$tableName])>0){
                    $insertSql.=",";
                    $insertSqlTail.=",";
                }
            }
            $i=0;
            foreach($tablesAndCols[$tableName] as $colName){
                $i++;
                $insertSql.=" ".$colName;
                $insertSqlTail.=" ";
                if(array_key_exists($colName,$columnValues)){
                    $col = $table->getColumn($colName);
                    $colType=$col->getType();
                    $quote=false;
                    if(array_search(strtolower($colType),$stringTypes)!==false){
                        $quote=true;
                    }
                    $insertSqlTail.=($quote?"'":"").$columnValues[$colName].($quote?"'":"");
                }
                else{
                    $insertSqlTail.="null";
                }
                if($i<count($tablesAndCols[$tableName])){
                    $insertSql.=",";
                    $insertSqlTail.=",";
                }
            }
            $insertSql.=$insertSqlTail.")";
            return($insertSql);
        }

I changed the GenericDB.createSingle() function into this:

        public function createSingle(ReflectedTable $table, array $columnValues) /*: ?String*/
        {
            $created=false;
            if($table->getType() == 'view'){
                $created=$this->insertInView($table,$columnValues);
            }
            if(!$created){
                $this->converter->convertColumnValues($table, $columnValues);
                $insertColumns = $this->columns->getInsert($table, $columnValues);
                $tableName = $table->getName();
                if(!is_null($table->getPk()))
                    $pkName = $table->getPk()->getName();
                $parameters = array_values($columnValues);
                $sql = 'INSERT INTO "' . $tableName . '" ' . $insertColumns;
                $stmt = $this->query($sql, $parameters);
                // return primary key value if specified in the input
                if (!is_null($table->getPk())&&isset($columnValues[$pkName])) {
                    return $columnValues[$pkName];
                }
            }
            // work around missing "returning" or "output" in mysql
            switch ($this->driver) {
                case 'mysql':
                    $stmt = $this->query('SELECT LAST_INSERT_ID()', []);
                    break;
            }
            $pkValue = $stmt->fetchColumn(0);
            if ($this->driver == 'sqlsrv' && $table->getPk()->getType() == 'bigint') {
                return (int) $pkValue;
            }
            return $pkValue;
        }

Here is an example of how this works

I have tables A and B with view VW_CRUD_A_B as:

CREATE TABLE `a` (
  `id` int(11) NOT NULL,
  `a_name` varchar(20) NOT NULL
);

CREATE TABLE `b` (
  `id` int(11) NOT NULL,
  `a_id` int(11) NOT NULL,
  `b_name` varchar(20) NOT NULL
);

CREATE VIEW vw_crud_a_b AS select `a`.`a_name`,`b`.`b_name` from (`a` join `b` on(`b`.`a_id` = `a`.`id`)) ;

ALTER TABLE `a`
  ADD PRIMARY KEY (`id`);

ALTER TABLE `b`
  ADD PRIMARY KEY (`id`),
  ADD KEY `b_a_id` (`a_id`);

ALTER TABLE `a`
  MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;
ALTER TABLE `b`
  MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;

ALTER TABLE `b`
  ADD CONSTRAINT `b_a_id` FOREIGN KEY (`a_id`) REFERENCES `a` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;

When I POST on /records/vw_crud_a_b with the following, I get the records shown below in the database:

{
  "a_name":"name for a",
  "b_name":"name for b"
}
TABLE A:
ID: 1
A_NAME: name for a
TABLE B:
ID: 1
A_ID: 1
B_NAME: name for b

While I realize that there are many limitations and there are probably better ways to do the same thing, this does the job for me so far. The main risk for me using this solution is that php-crud-api will evolve and the changes I made may not work anymore.

Do you think there could be a way for me to improve this solution enough to have something similar implemented into php-crud-api ?

What would you recommend?

Thank you so much for php-crud-api , this is really useful to me!

lcuis avatar Oct 30 '19 04:10 lcuis

I will try here to decompose and clarify my question into several ones to make it more friendly. Sorry if it was too cryptic or not well explained. Please let me know if there is anything I can clarify further.

  1. Could it be interesting and acceptable for the authors of php-crud-api to have a solution allowing the insert into two joined tables through one API request to implement it in php-crud-api?

  2. If so, I presume that the limitation to MySQL/MariaDB is not acceptable. Are any other known limitations as listed in the original question or any other limitation that I didn't list not acceptable? For example, is the fact that a view is needed acceptable or should I try to find another way? Same question for the fact that the columns of the two tables cannot have the same name except for the primary keys?

  3. I added two functions in the original question which probably do things that existing php-crud-api functions do better. I will try to list the operations that those function perform to know if there could be a match in existing functions:

//$table here really is the view that joins the two tables
insertInView($table,$columnValues):
  A) Create a two dimensional array $tablesAndCols with the two tables names in keys
    and their column names in values.
  B) Find out which table holds the primary key as $primaryTable.
    Which table holds the foreign key as $foreignTable and the column where it is
    stored as $foreignColumn.
  C) Build the insert query for each of the two tables through a call to the function
     getInsertSql($foreignTable,$tablesAndCols,$columnValues,$table,$foreignColumn)
  D) Execute the two insert queries in a transaction.

//example with foreign table more complete than the table holding the primary key.
getInsertSql($foreignTable,$tablesAndCols,$columnValues,$table,$foreignColumn):
  E) Prepare the insert SQL string.
  F) For the foreign table, add the LAST_INSERT_ID() for the foreign column.
  G) For each column matching the API request, add the column and the request value.
  H) If value is of string type, add quotes around.

lcuis avatar Nov 03 '19 09:11 lcuis

Thank you for writing this extensive explanation.

Here are the limitations I know of:

  • This relies on the creation of a simple view where the two tables are joined in a simple syntax.
  • The two tables can't have columns with the same name except for their ids.
  • This probably only works for MySQL/MariaDB.
  • It requires the two tables to be in the same danyatabase.

Point 3 would be a problem, but I'm not sure that that is true.

creating records in two linked tables through two requests leads to the following drawbacks

  • I believe this is slower and consumes more resources.
  • It gives more risks for orphan records.
  • It gives more opportunities for hackers to create a mess in the two separate tables.

Point 1 is obviously true, but I would not value that too high. Doing SQL queries over the web is extremely slow anyway. I can see what you mean with point 2 here. Not using transactions can lead to partially executed inserts. This may be a real problem. Point 3 seems a bit unlikely to me, but maybe you have some sort of attack in mind that I am not aware of.

I'm not sure that I follow the use case though: If your inserts are very important I wouldn't let a client-side application execute them. And if you use a server side application to do the inserts, then a normal SQL connection is a more reliable option.

If I were to implement something for your use case (avoiding partial inserts) then it would be allowing multiple requests to be wrapped in a transaction. This is already applied in batch operations.

I'll leave this request open so that others can vote and/or give their input.

mevdschee avatar Nov 03 '19 22:11 mevdschee

Thanks a lot @mevdschee for your reply!

Here are the limitations I know of: ... 3. This probably only works for MySQL/MariaDB. ...

Point 3 would be a problem, but I'm not sure that that is true.

I meant that the solution I presented in the original post probably only works for MySQL/MariaDB but maybe this can work with other database engines too. I didn't realize for example that information_schema was also implemented for MS SQL Server and PostgreSQL. It seems that LAST_INSERT_ID() is only available in MySQL/MariaDB but alternatives are available for PostgreSQL and MS SQL Server.

creating records in two linked tables through two requests leads to the following drawbacks

  1. I believe this is slower and consumes more resources.
  2. It gives more risks for orphan records.
  3. It gives more opportunities for hackers to create a mess in the two separate tables.
  1. Point 1 is obviously true, but I would not value that too high. Doing SQL queries over the web is extremely slow anyway.
  1. Indeed, this should not be the most important reason to seek for a solution.
  1. I can see what you mean with point 2 here. Not using transactions can lead to partially executed inserts. This may be a real problem.
  1. This is the most relevant point for me also.
  1. Point 3 seems a bit unlikely to me, but maybe you have some sort of attack in mind that I am not aware of.
  1. Yes, it is hard for me to find a concrete example here. It is more a general rule I try to apply. If a user can only create a record in table A if there is a record B created at the same time, it is just one more constraint that gives more chances for consistency of the data. If I had to find an example still, this would be a client application that breaks if there are orphan records in table A. Though, that would likely be easy to test and fix. You are right, the second point really is the problem important to me.

I'm not sure that I follow the use case though: If your inserts are very important I wouldn't let a client-side application execute them. And if you use a server side application to do the inserts, then a normal SQL connection is a more reliable option.

My inserts are important but I value the generic approach of php-crud-api . Implementing a special case in a separate php file to handle inserts in joined tables with a normal SQL connection would break a bit my efforts to use that generic approach.

If I were to implement something for your use case (avoiding partial inserts) then it would be allowing multiple requests to be wrapped in a transaction. This is already applied in batch operations.

Indeed, if I could send a batch with the insert in table A and the insert in table B with the id created just before in table A as the link, that would already be good enough for my case. I could then try to find a solution to make sure there is no insert in table A without an insert in table B through maybe a authorization.tableHandler rule or through a database constraint.

I tried to POST to /records/a,b but this returns a 404 error:

{
    "code": 1001,
    "message": "Table 'a,b' not found"
}

The README.md mentions the availability of batch create but it seems it is for only one table at a time, not for separate or joined tables. Is that correct? Or am I doing it wrong? Should I use a join in JSON? I am not sure how to do that.

I'll leave this request open so that others can vote and/or give their input.

Thanks!

lcuis avatar Nov 04 '19 03:11 lcuis

alternatives are available for PostgreSQL and MS SQL Server.

Yes

Indeed, if I could send a batch with the insert in table A and the insert in table B with the id created just before in table A as the link, that would already be good enough for my case.

It should work the other way around. First create the parent, then all the children at once using batch with the inserted id as a foreign key.

The README.md mentions the availability of batch create but it seems it is for only one table at a time, not for separate or joined tables. Is that correct?

Yes it is

mevdschee avatar Nov 04 '19 06:11 mevdschee

Possible duplicate of #138 and #29

mevdschee avatar Nov 04 '19 06:11 mevdschee

Thank you very much for your responses @mevdschee ! I would ideally like to continue creating in the two join tables with only one call to php-crud-api . I like the idea mentioned by @meesvandongen to the issue 138 that you linked to create in two join tables through a call like this:

[ {'table1': { 'field': 'value' }}, {'table2': { 'field': 'value' }} ]

However, this seems more complex for me to implement than the solution I already have.

Do you think I could improve the view approach enough to be acceptable as an enhancement to php-crud-api if I make it work on MySQL/MariaDB, PostgreSQL and MS SQL Server? My intention would be to reuse as much as possible existing php-crud-api code which I didn't explore enough yet.

lcuis avatar Nov 04 '19 07:11 lcuis

There is a ReflectedTable->getFksTo function that returns the foreign keys from a table to another one.

Also, ColumnsBuilder->getInsert conveniently takes care of the writing of the insert query.

Maybe those functions and others can help me build a better solution for transactions than the one based on views. One that would take this:

POST /transactions

[ 
   { 
      "table_a":[ 
         { 
            "name":"name 1",
            "value":"value 1"
         },
         { 
            "name":"name 2",
            "value":"value 2"
         },
         { 
            "name":"name 3",
            "value":"value 3"
         }
      ],
      "table_b":[ 
         { 
            "name":"name 1-3",
            "value":"value 1-3"
         }
      ],
      "table_c":[ 
         { 
            "name":"name 1",
            "value":"value 1"
         },
         { 
            "name":"name 2",
            "value":"value 2"
         },
         { 
            "name":"name 3",
            "value":"value 3"
         }
      ],
      "table_d":[ 
         { 
            "name":"name 1-3",
            "value":"value 1-3"
         }
      ]
   },
   { 
      "table_e":[ 
         { 
            "name":"name 4",
            "value":"value 4"
         },
         { 
            "name":"name 5",
            "value":"value 5"
         },
         { 
            "name":"name 6",
            "value":"value 6"
         }
      ],
      "table_f":[ 
         { 
            "name":"name 4",
            "value":"value 4"
         },
         { 
            "name":"name 5",
            "value":"value 5"
         },
         { 
            "name":"name 6",
            "value":"value 6"
         }
      ]
   }
]

Where the tables are defined as follows:

CREATE TABLE table_a (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name varchar(100) NOT NULL,
  value varchar(100) NOT NULL
);

CREATE TABLE table_b (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name varchar(100) NOT NULL,
  value varchar(100) NOT NULL
);

CREATE TABLE table_d (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  b_id int(11) NOT NULL,
  name varchar(100) NOT NULL,
  value varchar(100) NOT NULL,
  FOREIGN KEY (b_id) REFERENCES table_b (id)
);

CREATE TABLE table_c (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a_id int(11) NOT NULL,
  b_id int(11) NOT NULL,
  d_id int(11) NOT NULL,
  name varchar(100) NOT NULL,
  value varchar(100) NOT NULL,
  FOREIGN KEY (a_id) REFERENCES table_a (id),
  FOREIGN KEY (b_id) REFERENCES table_b (id),
  FOREIGN KEY (d_id) REFERENCES table_d (id)
);

CREATE TABLE table_e (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name varchar(100) NOT NULL,
  value varchar(100) NOT NULL
);

CREATE TABLE table_f (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  e_id int(11) NOT NULL,
  name varchar(100) NOT NULL,
  value varchar(100) NOT NULL,
  FOREIGN KEY (e_id) REFERENCES table_e (id)
);

Each item of the root array is a transaction. All the transaction tables are inspected for their foreign keys with each other. The number of rows per table per transaction can either be one or n. The value of n being the maximum number of rows for any table within the transaction. New error code proposal: 1020: 409 Conflict: Number of rows not 1 nor transaction maximum. If a table in a transaction has one row and not n, it cannot have a foreign key to a table with n rows for that transaction. New error code proposal: 1021: 409 Conflict: Foreign key to table with more rows in transaction. However, it can have a foreign key to tables with one row within that transaction. If a table in a transaction has n rows and not one, it can have a foreign key to any table within that transaction. There can be only one foreign key between one table of the transaction and another one. But this rule already exists if I am not mistaken. The transaction must start with the tables with no foreign keys and continue with those for which the foreign ids are defined.

The example above would lead to the following results:

table_a:
id:1, name:"name 1", value:"value 1"
id:2, name:"name 2", value:"value 2"
id:3, name:"name 3", value:"value 3"

table_b:
id:1, name:"name 1-3", value:"value 1-3"

table_c:
id:1, a_id:1, b_id:1, d_id:1, name:"name 1", value:"value 1"
id:2, a_id:1, b_id:2, d_id:1, name:"name 2", value:"value 2"
id:3, a_id:1, b_id:3, d_id:1, name:"name 3", value:"value 3"

table_d:
id:1, b_id:1, name:"name 1-3", value:"value 1-3"

table_e:
id:1, name:"name 4", value:"value 4"
id:2, name:"name 5", value:"value 5"
id:3, name:"name 6", value:"value 6"

table_f:
id:1, e_id:1, name:"name 4", value:"value 4"
id:2, e_id:2, name:"name 5", value:"value 5"
id:3, e_id:3, name:"name 6", value:"value 6"

First transaction above can start with tables table_a and table_b in any order. Then, proceed with table_d and then only insert in table_c.

I thought about this because php-crud-api is a generic solution and this is why it is great. Thus, it deserves a generic way to handle transactions.

Would this be a good approach?

lcuis avatar Nov 05 '19 05:11 lcuis

Would this be a good approach?

Yes, it seems good, I think I would represent them as a tree, not as a series of records.

Anyway, this only covers inserts. How about updates?

mevdschee avatar Nov 05 '19 06:11 mevdschee

Great, thanks.

On the create, I would like to propose another error message: 1022: 409 Conflict: Circular link in transaction. This is to avoid the impossible transaction where T_G -> T_H -> T_I -> T_G .

Also, ideally, the webservice should return the total number of created rows.

When you talk about trees, you mean the connections modelling in the PHP code, right? I also had a tree in mind. However, we may have situations where a table could have more than one parent. With the example above, if we consider the first four tables as a directed graph, we have this:

TABLE_A       TABLE_B----¬
    |            |       |
    |         TABLE_D    |
    |            |       |
    \------------^-------^--------TABLE_C

This is not a tree I believe but this can still be treated as a transaction.

Yes, you are right, who says "generic", "transaction" and "insert" might say "update" and even "delete". I see two options there:

  1. Generic but complex where all (create, update, delete) can be posted as transactions together.
  2. Less generic but rest in the spirit where to create, we POST, to update, we PUT and to delete, we DELETE.

I don't feel confident developing something like option 1). This is a lot more complex than the current view based solution IMHO. What do you think?

lcuis avatar Nov 05 '19 09:11 lcuis

If I try to extend a bit option 2), the update can look like this:

PUT /transactions

[ 
   { 
      "table_a":[ 
         { 
            "id":1,
            "name":"name one"
         },
         { 
            "id":2,
            "value":"value two"
         },
         { 
            "id":3,
            "name":"name three",
            "value":"value three"
         }
      ],
      "table_b":[ 
         { 
            "id":1,
            "name":"name one to three",
            "value":"value 1-3"
         }
      ]
   },
   { 
      "table_e":[ 
         { 
            "id":1,
            "name":"name four",
            "value":"value 4"
         },
         { 
            "id":2,
            "name":"name five",
            "value":"value five"
         }
      ],
      "table_f":[ 
         { 
            "id":1,
            "name":"name 4",
            "value":"value four"
         }
      ]
   }
]

Here, the SQL WHERE clause is defined by the only primary key of the table defined in the request. If there is more than one primary key defined in the request, does this usually lead to a 1010: 409 Conflict: Data integrity violation error?

Same mechanism and question for the delete example below:

DELETE /transactions

[ 
   { 
      "table_a":[ 
         { 
            "id":1
         },
         { 
            "id":2
         },
         { 
            "id":3
         }
      ],
      "table_b":[ 
         { 
            "id":1
         }
      ]
   },
   { 
      "table_e":[ 
         { 
            "id":1
         },
         { 
            "id":2
         }
      ],
      "table_f":[ 
         { 
            "id":1
         }
      ]
   }
]

lcuis avatar Nov 05 '19 13:11 lcuis

I have seen trees in the php-crud-api code. Are we talking about the same thing?

lcuis avatar Nov 05 '19 14:11 lcuis

However, we may have situations where a table could have more than one parent.

IMHO, these situations are not very common.

to update, we PUT

Yes, agree, but then.. when you update a "post" with "comments" and you post 2 comments, while it currently has 3, what do you do? In a document store it would replace all children, but how should we handle it?

This, and other questions, is why this functionality is not implemented yet.

mevdschee avatar Nov 06 '19 06:11 mevdschee

Thanks!

However, we may have situations where a table could have more than one parent.

IMHO, these situations are not very common.

Yes, I see by the code that this is one assumption for php-crud-api. I understand. Fine by me. Trees are simpler.

to update, we PUT

Yes, agree, but then.. when you update a "post" with "comments" and you post 2 comments, while it currently has 3, what do you do? In a document store it would replace all children, but how should we handle it?

This, and other questions, is why this functionality is not implemented yet.

So, you would prefer the generic option 1) which is a lot more complex to implement. If one intention is to avoid orphan records, I believe that the implementation of the POST part of option 2) would already be a good starting point. Still, I understand that you want the best for your solution. Please let me know if I can help in any way for that.

lcuis avatar Nov 06 '19 08:11 lcuis

this is one assumption for php-crud-api.

Yes, representation as a tree is (which still allows cycles to exist)

So, you would prefer the generic option 1) which is a lot more complex to implement.

The problem is that it is not very clear how option 1 should work. The "belongsTo" relations can be detected from the data, but the (potentially multiple) "hasMany" and "habtm" relationships need to be specified in the "join" parameter in some way.

mevdschee avatar Nov 06 '19 08:11 mevdschee

this is one assumption for php-crud-api.

Yes, representation as a tree is (which still allows cycles to exist)

Cycles can exist outside the tree or inside the tree?

So, you would prefer the generic option 1) which is a lot more complex to implement.

The problem is that it is not very clear how option 1 should work. The "belongsTo" relations can be detected from the data,

Just to make sure I understand correctly. This can be handled without specifying the joins:

SQL:

CREATE TABLE table_e (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name varchar(100) NOT NULL,
  value varchar(100) NOT NULL
);

CREATE TABLE table_f (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  e_id int(11) NOT NULL,
  name varchar(100) NOT NULL,
  value varchar(100) NOT NULL,
  FOREIGN KEY (e_id) REFERENCES table_e (id)
);
POST /transactions

[
   {
      "inserts":[
         {
            "table_e.name":"name 4",
            "table_e.value":"value 4"
         },
         {
            "table_e.name":"name 5",
            "table_e.value":"value 5"
         },
         {
            "table_e.name":"name 6",
            "table_e.value":"value 6"
         },
         {
            "table_f.name":"name 4",
            "table_f.value":"value 4"
         },
         {
            "table_f.name":"name 5",
            "table_f.value":"value 5"
         },
         {
            "table_f.name":"name 6",
            "table_f.value":"value 6"
         }
      ]
   }
]

Outcome:

table_e:
id:1, name:"name 4", value:"value 4"
id:2, name:"name 5", value:"value 5"
id:3, name:"name 6", value:"value 6"

table_f:
id:1, e_id:1, name:"name 4", value:"value 4"
id:2, e_id:2, name:"name 5", value:"value 5"
id:3, e_id:3, name:"name 6", value:"value 6"

but the (potentially multiple) "hasMany" and "habtm" relationships need to be specified in the "join" parameter in some way.

Also to make sure I understand, this would require the specification of the join:

POST /transactions

[
   {
      "inserts":[
         {
            "table_e.name":"name 4-6",
            "table_e.value":"value 4-6"
         },
         {
            "table_f.name":"name 4",
            "table_f.value":"value 4"
         },
         {
            "table_f.name":"name 5",
            "table_f.value":"value 5"
         },
         {
            "table_f.name":"name 6",
            "table_f.value":"value 6"
         }
      ]
   }
]

Outcome:

table_e:
id:1, name:"name 4-6", value:"value 4-6"

table_f:
id:1, e_id:1, name:"name 4", value:"value 4"
id:2, e_id:1, name:"name 5", value:"value 5"
id:3, e_id:1, name:"name 6", value:"value 6"

One way to handle this specific case would be to enforce the following rules if I am not mistaken:

The number of rows per table per transaction can either be one or n. The value of n being the maximum number of rows for any table within the transaction. New error code proposal: 1020: 409 Conflict: Number of rows not 1 nor transaction maximum. If a table in a transaction has one row and not n, it cannot have a foreign key to a table with n rows for that transaction. New error code proposal: 1021: 409 Conflict: Foreign key to table with more rows in transaction. However, it can have a foreign key to tables with one row within that transaction. If a table in a transaction has n rows and not one, it can have a foreign key to any table within that transaction. (On the create, I would like to propose another error message:) 1022: 409 Conflict: Circular link in transaction. This is to avoid the impossible transaction where T_G -> T_H -> T_I -> T_G .

This way, I believe we can guess the joins from the number of records in the transaction for a given table and the foreign keys.

If this case is doable without specifying joins for you, if that's ok for you, we can discuss more complex cases. Otherwise, we can still discuss this case, as you wish.

lcuis avatar Nov 06 '19 12:11 lcuis

Maybe it would help you to understand this matter better with a real example. Say a post that has an author and an editor and multiple comments that have authors as well. Authors and editors are from the "users" table. This means that when we insert a "post" with some "comments" and some "users" it is totally unclear what to do.

mevdschee avatar Nov 06 '19 14:11 mevdschee

Here is my understanding of the database:

SQL

CREATE TABLE users (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name varchar(100) NOT NULL
);

CREATE TABLE posts (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  users_id int(11) NOT NULL,
  subject varchar(100) NOT NULL,
  text varchar(100) NOT NULL,
  FOREIGN KEY (users_id) REFERENCES users (id)
);

CREATE TABLE post_editions (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  users_id int(11) NOT NULL,
  posts_id int(11) NOT NULL,
  FOREIGN KEY (users_id) REFERENCES users (id),
  FOREIGN KEY (posts_id) REFERENCES posts (id)
);

CREATE TABLE comments (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  users_id int(11) NOT NULL,
  posts_id int(11) NOT NULL,
  text varchar(100) NOT NULL,
  FOREIGN KEY (users_id) REFERENCES users (id),
  FOREIGN KEY (posts_id) REFERENCES posts (id)
);

CREATE TABLE comment_editions (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  users_id int(11) NOT NULL,
  comments_id int(11) NOT NULL,
  FOREIGN KEY (users_id) REFERENCES users (id),
  FOREIGN KEY (comments_id) REFERENCES posts (id)
);

Let's create the users (not really likely to happen as a transaction though):

POST /transactions

[
   {
      "inserts":[
         {
            "users.name":"first user"
         },
         {
            "users.name":"second user"
         },
         {
            "users.name":"third user"
         }
      ]
   }
]

Outcome:

users:
id:1, name:"first user"
id:2, name:"second user"
id:3, name:"third user"

Let first user create a post:

POST /records/posts

{
   "users_id":1,
   "subject":"first post subject",
   "text":"first post text"
}

Outcome:

posts:
id:1, subject:"first post subject","text":"first post text"

Now let's say the second user edits the first post:

POST /transactions

[
   {
      "updates":[
         {
            "posts.id":1,
            "posts.subject":"my first post subject",
            "posts.text":"my first post text"
         }
      ],
      "inserts":[
         {
            "post_editions.users_id":2
         }
      ]
   }
]

Outcome:

posts:
id:1, subject:"my first post subject","text":"my first post text"

post_editions:
id:1, users_id:2, posts_id:1

Let the third user add a comment to the first post:

POST /records/comments

{
   "users_id":3,
   "posts_id":1,
   "text":"first comment text"
}

Outcome:

comments:
id:1, users_id:3, posts_id:1, text:"first comment text"

Let the second user edit the first comment:

POST /transactions

[
   {
      "updates":[
         {
            "comments.id":1,
            "comments.text":"my first comment text"
         }
      ],
      "inserts":[
         {
            "comments_editions.users_id":2
         }
      ]
   }
]

Outcome:

comments:
id:1, "text":"my first comment text"

comment_editions:
id:1, users_id:2, comments_id:1

I see all of the above feasible based on primary keys as what you called "belongsTo" relations.

Now, the problem probably arises when we want to insert a post the editions of the post, a comment and the editions of the comment all at the same time. Let's try that.

[
   {
      "inserts":[
         {
            "posts.users_id":1,
            "posts.subject":"second post subject",
            "posts.text":"second post text"
         },
         {
            "post_editions.users_id":2
         },
         {
            "post_editions.users_id":3
         },
         {
            "comments.users_id":3,
            "comments.text":"second post comment text"
         },
         {
            "comment_editions.users_id":1
         },
         {
            "comment_editions.users_id":2
         }
      ]
   }
]

Outcome:

posts:
id:2, users_id:1, subject:"second post subject", text:"second post text"

post_editions:
id:2, users_id:2, posts_id:2
id:3, users_id:3, posts_id:2

comments:
id:2, users_id:3, posts_id:2, text:"second post comment text"

comment_editions:
id:2, users_id:1, comments_id:2
id:3, users_id:2, comments_id:2

I believe that this is doable as well because it is again a simple "belongsTo" connection situation.

It becomes a problem I understand when we need to insert for example at the same time two posts and 1, 3 or more post editions (2 would not be a problem as we could match the stack items I believe: 1st post goes with 1st post edition; 2nd post goes with 2nd post edition):

POST /transactions

(should not work!)

[
   {
      "inserts":[
         {
            "posts.users_id":3,
            "posts.subject":"third post subject",
            "posts.text":"third post text"
         },
         {
            "posts.users_id":2,
            "posts.subject":"fourth post subject",
            "posts.text":"fourth post text"
         },
         {
            "post_editions.users_id":1
         }
      ]
   }
]

Yes, in that case, it would be difficult/dangerous to guess whether the post_editions entry refers to the first inserted post or the second one. This would break the rule for proposed error 1021: 409 Conflict: Foreign key to table with more rows in transaction. Solution would be for developer to split the transactions into two transactions.

Does this make sense so far?

lcuis avatar Nov 06 '19 17:11 lcuis

The problem with your approach is much more trivial than that. If I want to add a post with a user and there are two relationships between post and user (say editor and author) then how would I know what to do?

mevdschee avatar Nov 06 '19 18:11 mevdschee

I think that if we implement something like this, then it should act as a document store with nested entities. I understand how to insert that (that is trivial). Updating would delete and insert all child-entities (in case they are mentioned). That would be consistent behavior. This is the only feasible implementation I see (until now). But I am open to change my mind :-)

NB: We need a join path in the URL to know where the entities come from/need to go to.

mevdschee avatar Nov 06 '19 18:11 mevdschee

Thanks for your replies. They are both very good news to me.

Here is my understanding of the tables definition:

SQL

CREATE TABLE authors_or_editors(
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name varchar(100) NOT NULL
);

CREATE TABLE forking_posts (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  authors_id int(11) NOT NULL,
  editors_id int(11) NOT NULL,
  subject varchar(100) NOT NULL,
  text varchar(100) NOT NULL,
  FOREIGN KEY (authors_id) REFERENCES authors_or_editors (id),
  FOREIGN KEY (editors_id) REFERENCES authors_or_editors (id)
);

Here is how I would do the inserts without transactions:

POST /records/authors_or_editors

{
  "name":"first author name"
}

POST /records/authors_or_editors

{
  "name":"first editor name"
}

Outcome:

authors_or_editors:
id:1, name:"first author name"
id:2, name:"first editor name"
POST /records/forking_posts

{
  "authors_id":1,
  "editors_id":2,
  "subject":"first post subject",
  "text":"first post text"
}

Outcome:

forking_posts:
id:1, authors_id:1, editors_id:2, subject:"first post subject", text:"first post text"

Here is a get join (which I mistakenly understood not allowed by php-crud-api):

GET /records/forking_posts?join=authors_or_editors

Result:
{
    "records": [
        {
            "id": 1,
            "authors_id": {
                "id": 1,
                "name": "first author name"
            },
            "editors_id": {
                "id": 2,
                "name": "first editor name"
            },
            "subject": "first post subject",
            "text": "first post text"
        }
    ]
}

If I understand correctly, what you are proposing is to do something like this for transaction inserts:

POST /records/forking_posts?join=authors_or_editors

    "records": [
        {
            "authors_id": {
                "name": "second author name"
            },
            "editors_id": {
                "name": "second editor name"
            },
            "subject": "second post subject",
            "text": "second post text"
        }
    ]

Something like this for updates:

POST /records/forking_posts?join=authors_or_editors

    "records": [
        {
            "authors_id": {
                "id": 3,
                "name": "my second author name"
            },
            "editors_id": {
                "id": 4,
                "name": "my second editor name"
            },
            "id": 2,
            "subject": "my second post subject",
            "text": "my second post text"
        }
    ]

And something like this for updates and inserts together:

POST /records/forking_posts?join=authors_or_editors

    "records": [
        {
            "authors_id": {
                "id": 3,
                "name": "my second author name again"
            },
            "editors_id": {
                "id": 4,
                "name": "my second editor name again"
            },
            "subject": "third post subject",
            "text": "third post text"
        }
    ]

Is that correct?

lcuis avatar Nov 07 '19 02:11 lcuis

Something like this for updates:

Updates go with PUT

And something like this for updates and inserts together:

Yes, you only insert or update on the highest level. You always need to provide all records with all fields (except primary key, which you can leave out in some cases) for any child entity and you may have to provide the join path (in some cases). Also you can also still leave the child records out (like it works now).

mevdschee avatar Nov 07 '19 05:11 mevdschee

Something like this:

POST /records/posts?join=users

{
    "authors_user_id": {
        "name": "my author name"
    },
    "editors_user_id": {
        "name": "my editor name"
    },
    "subject": "third post subject",
    "text": "third post text"
}

It has some flaws.. but it is the best I can come up with.

mevdschee avatar Nov 07 '19 06:11 mevdschee

Yes, I can see some flaws but it would at least cover my initial request allowing me to get rid of this view based solution. Is there something I can do to help with this?

lcuis avatar Nov 07 '19 07:11 lcuis