incubator-pegasus
incubator-pegasus copied to clipboard
What if bulk load successfully only on some nodes while failed on other nodes?
From this http://pegasus.incubator.apache.org/en/2020/02/18/bulk-load-design.html, it says
需要说明的是,如果在bulk load在ingestion阶段失败或者在ingestion阶段执行cancel bulk load操作,可能会出现部分partition完成ingestion,而部分失败或者被cancel的情况,即部分partition成功导入了数据,部分partition没有导入数据的现象。
but it doesn't tell how to handle this situation. Is there any solution to solve this problem?
There are two cases which will cause bulk load ingestion data inconsistence:
- some partitions meet unrecoverable ingestion error during ingestion and some not
- simple network error or replica 2pc are NOT unrecoverable error, only the ingested files can not be recognized by rocksdb is unrecoverable
- force cancel during ingestion - this is only triggered by user not by system itself
There are currently no solution to handle such situation automatically by system, because pegasus doesn't support transaction through different partitions. For example, table has 8 partitions. Partition 0 ingest succeed, but partition 1 receive wrong-format sst files which is a unrecoverable error, it can not ingest those files, bulk load failed, and partition 0 won't reset those data. Ingestion is just like batch write, different partitions won't affect others' data, they are just different partitions. If user use our client batch write interface, it also can not gurantee that data wrote into different partitions should always be consistent.
The only solution is to retry bulk load after user fix broken files, which is triggered by user manually.
Thanks for the explanation. By the way, is there any plan to support transaction through different partitions?
Transaction is not in our current plan, because we don't find out its user case, but it can be discussed and implemented if any user has strong request on transaction in future.