Fix FE OOM caused by too many replicas during schema change
Proposed changes
Issue Number: close #xxx
Problem summary
version: 0.14
I have a table with 5000 partitions, 100 buckets, and 3 replicas.
When I run a schema change on this table, the FE hits an OOM:
2022-08-30 17:44:59,486 ERROR (thrift-server-pool-2646|3660) [EditLog.logEdit():890] Fatal Error : write stream Exception
java.lang.OutOfMemoryError: UTF16 String size is 1207959550, should be less than 1073741823
at java.lang.StringUTF16.newBytesFor(StringUTF16.java:49) ~[?:?]
at java.lang.AbstractStringBuilder.inflate(AbstractStringBuilder.java:228) ~[?:?]
at java.lang.AbstractStringBuilder.appendChars(AbstractStringBuilder.java:1701) ~[?:?]
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:634) ~[?:?]
at java.lang.StringBuffer.append(StringBuffer.java:392) ~[?:?]
at java.io.StringWriter.write(StringWriter.java:122) ~[?:?]
at com.google.gson.stream.JsonWriter.string(JsonWriter.java:590) ~[gson-2.8.6.jar:?]
at com.google.gson.stream.JsonWriter.value(JsonWriter.java:418) ~[gson-2.8.6.jar:?]
at com.google.gson.internal.bind.TypeAdapters$29.write(TypeAdapters.java:746) ~[gson-2.8.6.jar:?]
at com.google.gson.internal.bind.TypeAdapters$29.write(TypeAdapters.java:760) ~[gson-2.8.6.jar:?]
at com.google.gson.internal.bind.TypeAdapters$29.write(TypeAdapters.java:752) ~[gson-2.8.6.jar:?]
at com.google.gson.internal.bind.TypeAdapters$29.write(TypeAdapters.java:760) ~[gson-2.8.6.jar:?]
at com.google.gson.internal.bind.TypeAdapters$29.write(TypeAdapters.java:698) ~[gson-2.8.6.jar:?]
at com.google.gson.internal.Streams.write(Streams.java:72) ~[gson-2.8.6.jar:?]
at org.apache.doris.persist.gson.RuntimeTypeAdapterFactory$1.write(RuntimeTypeAdapterFactory.java:320) ~[palo-fe.jar:3.4.0]
at com.google.gson.TypeAdapter$1.write(TypeAdapter.java:191) ~[gson-2.8.6.jar:?]
at com.google.gson.Gson.toJson(Gson.java:704) ~[gson-2.8.6.jar:?]
at com.google.gson.Gson.toJson(Gson.java:683) ~[gson-2.8.6.jar:?]
at com.google.gson.Gson.toJson(Gson.java:638) ~[gson-2.8.6.jar:?]
at org.apache.doris.alter.RollupJobV2.write(RollupJobV2.java:724) ~[palo-fe.jar:3.4.0]
at org.apache.doris.alter.BatchAlterJobPersistInfo.write(BatchAlterJobPersistInfo.java:44) ~[palo-fe.jar:3.4.0]
at org.apache.doris.journal.JournalEntity.write(JournalEntity.java:131) ~[palo-fe.jar:3.4.0]
at org.apache.doris.journal.bdbje.BDBJEJournal.write(BDBJEJournal.java:145) ~[palo-fe.jar:3.4.0]
at org.apache.doris.persist.EditLog.logEdit(EditLog.java:887) [palo-fe.jar:3.4.0]
at org.apache.doris.persist.EditLog.logBatchAlterJob(EditLog.java:1364) [palo-fe.jar:3.4.0]
at org.apache.doris.alter.MaterializedViewHandler.processBatchAddRollup(MaterializedViewHandler.java:290) [palo-fe.jar:3.4.0]
at org.apache.doris.alter.MaterializedViewHandler.process(MaterializedViewHandler.java:1178) [palo-fe.jar:3.4.0]
at org.apache.doris.alter.Alter.processAlterOlapTable(Alter.java:146) [palo-fe.jar:3.4.0]
at org.apache.doris.alter.Alter.processAlterTable(Alter.java:307) [palo-fe.jar:3.4.0]
at org.apache.doris.catalog.Catalog.alterTable(Catalog.java:5172) [palo-fe.jar:3.4.0]
AlterJobV2 holds too much information to serialize into a single JSON string, which causes the OOM.
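To make the number in the error message concrete, here is a minimal sketch of the size check it appears to describe. The class and method names are hypothetical and are not part of Doris or the JDK. The only facts used are the two values printed in the log and the assumption that the limit equals `Integer.MAX_VALUE >> 1`, since a UTF-16 string stores two bytes per char in a `byte[]`.

```java
// Sketch only: illustrates the limit quoted in the OutOfMemoryError message.
// Utf16LimitSketch and fitsInUtf16String are hypothetical names.
final class Utf16LimitSketch {
    // Assumed limit: a UTF-16 string needs 2 bytes per char, and a byte[]
    // cannot be longer than Integer.MAX_VALUE, so the char count is halved.
    static final int MAX_UTF16_CHARS = Integer.MAX_VALUE >> 1; // 1073741823

    // Returns true if a buffer of this many chars can be held as UTF-16.
    static boolean fitsInUtf16String(long charCount) {
        return charCount <= MAX_UTF16_CHARS;
    }

    public static void main(String[] args) {
        long reportedSize = 1_207_959_550L; // size taken from the log above
        System.out.println("limit    = " + MAX_UTF16_CHARS);
        System.out.println("reported = " + reportedSize);
        System.out.println("fits     = " + fitsInUtf16String(reportedSize));
    }
}
```

The reported size exceeds the limit by roughly 134 million chars, so no amount of extra heap would let this particular string be built.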
After testing, I chose 1.2 million (120W) replicas as a reasonable upper limit for schema change.
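The limit can be pictured as a simple pre-check before the alter job is created. This is a rough sketch under stated assumptions, not the code in this PR: `ReplicaLimitSketch`, `MAX_REPLICA_COUNT`, `totalReplicas`, and `checkReplicaCount` are hypothetical names, the replica count is approximated as partitions × buckets × replication number for a single index, and the comparison is assumed to reject only counts strictly above the limit.

```java
// Sketch only: a pre-check that rejects a schema change on oversized tables.
// All names here are hypothetical and do not come from the Doris code base.
final class ReplicaLimitSketch {
    // Assumed limit from the description above: 120W = 1,200,000 replicas.
    static final long MAX_REPLICA_COUNT = 1_200_000L;

    // Approximate replica count for one index of the table.
    static long totalReplicas(long partitions, long buckets, long replicationNum) {
        return Math.multiplyExact(Math.multiplyExact(partitions, buckets), replicationNum);
    }

    // Throws if the table has more replicas than the assumed limit.
    static void checkReplicaCount(long partitions, long buckets, long replicationNum) {
        long total = totalReplicas(partitions, buckets, replicationNum);
        if (total > MAX_REPLICA_COUNT) {
            throw new IllegalStateException("Too many replicas for schema change: "
                    + total + " > " + MAX_REPLICA_COUNT);
        }
    }

    public static void main(String[] args) {
        // The table from the problem summary: 5000 partitions, 100 buckets, 3 replicas.
        System.out.println("replicas = " + totalReplicas(5000, 100, 3));
        try {
            checkReplicaCount(5000, 100, 3);
            System.out.println("accepted");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this check, the table from the problem summary (1,500,000 replicas) would be rejected up front instead of failing later while the edit log is written.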
Checklist(Required)
- Does it affect the original behavior:
- [x] Yes
- [ ] No
- [ ] I don't know
- Has unit tests been added:
- [x] Yes
- [ ] No
- [ ] No Need
- Has document been added or modified:
- [ ] Yes
- [ ] No
- [x] No Need
- Does it need to update dependencies:
- [ ] Yes
- [x] No
- Are there any changes that cannot be rolled back:
- [ ] Yes (If Yes, please explain WHY)
- [x] No
Further comments
If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...