
Fix FE OOM caused by too many replicas during schema change

Open qzsee opened this issue 2 years ago • 0 comments

Proposed changes

Issue Number: close #xxx

Problem summary

version: 0.14

I have a table with 5000 partitions, 100 buckets, and 3 replicas.
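For scale, the replica count of this table can be worked out as below. This is only a sketch: `ReplicaCount` and `totalReplicas` are hypothetical names, not Doris code, and it assumes the 100 buckets apply to every partition and that only the base index is counted.

```java
// Hypothetical helper for illustration only; not part of the Doris code base.
final class ReplicaCount {
    // Total replicas = partitions * buckets per partition * replicas per tablet.
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    static long totalReplicas(long partitions, long bucketsPerPartition, long replicasPerTablet) {
        long tablets = Math.multiplyExact(partitions, bucketsPerPartition);
        return Math.multiplyExact(tablets, replicasPerTablet);
    }

    public static void main(String[] args) {
        // The table from this description: 5000 partitions, 100 buckets, 3 replicas.
        System.out.println(totalReplicas(5000, 100, 3)); // prints 1500000
    }
}
```

Under those assumptions, a single alter job on this table has to track 1.5 million replicas.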

When I run a schema change on this table, the FE hits an OOM:

2022-08-30 17:44:59,486 ERROR (thrift-server-pool-2646|3660) [EditLog.logEdit():890] Fatal Error : write stream Exception
java.lang.OutOfMemoryError: UTF16 String size is 1207959550, should be less than 1073741823
      at java.lang.StringUTF16.newBytesFor(StringUTF16.java:49) ~[?:?]
      at java.lang.AbstractStringBuilder.inflate(AbstractStringBuilder.java:228) ~[?:?]
      at java.lang.AbstractStringBuilder.appendChars(AbstractStringBuilder.java:1701) ~[?:?]
      at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:634) ~[?:?]
      at java.lang.StringBuffer.append(StringBuffer.java:392) ~[?:?]
      at java.io.StringWriter.write(StringWriter.java:122) ~[?:?]
      at com.google.gson.stream.JsonWriter.string(JsonWriter.java:590) ~[gson-2.8.6.jar:?]
      at com.google.gson.stream.JsonWriter.value(JsonWriter.java:418) ~[gson-2.8.6.jar:?]
      at com.google.gson.internal.bind.TypeAdapters$29.write(TypeAdapters.java:746) ~[gson-2.8.6.jar:?]
      at com.google.gson.internal.bind.TypeAdapters$29.write(TypeAdapters.java:760) ~[gson-2.8.6.jar:?]
      at com.google.gson.internal.bind.TypeAdapters$29.write(TypeAdapters.java:752) ~[gson-2.8.6.jar:?]
      at com.google.gson.internal.bind.TypeAdapters$29.write(TypeAdapters.java:760) ~[gson-2.8.6.jar:?]
      at com.google.gson.internal.bind.TypeAdapters$29.write(TypeAdapters.java:698) ~[gson-2.8.6.jar:?]
      at com.google.gson.internal.Streams.write(Streams.java:72) ~[gson-2.8.6.jar:?]
      at org.apache.doris.persist.gson.RuntimeTypeAdapterFactory$1.write(RuntimeTypeAdapterFactory.java:320) ~[palo-fe.jar:3.4.0]
      at com.google.gson.TypeAdapter$1.write(TypeAdapter.java:191) ~[gson-2.8.6.jar:?]
      at com.google.gson.Gson.toJson(Gson.java:704) ~[gson-2.8.6.jar:?]
      at com.google.gson.Gson.toJson(Gson.java:683) ~[gson-2.8.6.jar:?]
      at com.google.gson.Gson.toJson(Gson.java:638) ~[gson-2.8.6.jar:?]
      at org.apache.doris.alter.RollupJobV2.write(RollupJobV2.java:724) ~[palo-fe.jar:3.4.0]
      at org.apache.doris.alter.BatchAlterJobPersistInfo.write(BatchAlterJobPersistInfo.java:44) ~[palo-fe.jar:3.4.0]
      at org.apache.doris.journal.JournalEntity.write(JournalEntity.java:131) ~[palo-fe.jar:3.4.0]
      at org.apache.doris.journal.bdbje.BDBJEJournal.write(BDBJEJournal.java:145) ~[palo-fe.jar:3.4.0]
      at org.apache.doris.persist.EditLog.logEdit(EditLog.java:887) [palo-fe.jar:3.4.0]
      at org.apache.doris.persist.EditLog.logBatchAlterJob(EditLog.java:1364) [palo-fe.jar:3.4.0]
      at org.apache.doris.alter.MaterializedViewHandler.processBatchAddRollup(MaterializedViewHandler.java:290) [palo-fe.jar:3.4.0]
      at org.apache.doris.alter.MaterializedViewHandler.process(MaterializedViewHandler.java:1178) [palo-fe.jar:3.4.0]
      at org.apache.doris.alter.Alter.processAlterOlapTable(Alter.java:146) [palo-fe.jar:3.4.0]
      at org.apache.doris.alter.Alter.processAlterTable(Alter.java:307) [palo-fe.jar:3.4.0]
      at org.apache.doris.catalog.Catalog.alterTable(Catalog.java:5172) [palo-fe.jar:3.4.0]

AlterJobV2 holds too much information to serialize into a single JSON string, so the FE runs out of memory.
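The two numbers in the error message can be checked with a few lines of Java. This is a sketch, not JDK or Doris code: `Utf16Limit` and `fitsInUtf16Buffer` are hypothetical names, and the explanation that the cap comes from storing two bytes per character in a `byte[]` is my reading of the message.

```java
// Hypothetical illustration of the limit reported in the stack trace above.
final class Utf16Limit {
    // Assumption: a UTF-16 buffer stores two bytes per char in a byte[],
    // so the char capacity is capped at half of Integer.MAX_VALUE.
    static final int MAX_UTF16_CHARS = Integer.MAX_VALUE >> 1;

    // True when a buffer of `chars` characters stays within the cap.
    static boolean fitsInUtf16Buffer(long chars) {
        return chars <= MAX_UTF16_CHARS;
    }

    public static void main(String[] args) {
        System.out.println(MAX_UTF16_CHARS);                       // prints 1073741823
        System.out.println(fitsInUtf16Buffer(1_207_959_550L));     // prints false
    }
}
```

The size reported in the log, 1207959550, is above that cap, so no amount of heap would let the serialization into one in-memory string succeed.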

So I tested and settled on 120W (1.2 million) replicas as a reasonable upper limit for a schema change.
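A minimal sketch of what such a guard could look like follows. It is not the code in this patch: `SchemaChangeGuard`, `MAX_REPLICAS_PER_ALTER_JOB`, and `checkReplicaCount` are hypothetical names, the 120W figure is read here as 1,200,000, and `IllegalStateException` stands in for whatever exception type the FE would actually raise.

```java
// Hypothetical guard for illustration only; names and exception type are assumptions.
final class SchemaChangeGuard {
    // Assumed limit, taken from the 120W (1,200,000) figure in this description.
    static final long MAX_REPLICAS_PER_ALTER_JOB = 1_200_000L;

    // Rejects an alter job before any journal entry is built if it would
    // touch more replicas than the limit allows.
    static void checkReplicaCount(long partitions, long buckets, long replicationNum) {
        long total = Math.multiplyExact(Math.multiplyExact(partitions, buckets), replicationNum);
        if (total > MAX_REPLICAS_PER_ALTER_JOB) {
            throw new IllegalStateException("alter job would touch " + total
                    + " replicas, limit is " + MAX_REPLICAS_PER_ALTER_JOB);
        }
    }

    public static void main(String[] args) {
        try {
            checkReplicaCount(5000, 100, 3); // the table from this description
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this check, the table above (1.5 million replicas) is refused up front with a readable error instead of crashing the FE while the edit log is being written.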

Checklist(Required)

  1. Does it affect the original behavior:
    • [x] Yes
    • [ ] No
    • [ ] I don't know
  2. Has unit tests been added:
    • [x] Yes
    • [ ] No
    • [ ] No Need
  3. Has document been added or modified:
    • [ ] Yes
    • [ ] No
    • [x] No Need
  4. Does it need to update dependencies:
    • [ ] Yes
    • [x] No
  5. Are there any changes that cannot be rolled back:
    • [ ] Yes (If Yes, please explain WHY)
    • [x] No

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

qzsee, Sep 22 '22 04:09