onnx-simplifier
[BUG] Constants may become duplicated in eliminate_const_nodes
Describe the bug
When simplifying a model, the model can become much larger than the original and, as a consequence, exceed the 2GB maximum size that ONNX protobuf can handle. I suspect this is caused by a bug in the constant-folding logic that duplicates the model's weights. In the model that failed for me, many weights have a Transpose node applied to them. The result of that Transpose is constant, so it is replaced by a new Constant node. The Transpose node is eliminated, but the original constant weight stays behind. I am not sure whether the Constant node itself remains (it may be eliminated later), but the actual tensor data definitely remains in the graph initializer.
I managed to seemingly fix the issue by adding the following code to the end of the function eliminate_const_nodes. This may not be the best solution, but it shows where the problem is:
```python
used_tensors = {tensor for node in model.graph.node for tensor in node.input}
# Iterate in reverse so that deleting an element does not shift the
# indices of the elements that have not been visited yet.
for i, node in reversed(list(enumerate(model.graph.node))):
    if node.op_type == 'Constant' and node.output[0] not in used_tensors:
        del model.graph.node[i]
for i, tensor in reversed(list(enumerate(model.graph.initializer))):
    if tensor.name not in used_tensors:
        del model.graph.initializer[i]
return model
```
Model
Here is a link to the model, which comes from the MLPerf inference benchmark page (https://github.com/mlcommons/inference/tree/master/language/bert):
https://zenodo.org/record/3733910
Thanks for your report!
The issue you reported does exist. For most models, the redundant Constant ops and initializers would be eliminated in the following step -- the onnx optimizer. But the bert model you mentioned, with its redundant Constant ops and initializers, cannot be passed into the onnx optimizer because its size is (temporarily) larger than 2GB. So the simplification fails.
The solution you provided works in most cases, but it will break when the onnx model contains subgraphs -- an initializer marked as unused may in fact be used by some op in a subgraph. I'm afraid there is no clean and easy way to solve the issue at the moment. The most promising path is to refactor the onnx optimizer.
Since I maintain onnx simplifier and onnx optimizer only in my spare time, the refactoring will take some time to complete. I'll keep this issue updated.
Thanks for following up @daquexian ! Indeed, my current fix does not handle subgraphs, but I believe it could easily be modified to do so. When calculating used_tensors, the code could iterate through the ops of all subgraphs instead of just the main graph. To do that, all subgraphs first have to be enumerated, which is possible by recursively walking all ops and all their attributes, and recursing into a subgraph whenever a graph attribute is found. What do you think, could that be turned into a simple fix that covers all cases?
You are right, it is easy to cover subgraphs. However, I'm afraid it is not an ideal approach, because the subgraph logic would then be duplicated in both onnxsim and the onnx optimizer. Since the onnx optimizer itself can handle subgraphs, I think the clean way is to fix the 2GB problem of the onnx optimizer, and just run the onnx optimizer to remove the redundant initializers in onnxsim.