node matching error while using NNCF with Transformers-LT (fairseq)
Seeing a node matching error while using NNCF with a Transformer model based on the fairseq implementation (logs attached): error_logs_nncf.txt
Steps to replicate the issue:

- git clone https://github.com/mit-han-lab/hardware-aware-transformers.git
- cd hardware-aware-transformers
- pip install --editable .
- pip install -U torch==1.9.1
- pip install nncf
- bash configs/wmt14.en-de/get_preprocessed.sh
- git apply nncf_diff.txt (nncf_diff.txt attached)
- python train.py --configs=configs/wmt14.en-de/subtransformer/[email protected][email protected] --sub-configs=configs/wmt14.en-de/subtransformer/common.yml --validate-subtransformer --cpu
I could reproduce the original issue, and it seems to be fixed in #1600. However, after applying that fix I hit another failure:
```
Traceback (most recent call last):
  File "train.py", line 407, in <module>
    cli_main()
  File "train.py", line 403, in cli_main
    main(args)
  File "train.py", line 74, in main
    compression_ctrl, model = create_compressed_model(model, nncf_config)
  File "/home/vshampor/work/nncf/nncf/telemetry/decorator.py", line 71, in wrapped
    retval = fn(*args, **kwargs)
  File "/home/vshampor/work/nncf/nncf/torch/model_creation.py", line 110, in create_compressed_model
    compressed_model = builder.apply_to(nncf_network)
  File "/home/vshampor/work/nncf/nncf/torch/compression_method_api.py", line 123, in apply_to
    transformation_layout = self.get_transformation_layout(model)
  File "/home/vshampor/work/nncf/nncf/torch/compression_method_api.py", line 142, in get_transformation_layout
    layout = self._get_transformation_layout(model)
  File "/home/vshampor/work/nncf/nncf/torch/quantization/algo.py", line 634, in _get_transformation_layout
    self._pt_quantizer_setup = self._get_quantizer_setup(target_model)
  File "/home/vshampor/work/nncf/nncf/torch/quantization/algo.py", line 720, in _get_quantizer_setup
    single_config_quantizer_setup = self._get_single_config_quantizer_setup(target_model)
  File "/home/vshampor/work/nncf/nncf/torch/quantization/algo.py", line 713, in _get_single_config_quantizer_setup
    single_config_quantizer_setup = setup_generator.generate_setup()
  File "/home/vshampor/work/nncf/nncf/torch/quantization/algo.py", line 396, in generate_setup
    quantization_proposal = prop_graph_solver.run_on_ip_graph(merged_ip_graph)
  File "/home/vshampor/work/nncf/nncf/common/quantization/quantizer_propagation/solver.py", line 505, in run_on_ip_graph
    quantizer_setup = quant_prop_graph.create_quantizer_setup(self._weight_quantizable_node_names_vs_qconfigs)
  File "/home/vshampor/work/nncf/nncf/common/quantization/quantizer_propagation/graph.py", line 1141, in create_quantizer_setup
    setup = self._handle_output_quantizers_for_weights_as_outputs_ops(setup, pqid_vs_qpid,
  File "/home/vshampor/work/nncf/nncf/common/quantization/quantizer_propagation/graph.py", line 1178, in _handle_output_quantizers_for_weights_as_outputs_ops
    wao_qp_id = wao_op_node_key_vs_wq_id[wao_op_node_key]
KeyError: '283 TransformerSuperModel/TransformerDecoder[decoder]/EmbeddingSuper[embed_tokens]/embedding_0'
```
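To make the failure mode concrete: the crash is a plain dictionary-lookup miss inside `_handle_output_quantizers_for_weights_as_outputs_ops` — the decoder's embedding op is treated as a "weights as outputs" (WAO) node, but no weight quantizer ID was registered for it under that key. The sketch below is only an illustration of that lookup failing; the dictionary contents (including the `LinearSuper` key) are assumptions, not NNCF's real state:

```python
# Hypothetical sketch of the KeyError above. In the real code, NNCF builds a
# mapping from WAO op node keys to weight quantizer IDs; here we assume it was
# populated for some other weighted op but not for the embedding node.
wao_op_node_key = (
    "283 TransformerSuperModel/TransformerDecoder[decoder]/"
    "EmbeddingSuper[embed_tokens]/embedding_0"
)
# Assumed contents for illustration only:
wao_op_node_key_vs_wq_id = {
    "10 TransformerSuperModel/LinearSuper[fc1]/linear_0": 0,
}

try:
    wao_qp_id = wao_op_node_key_vs_wq_id[wao_op_node_key]
except KeyError as exc:
    print("KeyError:", exc)  # the embedding node key is missing
```

So the question is why the embedding op ends up in the WAO set without a corresponding weight quantizer ID being recorded for it.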
@vshampor, is this still valid? If so, any idea about the second failure?
Still reproduces as of today with the second failure; investigating.