[DBG_X] Compilation failures in L1Trigger/L1TMuonOverlapPhase2

Open iarspider opened this issue 1 year ago • 15 comments

Compilation of the L1Trigger/L1TMuonOverlapPhase2 package failed for el8_amd64_gcc12 in CMSSW_14_1_DBG_X_2024-05-09-2300:

```
<gcc>/bin/ld.bfd: tmp/el8_amd64_gcc12/src/L1Trigger/L1TMuonOverlapPhase2/src/L1TriggerL1TMuonOverlapPhase2/ccUO3zNJ.ltrans0.ltrans.o: in function `PtAssignmentNNRegression::PtAssignmentNNRegression(edm::ParameterSet const&, OMTFConfiguration const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)':
.../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<10, 4, 18ul, 3, 13, 16, 4>::input_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<10, 4, 18ul, 3, 13, 16, 4>::lut_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<10, 4, 18ul, 3, 13, 16, 4>::lutSize'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<10, 4, 18ul, 3, 13, 16, 4>::lutOutSum_I'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<10, 4, 18ul, 3, 13, 16, 4>::lutOutSum_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<10, 4, 18ul, 3, 13, 16, 4>::output_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `ap_int_base<4, false>::width'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<5, 11, 1ul, 4, 11, 1, 8>::input_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<5, 11, 1ul, 4, 11, 1, 8>::lut_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<5, 11, 1ul, 4, 11, 1, 8>::lutSize'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<5, 11, 1ul, 4, 11, 1, 8>::lutOutSum_I'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<5, 11, 1ul, 4, 11, 1, 8>::lutOutSum_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<5, 11, 1ul, 4, 11, 1, 8>::output_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `ap_int_base<8, false>::width'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<5, 11, 8ul, 5, 11, 1, 8>::input_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<5, 11, 8ul, 5, 11, 1, 8>::lut_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<5, 11, 8ul, 5, 11, 1, 8>::lutSize'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<5, 11, 8ul, 5, 11, 1, 8>::lutOutSum_I'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<5, 11, 8ul, 5, 11, 1, 8>::lutOutSum_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<5, 11, 8ul, 5, 11, 1, 8>::output_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `ap_int_base<8, false>::width'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<8, 13, 16ul, 5, 11, 9, 5>::input_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<8, 13, 16ul, 5, 11, 9, 5>::lut_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<8, 13, 16ul, 5, 11, 9, 5>::lutSize'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<8, 13, 16ul, 5, 11, 9, 5>::lutOutSum_I'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<8, 13, 16ul, 5, 11, 9, 5>::lutOutSum_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `lutNN::LutNeuronLayerFixedPoint<8, 13, 16ul, 5, 11, 9, 5>::output_W'
<gcc>/bin/ld.bfd: .../CMSSW_14_1_DBG_X_2024-05-09-2300/src/FWCore/MessageLogger/interface/ErrorObj.icc:45: undefined reference to `ap_int_base<5, false>::width'
collect2: error: ld returned 1 exit status
gmake: *** [tmp/el8_amd64_gcc12/src/L1Trigger/L1TMuonOverlapPhase2/src/L1TriggerL1TMuonOverlapPhase2/libL1TriggerL1TMuonOverlapPhase2.so] Error 1
```

[full log](https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/el8_amd64_gcc12/CMSSW_14_1_DBG_X_2024-05-09-2300/L1Trigger/L1TMuonOverlapPhase2)

iarspider, May 10 '24 08:05

assign l1,core

iarspider, May 10 '24 08:05

New categories assigned: l1,core

@Dr15Jones,@makortel,@epalencia,@aloeliger,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild, May 10 '24 08:05

A new Issue was created by @iarspider.

@makortel, @smuzaffar, @sextonkennedy, @rappoccio, @antoniovilela, @Dr15Jones can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild, May 10 '24 08:05

@mbluj @kbunkow FYI

aloeliger, May 10 '24 09:05

I think `constexpr` should fix the ones coming from CMSSW, i.e.

undefined reference to `lutNN::LutNeuronLayerFixedPoint<8, 13, 16ul, 5, 11, 9, 5>::input_W'
undefined reference to `lutNN::LutNeuronLayerFixedPoint<8, 13, 16ul, 5, 11, 9, 5>::lut_W'
undefined reference to `lutNN::LutNeuronLayerFixedPoint<8, 13, 16ul, 5, 11, 9, 5>::lutSize'
undefined reference to `lutNN::LutNeuronLayerFixedPoint<8, 13, 16ul, 5, 11, 9, 5>::lutOutSum_I'
undefined reference to `lutNN::LutNeuronLayerFixedPoint<8, 13, 16ul, 5, 11, 9, 5>::lutOutSum_W'
undefined reference to `lutNN::LutNeuronLayerFixedPoint<8, 13, 16ul, 5, 11, 9, 5>::output_W'

and for `undefined reference to ap_int_base<5, false>::width'`, I guess we need to patch https://github.com/Xilinx/HLS_arbitrary_Precision_Types/blob/master/include/ap_int_base.h#L125 to make it `constexpr`?
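A minimal C++17 sketch of the two usual fixes for this kind of undefined reference. `WithOutOfClassDefinition`, `WithConstexpr` and `readByRef` are hypothetical stand-ins, not the real `lutNN::LutNeuronLayerFixedPoint` or `ap_int_base` code. The assumption is that the failing members are `static const` data members with an in-class initializer and no out-of-class definition, which become ODR-used when bound to a `const T&` parameter, presumably by the templated stream operator at `ErrorObj.icc:45`.

```cpp
// Hypothetical stand-ins for classes such as lutNN::LutNeuronLayerFixedPoint or
// ap_int_base; names and template parameters are illustrative only.
//
// Assumed failing pattern: `static const int width = W;` with an in-class
// initializer but no out-of-class definition. Reading the value is fine, but
// binding it to a `const T&` is an ODR-use and needs a definition at link time.

// Fix 1 (pre-C++17 style): keep `static const` and add the out-of-class definition.
template <int W, bool S>
struct WithOutOfClassDefinition {
  static const int width = W;
};
template <int W, bool S>
const int WithOutOfClassDefinition<W, S>::width;  // definition; the initializer stays in the class

// Fix 2 (C++17 and later): a static constexpr data member is implicitly inline,
// so the in-class declaration is already a definition.
template <int W, bool S>
struct WithConstexpr {
  static constexpr int width = W;
};

// Takes its argument by const reference, like a templated operator<< in a logger,
// so passing `width` here ODR-uses it.
template <typename T>
constexpr T readByRef(const T& value) {
  return value;
}
```

The `constexpr` variant is the smaller change when the code is built as C++17 or later, since it needs no extra line per member; the out-of-class definition is the fallback for older standards.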

smuzaffar, May 10 '24 09:05

It compiles fine with the non-DBG IB; I explicitly checked with CMSSW_14_1_DBG_X_2024-05-09-2300 and el8_amd64_gcc12. What is the difference between regular and DBG releases? The L1Trigger/L1TMuonOverlapPhase2 package was recently added to CMSSW; could it be that it needs to be "registered" somewhere (i.e. its libraries made known to some tools) to be handled in DBG builds?

mbluj, May 10 '24 09:05

DBG IBs are built with the `-g -O3 -DEDM_ML_DEBUG` flags, and in that case the code at https://github.com/cms-sw/cmssw/blob/master/L1Trigger/L1TMuonOverlapPhase2/interface/LutNeuronLayerFixedPoint.h#L64-L73 gets compiled (see https://github.com/cms-sw/cmssw/blob/master/FWCore/MessageLogger/interface/MessageLogger.h#L240-L252).
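To illustrate why only the DBG IB fails, here is a hedged C++17 sketch, assuming the debug-only logging is guarded by a preprocessor macro. `SKETCH_ML_DEBUG` is a hypothetical macro playing the role of `EDM_ML_DEBUG`, and `debugPrint` and `LayerSketch` are hypothetical stand-ins; the actual MessageLogger macros are not reproduced here.

```cpp
#include <iostream>
#include <string>

// SKETCH_ML_DEBUG is a hypothetical macro standing in for -DEDM_ML_DEBUG.
#ifdef SKETCH_ML_DEBUG
inline constexpr bool kDebugCodeCompiled = true;
#else
inline constexpr bool kDebugCodeCompiled = false;
#endif

// Stand-in for a logger whose operator<< takes its argument by const reference.
template <typename T>
void debugPrint(const std::string& name, const T& value) {
  std::cerr << name << " = " << value << '\n';
}

template <int outputW>
struct LayerSketch {
  static const int output_W = outputW;  // in-class initializer only, no out-of-class definition
};

inline int layerWidth() {
#ifdef SKETCH_ML_DEBUG
  // Compiled only in the debug configuration: passing output_W by const
  // reference ODR-uses it, so an undefined reference can appear at link time.
  debugPrint("output_W", LayerSketch<8>::output_W);
#endif
  // Using the value directly is not an ODR-use and links in every configuration.
  return LayerSketch<8>::output_W;
}
```

Compiling a program that calls `layerWidth()` with `-DSKETCH_ML_DEBUG -O0` would be expected to fail with an undefined reference to `LayerSketch<8>::output_W`, while the same code without the macro links, mirroring the DBG versus regular IB difference.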

smuzaffar, May 10 '24 09:05

Thank you @smuzaffar. I confirm that using `constexpr` fixes the issues within the OMTF code, but I don't think I am able to fix them in the header from AMD. How should we proceed? Should I prepare a PR with the fixes in OMTF and assume that the other one will be fixed by someone else?

mbluj, May 10 '24 11:05

We already patch https://github.com/Xilinx/HLS_arbitrary_Precision_Types in cmsdist, so in principle we could continue doing that.

That said, it seems to me that the frequency of problems with this package is increasing, and the package (on GitHub) seems to have been practically unmaintained for 5 years, so I'm getting worried about its long-term sustainability.

makortel, May 10 '24 13:05

OK, so I will prepare a PR with the CMSSW fixes. Concerning the external package I cannot comment, but at least from a quick look it is still recommended by Xilinx/AMD in the HLS documentation...

mbluj, May 10 '24 13:05

> Written that, it seems to me the frequency of problems with this package is increasing, and the package (in GitHub) seems practically unmaintained since 5 years, so I'm getting worried about long-term sustainability.

I diffed against Vitis_HLS 2022.2, and it looks like there are only a handful of changes that are directly relevant to us (most of the changes don't apply to the Apache-licensed version), and most of them look to be constexpr-related. These are small enough that I think it would be OK to cherry-pick them for the CMS version and make a PR to the Xilinx repo.

dan131riley, May 14 '24 12:05

We have had one PR (https://github.com/Xilinx/HLS_arbitrary_Precision_Types/pull/1) sitting there for ~6 months without any reaction.

makortel, May 14 '24 13:05

https://github.com/cms-sw/cmssw/pull/44974 and https://github.com/cms-sw/cmsdist/pull/9193 should fix the DBG link errors

smuzaffar, May 15 '24 11:05

I agree with @makortel: https://github.com/Xilinx/HLS_arbitrary_Precision_Types looks unmaintained. There have been no updates in the last 5 years, and even the PR I opened to initialize the memory has been open since November last year.

smuzaffar, May 15 '24 11:05