Xiong Yuan

2 issues by Xiong Yuan

Fix the "numel needs to be smaller than int32_t max; otherwise, please use packed_accessor64" issue.

```
    verts, faces = self.mc_func(level.to(get_rank()), threshold)
  File "/usr/local/lib/python3.10/dist-packages/torchmcubes/__init__.py", line 12, in marching_cubes
    return mc.mcubes_cuda(vol, thresh)
...
```
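The error text is the check PyTorch runs when C++/CUDA code asks for a 32-bit `packed_accessor32`: the tensor's element count has to fit in a signed 32-bit index. As a rough illustration, the hypothetical helper below (not part of torchmcubes) tells you whether a dense volume's shape would hit that limit. The boundary follows the wording of the error message ("smaller than int32_t max"); PyTorch's exact comparison may differ by one element.

```python
import math

# Largest value representable by a signed 32-bit integer (int32_t max).
INT32_MAX = 2**31 - 1


def exceeds_int32_numel(shape):
    """Hypothetical helper: True if a dense tensor of this shape is too large
    for 32-bit indexing, per the wording of the error message."""
    numel = math.prod(shape)  # total number of elements in the volume
    return numel >= INT32_MAX


# A 512^3 grid is well within range; a 1300^3 grid (about 2.2e9 voxels) is not.
print(exceeds_int32_numel((512, 512, 512)))
print(exceeds_int32_numel((1300, 1300, 1300)))
```

In practice this means a cube up to roughly 1290 voxels per side still fits, while larger grids need 64-bit indexing (`packed_accessor64`) in the extension or a smaller/chunked volume.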

### Details:

- MatMul dequantization: convert both dequantization scale variables (`mulConst1` and `mulConst2`) to f32 instead of only `mulConst2`, to avoid an element type mismatch (f16 vs. f32)...

category: LP transformations
ExternalPR
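To illustrate why converting only one of the two MatMul dequantization scales causes a mismatch, here is a minimal NumPy sketch. It assumes that the graph's multiply requires both inputs to share an element type rather than promoting f16 to f32 implicitly. `strict_multiply` and `dequantize` are hypothetical stand-ins, not OpenVINO APIs; only the names `mul_const1`/`mul_const2` mirror `mulConst1`/`mulConst2` from the PR.

```python
import numpy as np


def strict_multiply(a, b):
    """Hypothetical typed multiply: rejects mixed element types instead of
    silently promoting them (assumed behaviour of the graph op)."""
    if a.dtype != b.dtype:
        raise TypeError(f"element type mismatch: {a.dtype} vs {b.dtype}")
    return a * b


def dequantize(x_int8, mul_const1, mul_const2, convert_both=True):
    # In this sketch the dequantized activations are computed in f32.
    x = x_int8.astype(np.float32)
    if convert_both:
        mul_const1 = mul_const1.astype(np.float32)
    # The original behaviour: only the second scale is converted to f32.
    mul_const2 = mul_const2.astype(np.float32)
    # Fold the two scales together, then apply them to the input.
    scale = strict_multiply(mul_const1, mul_const2)
    return strict_multiply(x, scale)


x = np.array([1, -2, 3], dtype=np.int8)
c1 = np.array(0.5, dtype=np.float16)
c2 = np.array(0.25, dtype=np.float16)
print(dequantize(x, c1, c2))  # both scales in f32: works
try:
    dequantize(x, c1, c2, convert_both=False)  # f16 * f32: rejected
except TypeError as err:
    print(err)
```

Converting both constants keeps every input of the folded multiply in f32, so no f16/f32 pair ever reaches an op that insists on matching types.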