superlu_dist

Factor flops 0.0000e+00 Mflops with v9.0.0

Open edwardnjust opened this issue 1 year ago • 4 comments

I ran the EXAMPLE pddrive3d, and the output shows that Factor flops is 0, while Solve flops has a nonzero value. Is this a bug? (The screenshot PixPin_2024-05-11_11-37-05.png did not finish uploading.)

edwardnjust avatar May 11 '24 03:05 edwardnjust

I cannot see your figure. Can you upload it again?

liuyangzhuan avatar May 11 '24 04:05 liuyangzhuan

The detailed output is:


.. blocking parameters from sp_ienv():
**    relaxation                 : 60
**    max supernode              : 256
**    estimated fill ratio       : 5
**    min GEMM m*k*n to use GPU  : 5000
.. parallel environment:
**    OpenMP threads             : 16
**    GPU enable?                : 1



.. options:
**    Fact                      : 0
**    Equil                     : 1
**    DiagInv                   : 0
**    ParSymbFact               : 0
**    ColPerm                   : 4
**    RowPerm                   : 1
**    ReplaceTinyPivot          : 0
**    IterRefine                : 0
**    Trans                     : 0
**    num_lookaheads            : 10
**    batchCount                : 0
**    SymPattern                : 0
**    lookahead_etree           : 0
**    Use_TensorCore            : 0
**    Use 3D algorithm          : 1
** parameters that can be altered by environment variables:
**    superlu_relax             : 60
**    superlu_maxsup            : 256
**    min GEMM m*k*n to use GPU : 5000
**    GPU buffer size           : 256000000
**    GPU streams               : 8
**    estimated fill ratio      : 5


first gpufree time: 0.2370
first blas create time: 1.8609
MPI_Query_thread with MPI_THREAD_MULTIPLE
__STDC_VERSION__ 199901
Library version: 9.0.0
Input matrix file: ../../../Matrix/pangulu_matrix/apache2/apache2.rb
3D process grid: 1 X 1 X 1
GHS_psdef/apache2; 2006; ; ed: N. Gould et al. |1423
FormFullA: new_nnz = 4817870, k = 4817870
Time to read and distribute matrix 0.44
    Matrix size min_mn  715176
    Nonzeros in L       157313555
    Nonzeros in U       157313555
    nonzeros in L+U     313911934
    nonzeros in LSUB    36278952

** Memory Usage **********************************
** Total highmark (MB):
    Sum-of-all : 3426.29 | Avg : 3426.29 | Max : 3426.29
    Max at rank 0, different stages (MB):
    . symbfact          264.60
    . distribution     3426.29
    . numfact          2668.18
** NUMfact space (MB): (sum-of-all-processes)
    L\U : 2668.18 | Total : 2668.18
    . max at rank 0, max L+U memory (MB): 2668.18
    . max at rank 0, peak buffer (MB):       0.00


** number of Tiny Pivots: 0

.. Sol 0: ||X - Xtrue|| / ||X|| = 3.790301e-13 max_i |x - xtrue|_i / |x|_i = 3.790301e-13


**** Time (seconds) ****
    EQUIL time            0.021
    ROWPERM time          0.107
    COLPERM time          5.630
    SYMBFACT time         0.708
    DISTRIBUTE time       3.099
    FACTOR time          14.094
    Factor flops  0.000000e+00  Mflops     0.00
    SOLVE time            0.453
    Solve flops   6.278243e+08  Mflops  1385.81
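
For reference, the Mflops column looks like the flop count divided by the elapsed time and scaled by 1e-6. That is an inference from the numbers in the timing block, not something confirmed against the library source. A minimal Python sketch under that assumption follows. The helper name `mflops` is made up for illustration. The solve figure it prints differs slightly from the log because the printed SOLVE time is rounded to three decimals.

```python
def mflops(flops: float, seconds: float) -> float:
    """Return the rate in Mflop/s, or 0.0 when the elapsed time is not positive."""
    if seconds <= 0.0:
        return 0.0
    return flops * 1e-6 / seconds


# Flop counts and times copied from the timing block above.
print(f"solve : {mflops(6.278243e08, 0.453):8.2f} Mflops")
print(f"factor: {mflops(0.0, 14.094):8.2f} Mflops")
```

With a flop count of exactly 0 the rate is 0 no matter how long the factorization took, so the zero Mflops value follows from the missing count rather than from the timer.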


edwardnjust avatar May 11 '24 05:05 edwardnjust

Ah, good catch. The latest C++ GPU factorization code does not yet compute the factor flops. We will work on a fix.
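
To illustrate what counting factor flops could involve, here is a rough Python sketch of the usual leading-order operation counts for the dense kernels in one supernodal elimination step. It is a generic textbook model, not the SuperLU_DIST code or the eventual fix. All function names and the example block sizes are hypothetical.

```python
def dense_lu_flops(s: int) -> float:
    """Leading-order flops for LU of an s x s dense diagonal block."""
    return 2.0 * s**3 / 3.0


def trsm_flops(s: int, nrhs: int) -> float:
    """Leading-order flops for a triangular solve with an s x s triangle and nrhs vectors."""
    return float(s) * s * nrhs


def gemm_flops(m: int, k: int, n: int) -> float:
    """Flops for the update C -= A @ B with A of size m x k and B of size k x n."""
    return 2.0 * m * k * n


def supernode_step_flops(s: int, rows_below: int, cols_right: int) -> float:
    """Sum the kernel counts for one elimination step (illustrative model only)."""
    total = dense_lu_flops(s)                        # factor the diagonal block
    total += trsm_flops(s, rows_below)               # L panel below the diagonal
    total += trsm_flops(s, cols_right)               # U panel right of the diagonal
    total += gemm_flops(rows_below, s, cols_right)   # Schur-complement update
    return total


# Hypothetical block sizes, chosen only to exercise the functions.
print(supernode_step_flops(s=256, rows_below=1024, cols_right=1024))
```

A counter of this kind would need to be accumulated in every code path that performs these kernels, including the GPU path, for the reported total to be nonzero.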

liuyangzhuan avatar May 11 '24 06:05 liuyangzhuan

When will this bug be fixed? I have been trying to fix it myself recently. Could you give some advice on how to approach it? Which are the relevant .cpp files?

edwardnjust avatar Jul 15 '24 01:07 edwardnjust

I hit the same bug in a recent commit (a8d3bc10105990df2fecf763bccf1e0feec07c8f). The factor flops reported by pddrive3d are 0. Here is the printed log.

**************************************************
.. blocking parameters from sp_ienv():
**    relaxation                 : 30
**    max supernode              : 256
**    estimated fill ratio       : 5
**    min GEMM m*k*n to use GPU  : 5000
.. parallel environment:
**    OpenMP threads             :   32
**    GPU enabled?               :    1
**************************************************
**************************************************
.. options:
**    Fact                      :    0
**    Equil                     :    1
**    DiagInv                   :    0
**    UserDefineSupernode       :    0
**    ParSymbFact               :    0
**    ColPerm                   :    4
**    RowPerm                   :    1
**    ReplaceTinyPivot          :    0
**    IterRefine                :    0
**    Trans                     :    0
**    num_lookaheads            :   10
**    batchCount                :    0
**    SymPattern                :    0
**    lookahead_etree           :    0
**    Use_TensorCore            :    0
** parameters that can be altered by environment variables:
**    superlu_relax             :   30
**    superlu_maxsup            :  256
**    min GEMM m*k*n to use GPU : 5000
**    GPU buffer size           : 256000000
**    GPU streams               :    8
**    estimated fill ratio      :    5
**************************************************
first gpufree time:  0.3111
first blas create time:  0.0225
MPI_Query_thread with MPI_THREAD_MULTIPLE
__STDC_VERSION__ 199901
Library version:	9.1.0
Input matrix file:	/staff/wangchao/matrix_collection/Sandia/MM/ASIC_100k.mtx
3D process grid: 1 X 1 X 1
m 99340, n 99340, nonz 954163
triplet file: row/col indices are one-based.
Time to read and distribute matrix 0.48
	matrix dimension    99340
	nonzeros in A       954163
	nonzeros in L       3043275
	nonzeros in U       3043146
	nonzeros in L+U     5987081
	fill ratio             6.3
	nonzeros in LSUB    909890

** Memory Usage **********************************
** Total highmark (MB):
    Sum-of-all : 312776008133790977228800.00 | Avg : 312776008133790977228800.00  | Max : 312776002927197478191104.00
    Max at rank 0, different stages (MB):
	. symbfact           36.61
	. distribution    312776008133790977228800.00
	. numfact            86.34
** NUMfact space (MB): (sum-of-all-processes)
    L\U :           86.34 |  Total :    86.34
	. max at rank 0, max L+U memory (MB):    86.34
	. max at rank 0, peak buffer (MB):        0.00
**************************************************

** number of Tiny Pivots:        0

.. Sol  0: ||X - Xtrue|| / ||X|| = 3.077798e-10	 max_i |x - xtrue|_i / |x|_i = 3.077798e-10
**************************************************
**** Time (seconds) ****
	EQUIL time            0.006
	ROWPERM time          0.039
	COLPERM time          0.635
	SYMBFACT time         0.037
	DISTRIBUTE time       0.312
	FACTOR time           2.926
	Factor flops	0.000000e+00	Mflops 	    0.00
	SOLVE time            0.071
	Solve flops	1.197416e+07	Mflops 	  168.55
**************************************************
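
A check like the following could flag this symptom automatically in saved logs. It is a small Python sketch that assumes the timing block keeps the line layout shown above. The function name `check_log` and the regular expressions are illustrative and not part of SuperLU_DIST.

```python
import re

# Match lines such as "Factor flops 0.000000e+00 Mflops 0.00" and "FACTOR time 2.926".
FLOPS_RE = re.compile(r"(Factor|Solve) flops\s+(\S+)\s+Mflops\s+(\S+)")
TIME_RE = re.compile(r"(FACTOR|SOLVE) time\s+(\S+)")


def check_log(text: str) -> list[str]:
    """Return the phases that report zero flops although their time is positive."""
    times = {m.group(1).lower(): float(m.group(2)) for m in TIME_RE.finditer(text)}
    suspicious = []
    for m in FLOPS_RE.finditer(text):
        phase = m.group(1).lower()
        if float(m.group(2)) == 0.0 and times.get(phase, 0.0) > 0.0:
            suspicious.append(phase)
    return suspicious


# Four lines copied from the timing block above.
sample = "\n".join([
    "\tFACTOR time           2.926",
    "\tFactor flops\t0.000000e+00\tMflops \t    0.00",
    "\tSOLVE time            0.071",
    "\tSolve flops\t1.197416e+07\tMflops \t  168.55",
])
print(check_log(sample))
```

Only the factor phase is flagged for this log, because the solve phase reports a nonzero count.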

edwardnjust avatar Apr 27 '25 13:04 edwardnjust

Fixed in the new commit b35fdb0.

xiaoyeli avatar May 05 '25 05:05 xiaoyeli