superlu_dist
Factor flops 0.0000e+00 Mflops with v9.0.0
I ran the example pddrive3d, and the output shows Factor flops as 0, while Solve flops is nonzero. Is this a bug?
I cannot see your figure. Can you upload it again?
The detailed output is:
.. blocking parameters from sp_ienv():
** relaxation : 60
** max supernode : 256
** estimated fill ratio : 5
** min GEMM m*k*n to use GPU : 5000
.. parallel environment:
** OpenMP threads : 16
** GPU enable? : 1
.. options:
** Fact : 0
** Equil : 1
** DiagInv : 0
** ParSymbFact : 0
** ColPerm : 4
** RowPerm : 1
** ReplaceTinyPivot : 0
** IterRefine : 0
** Trans : 0
** num_lookaheads : 10
** batchCount : 0
** SymPattern : 0
** lookahead_etree : 0
** Use_TensorCore : 0
** Use 3D algorithm : 1
** parameters that can be altered by environment variables:
** superlu_relax : 60
** superlu_maxsup : 256
** min GEMM m*k*n to use GPU : 5000
** GPU buffer size : 256000000
** GPU streams : 8
** estimated fill ratio : 5
first gpufree time: 0.2370
first blas create time: 1.8609
MPI_Query_thread with MPI_THREAD_MULTIPLE
__STDC_VERSION__ 199901
Library version: 9.0.0
Input matrix file: ../../../Matrix/pangulu_matrix/apache2/apache2.rb
3D process grid: 1 X 1 X 1
GHS_psdef/apache2; 2006; ; ed: N. Gould et al. |1423
FormFullA: new_nnz = 4817870, k = 4817870
Time to read and distribute matrix 0.44
Matrix size min_mn 715176
Nonzeros in L 157313555
Nonzeros in U 157313555
nonzeros in L+U 313911934
nonzeros in LSUB 36278952
** Memory Usage **********************************
** Total highmark (MB):
Sum-of-all : 3426.29 | Avg : 3426.29 | Max : 3426.29
Max at rank 0, different stages (MB):
. symbfact 264.60
. distribution 3426.29
. numfact 2668.18
** NUMfact space (MB): (sum-of-all-processes)
L\U : 2668.18 | Total : 2668.18
. max at rank 0, max L+U memory (MB): 2668.18
. max at rank 0, peak buffer (MB): 0.00
** number of Tiny Pivots: 0
.. Sol 0: ||X - Xtrue|| / ||X|| = 3.790301e-13 max_i |x - xtrue|_i / |x|_i = 3.790301e-13
**** Time (seconds) ****
EQUIL time 0.021
ROWPERM time 0.107
COLPERM time 5.630
SYMBFACT time 0.708
DISTRIBUTE time 3.099
FACTOR time 14.094
Factor flops 0.000000e+00 Mflops 0.00
SOLVE time 0.453
Solve flops 6.278243e+08 Mflops 1385.81
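The rate column is tied to the timings: the numbers above fit Mflops = flops / (seconds × 10^6). SuperLU_DIST's statistics printer appears to compute it this way, but treat that as an assumption inferred from the log. A zero flop counter therefore yields 0.00 Mflops even though FACTOR time is about 14 s, which points at missing flop accounting rather than a failed factorization (the error line, about 3.8e-13, shows the solution is accurate). A minimal check:

```python
# Sketch: recompute the printed Mflops from flops and time.
# Assumption: Mflops = flops / (seconds * 1e6), inferred from the log values.

def mflops(flops: float, seconds: float) -> float:
    """Return the rate in millions of floating-point operations per second."""
    if seconds <= 0.0:
        return 0.0
    return flops / (seconds * 1e6)


# Values copied from the pddrive3d output above (times are printed rounded to 1 ms).
solve_rate = mflops(6.278243e08, 0.453)
factor_rate = mflops(0.0, 14.094)

print(f"Solve  Mflops ~ {solve_rate:.2f}")   # close to the printed 1385.81
print(f"Factor Mflops = {factor_rate:.2f}")  # zero counted flops gives a zero rate
```

The small gap between the recomputed solve rate and 1385.81 comes from the printed time being rounded to three decimals.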
Ah, good catch. The latest C++ GPU factorization code does not compute the factor flops yet. We will work on a fix.
When will this bug be fixed? I have been trying to fix it myself recently. Could you give some advice on how to fix it? Which are the relevant .cpp files?
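For anyone attempting the fix, a minimal sketch of the usual approach to flop accounting in a supernodal LU, assuming the GPU path needs to add the same counts the CPU path does: every dense kernel launched during factorization adds its standard flop count to a factorization-phase counter. In SuperLU_DIST's C code paths that counter appears to be `stat->ops[FACT]` in `SuperLUStat_t`, but verify this against the source. `FlopCounter` and `factor_supernode` below are hypothetical names, and the diagonal-block factorization is omitted to keep the sketch short.

```python
# Hypothetical sketch of per-phase flop accounting for one supernode update.
# Class and function names are illustrative, not SuperLU_DIST APIs.

class FlopCounter:
    def __init__(self) -> None:
        self.factor = 0.0  # plays the role of the "Factor flops" counter

    def add_trsm(self, k: int, n: int) -> None:
        """Triangular solve with a k-by-k triangle applied to n vectors: about k*k*n flops."""
        self.factor += float(k) * k * n

    def add_gemm(self, m: int, n: int, k: int) -> None:
        """Schur-complement update C -= A @ B, A is m-by-k, B is k-by-n: 2*m*n*k flops."""
        self.factor += 2.0 * m * n * k


def factor_supernode(counter: FlopCounter, k: int, l_rows: int, u_cols: int) -> None:
    """Count flops for a supernode of width k (diagonal-block LU not counted here)."""
    # U row block (k-by-u_cols): solve with the unit-lower diagonal block.
    counter.add_trsm(k, u_cols)
    # L panel (l_rows-by-k): solve with the upper diagonal block.
    counter.add_trsm(k, l_rows)
    # Trailing update of the l_rows-by-u_cols Schur complement.
    counter.add_gemm(l_rows, u_cols, k)


counter = FlopCounter()
factor_supernode(counter, k=4, l_rows=10, u_cols=8)
print(counter.factor)
```

Here the counter ends at 16·8 + 16·10 + 2·10·8·4 = 928. In a real fix, the placement matters less than making sure every kernel launch, including batched or asynchronous GPU ones, contributes exactly once.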
I hit the same bug at the recent commit (a8d3bc10105990df2fecf763bccf1e0feec07c8f): pddrive3d reports 0 factor flops. Here is the printed log.
**************************************************
.. blocking parameters from sp_ienv():
** relaxation : 30
** max supernode : 256
** estimated fill ratio : 5
** min GEMM m*k*n to use GPU : 5000
.. parallel environment:
** OpenMP threads : 32
** GPU enabled? : 1
**************************************************
**************************************************
.. options:
** Fact : 0
** Equil : 1
** DiagInv : 0
** UserDefineSupernode : 0
** ParSymbFact : 0
** ColPerm : 4
** RowPerm : 1
** ReplaceTinyPivot : 0
** IterRefine : 0
** Trans : 0
** num_lookaheads : 10
** batchCount : 0
** SymPattern : 0
** lookahead_etree : 0
** Use_TensorCore : 0
** parameters that can be altered by environment variables:
** superlu_relax : 30
** superlu_maxsup : 256
** min GEMM m*k*n to use GPU : 5000
** GPU buffer size : 256000000
** GPU streams : 8
** estimated fill ratio : 5
**************************************************
first gpufree time: 0.3111
first blas create time: 0.0225
MPI_Query_thread with MPI_THREAD_MULTIPLE
__STDC_VERSION__ 199901
Library version: 9.1.0
Input matrix file: /staff/wangchao/matrix_collection/Sandia/MM/ASIC_100k.mtx
3D process grid: 1 X 1 X 1
m 99340, n 99340, nonz 954163
triplet file: row/col indices are one-based.
Time to read and distribute matrix 0.48
matrix dimension 99340
nonzeros in A 954163
nonzeros in L 3043275
nonzeros in U 3043146
nonzeros in L+U 5987081
fill ratio 6.3
nonzeros in LSUB 909890
** Memory Usage **********************************
** Total highmark (MB):
Sum-of-all : 312776008133790977228800.00 | Avg : 312776008133790977228800.00 | Max : 312776002927197478191104.00
Max at rank 0, different stages (MB):
. symbfact 36.61
. distribution 312776008133790977228800.00
. numfact 86.34
** NUMfact space (MB): (sum-of-all-processes)
L\U : 86.34 | Total : 86.34
. max at rank 0, max L+U memory (MB): 86.34
. max at rank 0, peak buffer (MB): 0.00
**************************************************
** number of Tiny Pivots: 0
.. Sol 0: ||X - Xtrue|| / ||X|| = 3.077798e-10 max_i |x - xtrue|_i / |x|_i = 3.077798e-10
**************************************************
**** Time (seconds) ****
EQUIL time 0.006
ROWPERM time 0.039
COLPERM time 0.635
SYMBFACT time 0.037
DISTRIBUTE time 0.312
FACTOR time 2.926
Factor flops 0.000000e+00 Mflops 0.00
SOLVE time 0.071
Solve flops 1.197416e+07 Mflops 168.55
**************************************************
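As a side check, the nonzero counts in this log are consistent with the diagonal being stored in both L and U and counted once in L+U, and with the fill ratio being nnz(L+U) / nnz(A). This is inferred from the printed numbers, not from the library source:

```python
# Consistency check on the "nonzeros" lines of the ASIC_100k log.
# Assumption (inferred from the numbers): the diagonal appears in both L and U,
# so it is subtracted once when reporting L+U.

n = 99340
nnz_a = 954163
nnz_l = 3043275
nnz_u = 3043146

nnz_lu = nnz_l + nnz_u - n
fill_ratio = nnz_lu / nnz_a

print(nnz_lu)               # 5987081, matching "nonzeros in L+U"
print(f"{fill_ratio:.1f}")  # 6.3, matching "fill ratio"
```

The same relation holds for the apache2 run in the first report: 2 × 157313555 - 715176 = 313911934.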
Fixed in the new commit b35fdb0.