Building AOCL-libFLAME and AOCL-BLIS on the DGX

The AMD Optimizing CPU Libraries (AOCL) can be downloaded from AMD Developer Central.

AOCL-libFLAME is an AMD-optimized, portable library for dense matrix computations that provides the complete
functionality of the Linear Algebra Package (LAPACK). The BLIS library is the equivalent of BLAS, with optimizations for the AMD EPYC™ processor family.

paolini@dgx:~/amd$ git clone https://github.com/amd/libflame.git
paolini@dgx:~/amd$ cd libflame
paolini@dgx:~/amd/libflame$ ./configure --enable-max-arg-list-hack --enable-multithreading=openmp --enable-optimizations --enable-ldim-alignment --enable-amd-flags --enable-lapack2flame
paolini@dgx:~/amd/libflame/test$ gcc  obj/test_lyap.o  obj/test_tridiagut.o  obj/test_trsm.o  obj/test_ldltx_nopiv_ps.o  obj/test_lqut.o  obj/test_trinv.o  obj/test_apqudut.o  obj/test_libflame.o  obj/test_uddateut.o  obj/test_lu_nopiv_i.o  obj/test_syr2k.o  obj/test_uddateutinc.o  obj/test_lu_incpiv.o  obj/test_lu_piv.o  obj/test_caqrutinc.o  obj/test_eig_gest.o  obj/test_herk.o  obj/test_her2k.o  obj/test_qrut.o  obj/test_bidiagut.o  obj/test_apqut.o  obj/test_apcaqutinc.o  obj/test_lu_nopiv.o  obj/test_qrutinc.o  obj/test_symm.o  obj/test_hemm.o  obj/test_sylv.o  obj/test_apqudutinc.o  obj/test_hessut.o  obj/test_chol.o  obj/test_spdinv.o  obj/test_gemm.o  obj/test_apqutinc.o  obj/test_ldlt2_nopiv_ps.o  obj/test_syrk.o  obj/test_common.o  obj/test_trmm.o ../lib/x86_64-unknown-linux-gnu//libflame.a  -fopenmp  -lm  -L../../amd-blis/lib/LP64 -lblis-mt  -Wl,-rpath,$HOME/amd/amd-blis/lib/LP64 -o test_libflame.x
paolini@dgx:~/amd/libflame/test$ ./test_libflame.x
 LibFlame version: AOCL-libFLAME 3.2, supports LAPACK 3.10.0

--- test suite parameters ----------------------------

n_repeats            2
n_storage            1
storage              c
n_datatypes          4
datatype[0]          100 (s)
        [1]          101 (d)
        [2]          102 (c)
        [3]          103 (z)
b_alg_flat           40
b_alg_hier           10
b_flash              40
p_first              80
p_max                160
p_inc                40
p_nfact              10
n_threads            2
reaction_to_failure  i
.
.
.
--- Partial / Incomplete LDLT(X) factorization without Pivoting ---

   API                             DATA_TYPE     SIZE  FLOPS   TIME(s)       ERROR      STATUS
   ====                            ==========    ==== =======  ========     ==========  ========
   SPFFRTX                           s|c          80   4.204  0.0000132250   3.99e-01   PASS for nfact=10
   SPFFRTX                           s|c         120   5.574  0.0000235350   2.70e-01   PASS for nfact=10
   SPFFRTX                           s|c         160   5.433  0.0000439540   2.47e-01   PASS for nfact=10
   SPFFRTX                           d|c          80   2.942  0.0000188960   1.86e-01   PASS for nfact=10
   SPFFRTX                           d|c         120   3.928  0.0000334030   8.81e-01   PASS for nfact=10
   SPFFRTX                           d|c         160   4.333  0.0000551150   7.58e-01   PASS for nfact=10
   SPFFRTX                           c|c          80  11.833  0.0000186650   5.41e-01   PASS for nfact=10
   SPFFRTX                           c|c         120  10.308  0.0000506860   2.52e-01   PASS for nfact=10
   SPFFRTX                           c|c         160  11.219  0.0000848610   2.52e-01   PASS for nfact=10
   SPFFRTX                           z|c          80   6.656  0.0000331830   2.24e-01   PASS for nfact=10
   SPFFRTX                           z|c         120   7.772  0.0000672270   4.60e-01   PASS for nfact=10
   SPFFRTX                           z|c         160   8.510  0.0001118720   2.63e-01   PASS for nfact=10
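
When the full run produces many such tables, it can help to check the STATUS column mechanically rather than by eye. The following is a minimal sketch in Python, assuming the row layout shown above (API, DATA_TYPE as datatype|storage, SIZE, FLOPS, TIME(s), ERROR, STATUS). The names Result and parse_results are hypothetical helpers, not part of libFLAME, and treating every status other than PASS as a failure is an assumption about the test suite's output.

```python
import re
from typing import List, NamedTuple


class Result(NamedTuple):
    """One result row from the test_libflame.x output (hypothetical helper type)."""
    api: str
    datatype: str   # s, d, c or z
    storage: str    # e.g. c, as in the run above
    size: int
    flops: float
    time_s: float
    error: float
    status: str


# Matches rows such as:
#   SPFFRTX    s|c    80   4.204  0.0000132250   3.99e-01   PASS for nfact=10
# Header and separator lines do not contain a "x|y" field, so they are skipped.
ROW = re.compile(r"^\s*(\S+)\s+(\w)\|(\w)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")


def parse_results(text: str) -> List[Result]:
    """Extract result rows from captured test output; ignore everything else."""
    results = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue
        api, dt, st, size, flops, time_s, err, status = m.groups()
        try:
            results.append(Result(api, dt, st, int(size), float(flops),
                                  float(time_s), float(err), status))
        except ValueError:
            # Not a numeric result row after all; skip it.
            continue
    return results


# Three rows copied from the output above, used as sample input.
SAMPLE = """
   API                             DATA_TYPE     SIZE  FLOPS   TIME(s)       ERROR      STATUS
   ====                            ==========    ==== =======  ========     ==========  ========
   SPFFRTX                           s|c          80   4.204  0.0000132250   3.99e-01   PASS for nfact=10
   SPFFRTX                           d|c          80   2.942  0.0000188960   1.86e-01   PASS for nfact=10
   SPFFRTX                           c|c          80  11.833  0.0000186650   5.41e-01   PASS for nfact=10
"""

results = parse_results(SAMPLE)
# Assumption: any status other than PASS counts as a failure.
failed = [r for r in results if r.status != "PASS"]
print(f"{len(results)} rows parsed, {len(failed)} not marked PASS")
for r in failed:
    print(f"  {r.api} {r.datatype}|{r.storage} size={r.size} status={r.status}")
```

To apply this to a real run, the output of ./test_libflame.x can be redirected to a file and its contents passed to parse_results in place of SAMPLE.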