The AMD Optimizing CPU Libraries (AOCL) can be downloaded from AMD Developer Central.
AOCL-libFLAME is an AMD-optimized, portable library for dense matrix computations that provides the complete
functionality of the Linear Algebra Package (LAPACK). The BLIS library is the equivalent of BLAS, with optimizations for the AMD EPYC™ processor family.
The session below clones libFLAME, configures it, and then links and runs its test suite against the multithreaded BLIS library.
paolini@dgx:~/amd$ git clone https://github.com/amd/libflame.git
paolini@dgx:~/amd$ cd libflame
paolini@dgx:~/amd/libflame$ ./configure --enable-max-arg-list-hack --enable-multithreading=openmp --enable-optimizations --enable-ldim-alignment --enable-amd-flags --enable-lapack2flame
paolini@dgx:~/amd/libflame/test$ gcc obj/test_lyap.o obj/test_tridiagut.o obj/test_trsm.o obj/test_ldltx_nopiv_ps.o obj/test_lqut.o obj/test_trinv.o obj/test_apqudut.o obj/test_libflame.o obj/test_uddateut.o obj/test_lu_nopiv_i.o obj/test_syr2k.o obj/test_uddateutinc.o obj/test_lu_incpiv.o obj/test_lu_piv.o obj/test_caqrutinc.o obj/test_eig_gest.o obj/test_herk.o obj/test_her2k.o obj/test_qrut.o obj/test_bidiagut.o obj/test_apqut.o obj/test_apcaqutinc.o obj/test_lu_nopiv.o obj/test_qrutinc.o obj/test_symm.o obj/test_hemm.o obj/test_sylv.o obj/test_apqudutinc.o obj/test_hessut.o obj/test_chol.o obj/test_spdinv.o obj/test_gemm.o obj/test_apqutinc.o obj/test_ldlt2_nopiv_ps.o obj/test_syrk.o obj/test_common.o obj/test_trmm.o ../lib/x86_64-unknown-linux-gnu//libflame.a -fopenmp -lm -L../../amd-blis/lib/LP64 -lblis-mt -Wl,-rpath,$HOME/amd/amd-blis/lib/LP64 -o test_libflame.x
paolini@dgx:~/amd/libflame/test$ ./test_libflame.x
LibFlame version: AOCL-libFLAME 3.2, supports LAPACK 3.10.0
--- test suite parameters ----------------------------
n_repeats 2
n_storage 1
storage c
n_datatypes 4
datatype[0] 100 (s)
        [1] 101 (d)
        [2] 102 (c)
        [3] 103 (z)
b_alg_flat 40
b_alg_hier 10
b_flash 40
p_first 80
p_max 160
p_inc 40
p_nfact 10
n_threads 2
reaction_to_failure i
.
.
.
--- Partial / Incomplete LDLT(X) factorization without Pivoting ---
API      DATA_TYPE  SIZE   FLOPS  TIME(s)       ERROR     STATUS
=======  =========  ====  ======  ============  ========  =================
SPFFRTX  s|c          80   4.204  0.0000132250  3.99e-01  PASS for nfact=10
SPFFRTX  s|c         120   5.574  0.0000235350  2.70e-01  PASS for nfact=10
SPFFRTX  s|c         160   5.433  0.0000439540  2.47e-01  PASS for nfact=10
SPFFRTX  d|c          80   2.942  0.0000188960  1.86e-01  PASS for nfact=10
SPFFRTX  d|c         120   3.928  0.0000334030  8.81e-01  PASS for nfact=10
SPFFRTX  d|c         160   4.333  0.0000551150  7.58e-01  PASS for nfact=10
SPFFRTX  c|c          80  11.833  0.0000186650  5.41e-01  PASS for nfact=10
SPFFRTX  c|c         120  10.308  0.0000506860  2.52e-01  PASS for nfact=10
SPFFRTX  c|c         160  11.219  0.0000848610  2.52e-01  PASS for nfact=10
SPFFRTX  z|c          80   6.656  0.0000331830  2.24e-01  PASS for nfact=10
SPFFRTX  z|c         120   7.772  0.0000672270  4.60e-01  PASS for nfact=10
SPFFRTX  z|c         160   8.510  0.0001118720  2.63e-01  PASS for nfact=10
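
The test suite prints one such table per operation, so the full output is long. A minimal sketch for scanning a saved log is given here, assuming every result row follows the layout above (API, DATA_TYPE, SIZE, FLOPS, TIME(s), ERROR, STATUS, separated by whitespace). The function name summarize_flame_log and the file name test.log are hypothetical and are not part of libFLAME.

```shell
# Summarize a saved libFLAME test log: count result rows and print
# every row whose STATUS field is anything other than PASS.
# Assumption: result rows carry a "datatype|storage" token such as
# "d|c" in field 2 and the status word in field 7.
summarize_flame_log() {
    awk '
        $2 ~ /^[sdcz][|]/ {
            total++
            if ($7 != "PASS") {
                not_pass++
                print "NOT PASS: " $0
            }
        }
        END { printf "%d rows, %d not passing\n", total, not_pass }
    ' "$@"
}
```

One possible use is to redirect the run to a file with ./test_libflame.x > test.log and then call summarize_flame_log test.log. Header, separator, and parameter lines are skipped because their second field does not have the datatype|storage form.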