{"id":490,"date":"2022-07-26T13:48:12","date_gmt":"2022-07-26T20:48:12","guid":{"rendered":"https:\/\/iotlab.sdsu.edu\/?page_id=490"},"modified":"2022-10-30T17:41:24","modified_gmt":"2022-10-31T00:41:24","slug":"building-aocl-libflame-and-aocl-blis-on-the-dgx","status":"publish","type":"page","link":"https:\/\/iotlab.sdsu.edu\/index.php\/building-aocl-libflame-and-aocl-blis-on-the-dgx\/","title":{"rendered":"Building AOCL-libFLAME and AOCL-BLIS on the DGX"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">The AMD Optimizing CPU Libraries (AOCL) can be downloaded from <a href=\"https:\/\/developer.amd.com\/amd-aocl\/\">AMD Developer Central<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">AOCL-libFLAME is an AMD-optimized, portable library for dense matrix computations that provides the complete functionality of the Linear Algebra Package (LAPACK). The AOCL-BLIS library is the equivalent of the Basic Linear Algebra Subprograms (BLAS), with optimizations for the AMD EPYC<sup>TM<\/sup> processor family.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>paolini@dgx:~\/amd$ git clone https:\/\/github.com\/amd\/libflame.git<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>paolini@dgx:~\/amd$ cd libflame<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>paolini@dgx:~\/amd\/libflame$ .\/configure --enable-max-arg-list-hack --enable-multithreading=openmp --enable-optimizations --enable-ldim-alignment --enable-amd-flags --enable-lapack2flame<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>paolini@dgx:~\/amd\/libflame\/test$ gcc  obj\/test_lyap.o  obj\/test_tridiagut.o  obj\/test_trsm.o  obj\/test_ldltx_nopiv_ps.o  obj\/test_lqut.o  obj\/test_trinv.o  obj\/test_apqudut.o  obj\/test_libflame.o  obj\/test_uddateut.o  obj\/test_lu_nopiv_i.o  obj\/test_syr2k.o  obj\/test_uddateutinc.o  obj\/test_lu_incpiv.o  obj\/test_lu_piv.o  obj\/test_caqrutinc.o  obj\/test_eig_gest.o  obj\/test_herk.o  obj\/test_her2k.o  obj\/test_qrut.o  obj\/test_bidiagut.o  obj\/test_apqut.o  obj\/test_apcaqutinc.o  
obj\/test_lu_nopiv.o  obj\/test_qrutinc.o  obj\/test_symm.o  obj\/test_hemm.o  obj\/test_sylv.o  obj\/test_apqudutinc.o  obj\/test_hessut.o  obj\/test_chol.o  obj\/test_spdinv.o  obj\/test_gemm.o  obj\/test_apqutinc.o  obj\/test_ldlt2_nopiv_ps.o  obj\/test_syrk.o  obj\/test_common.o  obj\/test_trmm.o ..\/lib\/x86_64-unknown-linux-gnu\/\/libflame.a  -fopenmp  -lm  -L..\/..\/amd-blis\/lib\/LP64 -lblis-mt  -Wl,-rpath,$HOME\/amd\/amd-blis\/lib\/LP64 -o test_libflame.x\n<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>paolini@dgx:~\/amd\/libflame\/test$ .\/test_libflame.x\n LibFlame version: AOCL-libFLAME 3.2, supports LAPACK 3.10.0\n\n--- test suite parameters ----------------------------\n\nn_repeats            2\nn_storage            1\nstorage              c\nn_datatypes          4\ndatatype&#91;0]          100 (s)\n        &#91;1]          101 (d)\n        &#91;2]          102 (c)\n        &#91;3]          103 (z)\nb_alg_flat           40\nb_alg_hier           10\nb_flash              40\np_first              80\np_max                160\np_inc                40\np_nfact              10\nn_threads            2\nreaction_to_failure  i\n.\n.\n.\n--- Partial \/ Incomplete LDLT(X) factorization without Pivoting ---\n\n   API                             DATA_TYPE     SIZE  FLOPS   TIME(s)       ERROR      STATUS\n   ====                            ==========    ==== =======  ========     ==========  ========\n   SPFFRTX                           s|c          80   4.204  0.0000132250   3.99e-01   PASS for nfact=10\n   SPFFRTX                           s|c         120   5.574  0.0000235350   2.70e-01   PASS for nfact=10\n   SPFFRTX                           s|c         160   5.433  0.0000439540   2.47e-01   PASS for nfact=10\n   SPFFRTX                           d|c          80   2.942  0.0000188960   1.86e-01   PASS for nfact=10\n   SPFFRTX                           d|c         120   3.928  0.0000334030   8.81e-01   PASS for nfact=10\n   SPFFRTX                  
         d|c         160   4.333  0.0000551150   7.58e-01   PASS for nfact=10\n   SPFFRTX                           c|c          80  11.833  0.0000186650   5.41e-01   PASS for nfact=10\n   SPFFRTX                           c|c         120  10.308  0.0000506860   2.52e-01   PASS for nfact=10\n   SPFFRTX                           c|c         160  11.219  0.0000848610   2.52e-01   PASS for nfact=10\n   SPFFRTX                           z|c          80   6.656  0.0000331830   2.24e-01   PASS for nfact=10\n   SPFFRTX                           z|c         120   7.772  0.0000672270   4.60e-01   PASS for nfact=10\n   SPFFRTX                           z|c         160   8.510  0.0001118720   2.63e-01   PASS for nfact=10\n\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>The AMD Optimizing CPU Libraries (AOCL) can be downloaded from AMD Developer Central. AOCL-libFLAME is an AMD optimized portable library<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-490","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/iotlab.sdsu.edu\/index.php\/wp-json\/wp\/v2\/pages\/490","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/iotlab.sdsu.edu\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/iotlab.sdsu.edu\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/iotlab.sdsu.edu\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/iotlab.sdsu.edu\/index.php\/wp-json\/wp\/v2\/comments?post=490"}],"version-history":[{"count":7,"href":"https:\/\/iotlab.sdsu.edu\/index.php\/wp-json\/wp\/v2\/pages\/490\/revisions"}],"predecessor-version":[{"id":533,"href":"https:\/\/iotlab.sdsu.edu\/index.php\/wp-json\/wp\/v2\/pages\/490\/revisions\/533"}],"wp:attachment":[{"href":"https:\/\/iotlab.sdsu.edu\/index.php\/wp-jso
n\/wp\/v2\/media?parent=490"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}