Xu Han
in ObjDump: 0000000007f01e90 : 7f01e90: f3 0f 1e fa endbr64
original build option from makefile: > gcc -DNDEBUG -DLIBXSMM_NOFORTRAN -DLIBXSMM_TARGET_ARCH=1006 -DLIBXSMM_OPENMP_SIMD -DLIBXSMM_BUILD=2 -Iinclude -I./src -msse4.2 -fPIC -Wall -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -pthread -Werror -c ./src/generator_mateltwise.c -o obj/intel64/generator_mateltwise.o...
1. Add "SLEEF_BUILD_SHARED_LIBS" to use instead of the CMake reserved variable "BUILD_SHARED_LIBS". 2. Pass an explicit library type to add_library, which removes the global control from "BUILD_SHARED_LIBS". Link: https://cmake.org/cmake/help/latest/guide/tutorial/Selecting%20Static%20or%20Shared%20Libraries.html Additionally, I write...
Fix "SLEEF_BUILD_SCALAR_LIB", which had lost its ability to control whether the sleefscalar lib is built.

|Status|Control Build|Control Install|
|----|----|----|
|Current Code|No|Yes|
|This PR|Yes|Yes|
Warning message on MSVC:
```cmd
28>D:\xu_github\sleef\src\libm\sleefsimddp.c(28,9): warning C4068: unknown pragma 'STDC'
28>sleefsimdsp.c
28>D:\xu_github\sleef\src\libm\sleefsimdsp.c(28,9): warning C4068: unknown pragma 'STDC'
28>Generating Code...
```
Fix it by conditionally defining the pragma that disables FP contractions.
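A minimal sketch of what such a conditional definition could look like, assuming the fix selects the pragma spelling by compiler. The function name `mul_then_add` is hypothetical and only there to give the pragma something to apply to; the actual change in the PR may be structured differently.

```c
/* Sketch only: pick the FP-contraction pragma per compiler.
 * MSVC does not recognize the C99 "STDC" pragma (warning C4068),
 * but it has its own fp_contract pragma. */
#if defined(_MSC_VER) && !defined(__clang__)
#pragma fp_contract(off)
#else
#pragma STDC FP_CONTRACT OFF
#endif

/* Hypothetical helper: with contraction disabled, the intent is that
 * a * b + c is not fused into a single FMA instruction. */
double mul_then_add(double a, double b, double c) {
    return a * b + c;
}
```

Compilers that do not implement the standard pragma may ignore it or warn about it, so a command-line flag can still be needed on those toolchains.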
Fixes #119304 1. Add a try/catch around the compiler version check. 2. Retry the compiler version query. 3. Return False if the compiler info cannot be obtained after two attempts. cc @jgong5...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #116178 POC link: https://github.com/xuhancn/x86_isa_help Change logs: 1. Use the new CppBuilder, which also supports Windows MSVC. 2. Add a cpuid-based x86 ISA detector,...
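For illustration, a cpuid-based check can be sketched as below. This is not the code from the PR or the linked POC: the function names `query_leaf7`, `cpu_reports_avx2`, and `cpu_reports_avx512f` are hypothetical, and the sketch only reads the CPU feature bits in CPUID leaf 7 (it does not verify that the OS has enabled the corresponding register state).

```c
#include <stdbool.h>

/* Hypothetical helper: read EBX of CPUID leaf 7, subleaf 0.
 * Returns false when the leaf is unavailable or the target is not x86. */
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
static bool query_leaf7(unsigned int *ebx) {
    int regs[4];
    __cpuid(regs, 0);              /* regs[0] = highest supported leaf */
    if (regs[0] < 7) return false;
    __cpuidex(regs, 7, 0);
    *ebx = (unsigned int)regs[1];
    return true;
}
#elif defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
static bool query_leaf7(unsigned int *ebx) {
    unsigned int eax = 0, ecx = 0, edx = 0;
    return __get_cpuid_count(7, 0, &eax, ebx, &ecx, &edx) != 0;
}
#else
static bool query_leaf7(unsigned int *ebx) {
    (void)ebx;
    return false;                  /* not an x86 target */
}
#endif

/* AVX2 is reported in leaf 7, EBX bit 5. */
bool cpu_reports_avx2(void) {
    unsigned int ebx = 0;
    return query_leaf7(&ebx) && (ebx & (1u << 5)) != 0;
}

/* AVX-512 Foundation is reported in leaf 7, EBX bit 16. */
bool cpu_reports_avx512f(void) {
    unsigned int ebx = 0;
    return query_leaf7(&ebx) && (ebx & (1u << 16)) != 0;
}
```

Querying the CPU directly avoids parsing OS-specific sources such as `/proc/cpuinfo`, which is one reason this approach ports to Windows.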
The previous full PR https://github.com/pytorch/pytorch/pull/115248 failed to merge because fb_code is hard to debug. I also tried to submit it as two pieces, https://github.com/pytorch/pytorch/pull/118514 and https://github.com/pytorch/pytorch/pull/118515. And they have passed...
# Summary
During our PyTorch development, we found that the Windows system memory allocator performs poorly and slows down PyTorch as a whole. After adding a third-party memory allocator, PyTorch improved...
Fixes #ISSUE_NUMBER cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10