nnue-pytorch
Add a script for finding most important [FT] weights in a net.
This script provides a way to find the most important weights (currently only for the feature transformer) in a given network on a given dataset. The importance is determined by taking the sum of absolute values of the gradients. Because it is not possible to accumulate absolute values of the gradients over a batch, this process needs to be done with a batch size of 1, which means it's relatively slow. A pos_n of several tens or hundreds of thousands should be feasible, however. This tool is supposed to help with choosing weights for SPSA tuning.
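For reference, the measurement boils down to something like the following sketch, assuming the feature transformer weights are reachable as model.input.weight and using placeholder loss_fn/dataset/pos_n objects (all names here are illustrative, not the script's actual API):

import torch

# Illustrative sketch of the importance measure described above: accumulate
# |gradient| of the FT weights over single-position batches.
ft_weight = model.input.weight            # placeholder: the feature transformer weight tensor
importance = torch.zeros_like(ft_weight)
for i, (x, y) in enumerate(dataset):      # batch size of 1
    model.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    importance += ft_weight.grad.abs()    # taking abs() per position is why larger batches can't be used
    if i + 1 >= pos_n:
        break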
usage: weight_importance.py [-h] [--best_n BEST_N] [--best_pct BEST_PCT]
[--pos_n POS_N] [--layer LAYER] [--output OUTPUT]
[--data DATA] [--features FEATURES]
model
Finds weights with the highest importance. Importance is measured by the
absolute value of the gradient.
positional arguments:
model Source model (can be .ckpt, .pt or .nnue)
optional arguments:
-h, --help show this help message and exit
--best_n BEST_N Get only n most important weights
--best_pct BEST_PCT Get only weights up to a given percent [0, 1] of the
total importance. Whichever of best_n or best_pct is
reached faster.
--pos_n POS_N The number of positions to evaluate.
--layer LAYER The layer to probe. Currently only 'ft' is supported.
--output OUTPUT Optional output file.
--data DATA path to a .bin or .binpack dataset
--features FEATURES The feature set to use. Can be a union of feature
blocks (for example P+HalfKP). "^" denotes a factorized
block. Currently available feature blocks are: HalfKP,
HalfKP^, HalfKA, HalfKA^, HalfKAv2, HalfKAv2^,
HalfKAv2_hm, HalfKAv2_hm^
The produced output can optionally also be saved to a file by using the --output option to provide the path to the file. The output format is {feature_index}\t{output_index}\t{total_grad}.
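A file produced with --output can then be read back with a few lines of Python, e.g. (a hedged sketch, not part of the script):

with open("out.txt") as f:
    rows = [line.rstrip("\n").split("\t") for line in f]
entries = [(int(fi), int(oi), float(grad)) for fi, oi, grad in rows]
# entries: (feature_index, output_index, total_grad) tuples, most important first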
Example from a small HalfKAv2_hm-128x2-8-32-1 net:
C:\dev\nnue-pytorch>python weight_importance.py --data=d10_10000.bin --features=HalfKAv2_hm --pos_n=1024 --best_n=32 --output=out.txt nn.nnue
Done 100 out of 1024 evaluations...
Done 200 out of 1024 evaluations...
Done 300 out of 1024 evaluations...
Done 400 out of 1024 evaluations...
Done 500 out of 1024 evaluations...
Done 600 out of 1024 evaluations...
Done 700 out of 1024 evaluations...
Done 800 out of 1024 evaluations...
Done 900 out of 1024 evaluations...
Done 1000 out of 1024 evaluations...
22468 81 22.95361328125
21062 1 19.015615463256836
21833 81 16.794130325317383
22468 1 16.653770446777344
22468 19 16.062719345092773
22468 52 15.507932662963867
22468 37 15.23114013671875
22468 75 15.099000930786133
22468 103 14.928569793701172
21062 52 14.71535873413086
21062 19 14.61031436920166
22215 81 14.525751113891602
22468 72 14.388435363769531
22468 110 14.307860374450684
22468 70 14.287771224975586
21062 37 14.226285934448242
21062 81 14.158844947814941
22468 119 13.891007423400879
22468 88 13.868194580078125
21062 103 13.766294479370117
22468 31 13.745142936706543
22208 81 13.659659385681152
21832 81 13.514847755432129
21062 110 13.416680335998535
22468 8 13.376480102539062
22468 44 13.148722648620605
21062 31 13.105257987976074
22328 81 13.007036209106445
21062 75 12.977574348449707
22468 3 12.970292091369629
21837 81 12.898788452148438
21062 8 12.8389310836792
21838 81 12.642690658569336
@Sopel97 Thanks for this! I'll take a closer look later.
@Sopel97 What's the use of KingBuckets[64]? https://github.com/official-stockfish/Stockfish/commit/d61d38586ee35fd4d93445eb547e4af27cc86e6b
A remnant from a more generic implementation where I was able to assign multiple king squares to one bucket. The king square is ensured to be on the e..h files by orient, and this lookup table maps the squares on the e..h files to 0..31 (due to a small mistake they are in reverse order, that is e1 has the highest bucket, but that's not important). This could be simplified to simple arithmetic, but I don't think there's a need for it.
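For illustration, the arithmetic equivalent suggested above would look roughly like this (a sketch reconstructed from the description, not the actual Stockfish table):

def king_bucket(oriented_king_square):
    # the square is 0..63 and already guaranteed to be on the e..h files by orient
    file, rank = oriented_king_square % 8, oriented_king_square // 8
    return 31 - (rank * 4 + (file - 4))   # reverse order: e1 -> 31 (highest bucket)

print(king_bucket(4))    # e1 -> 31
print(king_bucket(63))   # h8 -> 0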
@Sopel97 I'm having this error after trying to run compile_data_loader.bat
(env) (base) C:\Users\User\nnue-pytorch>compile_data_loader.bat
(env) (base) C:\Users\User\nnue-pytorch>cmake . -Bbuild -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX="./"
CMake Error at CMakeLists.txt:3 (project):
  Running
   'nmake' '-?'
  failed with:
   The system cannot find the file specified
CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!
See also "C:/Users/User/nnue-pytorch/build/CMakeFiles/CMakeOutput.log".
(env) (base) C:\Users\User\nnue-pytorch>cmake --build ./build --config RelWithDebInfo --target install
The system cannot find the file specified
CMake Error: Generator: execution of make failed. Make command was: nmake -f Makefile /nologo install &&
(env) (base) C:\Users\User\nnue-pytorch>
cmake_minimum_required(VERSION 3.0)
project(training_data_loader)
I have the training_data_loader file
You can try
cmake . -Bbuild -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX="./" -G "MinGW Makefiles"
cmake --build ./build --config RelWithDebInfo --target install
instead to force MinGW Makefiles. Looks like it tries to use nmake for some reason and that fails.
I think a tool like this is interesting. It would be good to understand more precisely what 'importance' means in this context. The gradient might not quite be the right quantity to look at; probably something more like the second derivatives (the Hessian matrix, or at least its diagonal).
Computing the full Hessian with this many parameters might not be feasible, though PyTorch has an autograd function that could achieve it in principle. Computing the diagonal of the Hessian should be trivial: https://stackoverflow.com/a/50375367. I'll revise this later with an option to use the nth [configurable] derivative instead.
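For example, one way to approximate the Hessian diagonal without forming the full matrix is Hutchinson's estimator, diag(H) ≈ E[v ⊙ (Hv)] for random ±1 vectors v, using a double backward pass; the following is only a sketch with placeholder model/loss_fn/x/y:

import torch

def hessian_diag_estimate(model, loss_fn, x, y, n_samples=8):
    # Estimate diag(H) of the loss w.r.t. the parameters via Hutchinson's trick.
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    est = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        vs = [torch.randint_like(p, 2) * 2.0 - 1.0 for p in params]   # random +-1 vectors
        # Hessian-vector products via a second backward pass.
        hvs = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
        for e, v, hv in zip(est, vs, hvs):
            e += v * hv / n_samples
    return est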
It appears that the gradients returned by our FT's backward don't have grad_fn defined (and I have no idea what it should be), so we cannot get a second derivative for it. It should, however, work for later layers, if we want to support them in the future.
@Sopel97 I downloaded the wrongNNUE binpack and it's fine now. There's just some deprecation warning.
(env) (base) C:\Users\User\nnue-pytorch>python weight_importance.py --data=wrongNNUE_02_d9.binpack --features=HalfKAv2_hm --pos_n=1024 --best_n=32 --output=out.txt nn-13406b1dcbe0.nnue
Done 100 out of 1024 evaluations...
Done 200 out of 1024 evaluations...
Done 300 out of 1024 evaluations...
Done 400 out of 1024 evaluations...
Done 500 out of 1024 evaluations...
Done 600 out of 1024 evaluations...
Done 700 out of 1024 evaluations...
Done 800 out of 1024 evaluations...
Done 900 out of 1024 evaluations...
Done 1000 out of 1024 evaluations...
C:\Users\User\nnue-pytorch\env\lib\site-packages\torch\_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ..\aten\src\ATen\native\BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
21062 197 24.06382179260254
18254 197 16.326650619506836
17551 197 14.6910400390625
16149 197 14.099198341369629
18957 197 13.728248596191406
21062 389 12.21109390258789
20430 197 11.612868309020996
19660 197 10.220696449279785
18254 389 10.145904541015625
21765 197 10.132981300354004
21062 897 9.354948043823242
20429 197 9.321372985839844
16852 389 8.915192604064941
18254 897 8.53359317779541
16852 197 8.460387229919434
15446 197 7.808490753173828
20534 197 7.556325435638428
19660 897 7.460641384124756
20527 197 7.434072017669678
14044 389 7.373429298400879
16149 389 7.353665351867676
20526 197 7.340466499328613
11236 609 7.326484203338623
18254 719 7.312647819519043
20438 197 7.240901470184326
16149 897 7.228122711181641
20359 197 7.172454357147217
20533 197 7.096660614013672
20431 197 7.050807476043701
11236 888 6.967772483825684
21765 389 6.918582439422607
20439 197 6.909554958343506
17551 897 6.840373516082764
(env) (base) C:\Users\User\nnue-pytorch>
@Sopel97 What is {feature_index} and {output_index}?
So from the results above, is this the most important weight?
int(Stockfish::Eval::NNUE::featureTransformer->psqtWeights[21062]);
int psqtW[21062] = {36043};
The first layer is of shape (32*64*11, 1024). feature_index is the index in the first dimension, output_index is the index in the second dimension. The PSQT weights are excluded, as their gradients are not comparable to the rest of the feature transformer.
The most important weight is featureTransformer->weights[21062 * TransformedFeatureDimensions + 197].
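In other words (a small illustrative snippet, not from the repo):

TRANSFORMED_FEATURE_DIMENSIONS = 1024   # for this HalfKAv2_hm net

def flat_ft_weight_index(feature_index, output_index):
    return feature_index * TRANSFORMED_FEATURE_DIMENSIONS + output_index

print(flat_ft_weight_index(21062, 197))   # 21567685, i.e. weights[21062 * 1024 + 197]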
@Sopel97 Thanks. Now it's clear to me.
@SFisGOD On the PSQT note. Have you tried, or considered, training additive [for example] pawn PSQT terms, starting from 0? I mean, our usual Score for each square, accumulated for each pawn on the board and added to the NNUE result. It could in principle tell us whether the net can learn good PSQT values from just training data. I was considering that recently, but I have no idea what SPSA parameters would be suitable when starting from 0.
@SFisGOD On the PSQT note. Have you tried, or considered, training additive [for example] pawn PSQT terms, starting from 0?
I have not tried something like that.