nnue-pytorch
Add a script for finding most important [FT] weights in a net.
This script provides a way to find the most important weights (currently only for the feature transformer) in a given network on a given dataset. The importance is determined by taking the sum of absolute values of the gradients. Because it is not possible to accumulate absolute values of the gradients over a batch, this process needs to be done with a batch size of 1, which means it's relatively slow. A pos_n of several tens or hundreds of thousands should be feasible, however. This tool is supposed to help with choosing weights for SPSA tuning.
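For reference, the measurement boils down to something like the following sketch, assuming the feature transformer weights are reachable as model.input.weight and using placeholder loss_fn/dataset/pos_n objects (all names here are illustrative, not the script's actual API):

import torch

# Illustrative sketch of the importance measure described above: accumulate
# |gradient| of the FT weights over single-position batches.
ft_weight = model.input.weight            # placeholder: the feature transformer weight tensor
importance = torch.zeros_like(ft_weight)
for i, (x, y) in enumerate(dataset):      # batch size of 1
    model.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    importance += ft_weight.grad.abs()    # taking abs() per position is why larger batches can't be used
    if i + 1 >= pos_n:
        break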
usage: weight_importance.py [-h] [--best_n BEST_N] [--best_pct BEST_PCT]
[--pos_n POS_N] [--layer LAYER] [--output OUTPUT]
[--data DATA] [--features FEATURES]
model
Finds weights with the highest importance. Importance is measured by the
absolute value of the gradient.
positional arguments:
model Source model (can be .ckpt, .pt or .nnue)
optional arguments:
-h, --help show this help message and exit
--best_n BEST_N Get only n most important weights
--best_pct BEST_PCT Get only weights up to a given percent [0, 1] of the
total importance. Whichever of best_n or best_pct is
reached faster.
--pos_n POS_N The number of positions to evaluate.
--layer LAYER The layer to probe. Currently only 'ft' is supported.
--output OUTPUT Optional output file.
--data DATA path to a .bin or .binpack dataset
--features FEATURES The feature set to use. Can be a union of feature
blocks (for example P+HalfKP). "^" denotes a factorized
block. Currently available feature blocks are: HalfKP,
HalfKP^, HalfKA, HalfKA^, HalfKAv2, HalfKAv2^,
HalfKAv2_hm, HalfKAv2_hm^
The produced output can optionally also be saved to a file by using the --output option to provide the path to the file. The output format is {feature_index}\t{output_index}\t{total_grad}.
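A file produced with --output can then be read back with a few lines of Python, e.g. (a hedged sketch, not part of the script):

with open("out.txt") as f:
    rows = [line.rstrip("\n").split("\t") for line in f]
entries = [(int(fi), int(oi), float(grad)) for fi, oi, grad in rows]
# entries: (feature_index, output_index, total_grad) tuples, most important first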
Example from a small HalfKAv2_hm-128x2-8-32-1 net:
C:\dev\nnue-pytorch>python weight_importance.py --data=d10_10000.bin --features=HalfKAv2_hm --pos_n=1024 --best_n=32 --output=out.txt nn.nnue
Done 100 out of 1024 evaluations...
Done 200 out of 1024 evaluations...
Done 300 out of 1024 evaluations...
Done 400 out of 1024 evaluations...
Done 500 out of 1024 evaluations...
Done 600 out of 1024 evaluations...
Done 700 out of 1024 evaluations...
Done 800 out of 1024 evaluations...
Done 900 out of 1024 evaluations...
Done 1000 out of 1024 evaluations...
22468 81 22.95361328125
21062 1 19.015615463256836
21833 81 16.794130325317383
22468 1 16.653770446777344
22468 19 16.062719345092773
22468 52 15.507932662963867
22468 37 15.23114013671875
22468 75 15.099000930786133
22468 103 14.928569793701172
21062 52 14.71535873413086
21062 19 14.61031436920166
22215 81 14.525751113891602
22468 72 14.388435363769531
22468 110 14.307860374450684
22468 70 14.287771224975586
21062 37 14.226285934448242
21062 81 14.158844947814941
22468 119 13.891007423400879
22468 88 13.868194580078125
21062 103 13.766294479370117
22468 31 13.745142936706543
22208 81 13.659659385681152
21832 81 13.514847755432129
21062 110 13.416680335998535
22468 8 13.376480102539062
22468 44 13.148722648620605
21062 31 13.105257987976074
22328 81 13.007036209106445
21062 75 12.977574348449707
22468 3 12.970292091369629
21837 81 12.898788452148438
21062 8 12.8389310836792
21838 81 12.642690658569336
@Sopel97 Thanks for this! I'll take a closer look later.
@Sopel97 What's the use of KingBuckets[64]? https://github.com/official-stockfish/Stockfish/commit/d61d38586ee35fd4d93445eb547e4af27cc86e6b
A remnant from a more generic implementation where I was able to assign multiple king squares to one bucket. The king square is ensured to be on the e..h files by orient, and this lookup table maps the squares on the e..h files to 0..31 (due to a small mistake they are in reverse order, that is e1 has the highest bucket, but that's not important). This could be simplified to simple arithmetic, but I don't think there's a need for it.
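For illustration, the arithmetic equivalent suggested above would look roughly like this (a sketch reconstructed from the description, not the actual Stockfish table):

def king_bucket(oriented_king_square):
    # the square is 0..63 and already guaranteed to be on the e..h files by orient
    file, rank = oriented_king_square % 8, oriented_king_square // 8
    return 31 - (rank * 4 + (file - 4))   # reverse order: e1 -> 31 (highest bucket)

print(king_bucket(4))    # e1 -> 31
print(king_bucket(63))   # h8 -> 0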
@Sopel97 I'm having this error after trying to run compile_data_loader.bat
(env) (base) C:\Users\User\nnue-pytorch>compile_data_loader.bat
(env) (base) C:\Users\User\nnue-pytorch>cmake . -Bbuild -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX="./"
CMake Error at CMakeLists.txt:3 (project):
  Running
   'nmake' '-?'
  failed with:
   The system cannot find the file specified
CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!
See also "C:/Users/User/nnue-pytorch/build/CMakeFiles/CMakeOutput.log".
(env) (base) C:\Users\User\nnue-pytorch>cmake --build ./build --config RelWithDebInfo --target install
The system cannot find the file specified
CMake Error: Generator: execution of make failed. Make command was: nmake -f Makefile /nologo install &&
(env) (base) C:\Users\User\nnue-pytorch>
cmake_minimum_required(VERSION 3.0)
project(training_data_loader)
I have the training_data_loader file
You can try
cmake . -Bbuild -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX="./" -G "MinGW Makefiles"
cmake --build ./build --config RelWithDebInfo --target install
instead to force MinGW Makefiles. Looks like it tries to use nmake for some reason and that fails.
I think a tool like this is interesting. It would be good to understand more precisely what 'importance' means in this context. The gradient might not quite be the right quantity to look at; probably something more like the second derivatives (the Hessian matrix, or at least its diagonal).
Computing the full Hessian with this many parameters might not be feasible, though PyTorch has an autograd function that could achieve it in principle. Computing the diagonal of the Hessian should be trivial: https://stackoverflow.com/a/50375367. I'll revise this later with an option to use the nth [configurable] derivative instead.
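For example, one way to approximate the Hessian diagonal without forming the full matrix is Hutchinson's estimator, diag(H) ≈ E[v ⊙ (Hv)] for random ±1 vectors v, using a double backward pass; the following is only a sketch with placeholder model/loss_fn/x/y:

import torch

def hessian_diag_estimate(model, loss_fn, x, y, n_samples=8):
    # Estimate diag(H) of the loss w.r.t. the parameters via Hutchinson's trick.
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    est = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        vs = [torch.randint_like(p, 2) * 2.0 - 1.0 for p in params]   # random +-1 vectors
        # Hessian-vector products via a second backward pass.
        hvs = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
        for e, v, hv in zip(est, vs, hvs):
            e += v * hv / n_samples
    return est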
It appears that the gradients returned by our FT's backward don't have grad_fn defined (and I have no idea what it should be), so we cannot get a second derivative for it. It should, however, work for later layers, if we want to support them in the future.
@Sopel97 I downloaded the wrongNNUE binpack and it's fine now. There's just some deprecation warning.
(env) (base) C:\Users\User\nnue-pytorch>python weight_importance.py --data=wrongNNUE_02_d9.binpack --features=HalfKAv2_hm --pos_n=1024 --best_n=32 --output=out.txt nn-13406b1dcbe0.nnue
Done 100 out of 1024 evaluations...
Done 200 out of 1024 evaluations...
Done 300 out of 1024 evaluations...
Done 400 out of 1024 evaluations...
Done 500 out of 1024 evaluations...
Done 600 out of 1024 evaluations...
Done 700 out of 1024 evaluations...
Done 800 out of 1024 evaluations...
Done 900 out of 1024 evaluations...
Done 1000 out of 1024 evaluations...
C:\Users\User\nnue-pytorch\env\lib\site-packages\torch\_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ..\aten\src\ATen\native\BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
21062 197 24.06382179260254
18254 197 16.326650619506836
17551 197 14.6910400390625
16149 197 14.099198341369629
18957 197 13.728248596191406
21062 389 12.21109390258789
20430 197 11.612868309020996
19660 197 10.220696449279785
18254 389 10.145904541015625
21765 197 10.132981300354004
21062 897 9.354948043823242
20429 197 9.321372985839844
16852 389 8.915192604064941
18254 897 8.53359317779541
16852 197 8.460387229919434
15446 197 7.808490753173828
20534 197 7.556325435638428
19660 897 7.460641384124756
20527 197 7.434072017669678
14044 389 7.373429298400879
16149 389 7.353665351867676
20526 197 7.340466499328613
11236 609 7.326484203338623
18254 719 7.312647819519043
20438 197 7.240901470184326
16149 897 7.228122711181641
20359 197 7.172454357147217
20533 197 7.096660614013672
20431 197 7.050807476043701
11236 888 6.967772483825684
21765 389 6.918582439422607
20439 197 6.909554958343506
17551 897 6.840373516082764
(env) (base) C:\Users\User\nnue-pytorch>
@Sopel97 What is {feature_index} and {output_index}?
So from the results above, is this the most important weight?
int(Stockfish::Eval::NNUE::featureTransformer->psqtWeights[21062]);
int psqtW[21062] = {36043};
The first layer is of shape (32*64*11, 1024). feature_index is the index in the first dimension, output_index is the index in the second dimension. The PSQT weights are excluded, as their gradients are not comparable to the rest of the feature transformer.
The most important weight is featureTransformer->weights[21062 * TransformedFeatureDimensions + 197].
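In other words (a small illustrative snippet, not from the repo):

TRANSFORMED_FEATURE_DIMENSIONS = 1024   # for this HalfKAv2_hm net

def flat_ft_weight_index(feature_index, output_index):
    return feature_index * TRANSFORMED_FEATURE_DIMENSIONS + output_index

print(flat_ft_weight_index(21062, 197))   # 21567685, i.e. weights[21062 * 1024 + 197]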
@Sopel97 Thanks. Now it's clear to me.
@SFisGOD On the PSQT note. Have you tried, or considered, training additive [for example] pawn PSQT terms, starting from 0? I mean, our usual Score for each square, accumulated for each pawn on the board and added to the NNUE result. It could in principle tell us whether the net can learn good PSQT values from just training data. I was considering that recently, but I have no idea what SPSA parameters would be suitable when starting from 0.
@SFisGOD On the PSQT note. Have you tried, or considered, training additive [for example] pawn PSQT terms, starting from 0?
I have not tried something like that.