Stockfish icon indicating copy to clipboard operation
Stockfish copied to clipboard

Add some meta-data in the generated nn files

Open vondele opened this issue 5 years ago • 7 comments

Based on a discussion started https://github.com/glinscott/fishtest/issues/732#issuecomment-668523399

It might be an idea to have some meta-data in the data files (nn.bin, or nn-sha.nnue) that give information on the net, and could be parsed by the chess engine / database etc.

Fields could be author: date: description:

This might need some careful thinking, and maybe we can learn from other data formats

vondele avatar Aug 05 '20 10:08 vondele

Thank you for creating the issue. I'm thinking if it is good to add those meta-data at the end of net files. My concerns are if this breaks the loading on old binaries, and if this format is extensible. I will study other data formats.

nodchip avatar Aug 05 '20 10:08 nodchip

concerning loading older nets.. at least based on just the stockfish perspective, I think this is not so much of a problem. This is still 'eary days', so this kind of changes should still be OK (and is one of the reasons why I want to link SF binary, and recommended net so tightly, i.e. allow for this kind of changes). Obviously, if things can be backwards compatible without much overhead, that's worth considering.

vondele avatar Aug 05 '20 10:08 vondele

ok. I will think how we can realize the file format which is backwards compatible.

nodchip avatar Aug 05 '20 11:08 nodchip

I investigated the source code. The current net file reader checks if a stream reached the end of the stream after the network parameters are read. If we add additional data at the end of the file, reading a net file always fails.

A net file contains a version field in the header. We can add meta-data at the end of the file if we introduce a new version number. We will change the reader to switch if we read the meta-data or not according to the version number.

The current file format is:

field type num bytes
version uint32_t 4
hash_value uint32_t 4
size uint32_t 4
architecture char size
feature_transformer - -
network - -

The type and size of the feature_transformer and network are hard coded in the network implementation.

If we add meta-data, the format will be like.

field type num bytes
version uint32_t 4
hash_value uint32_t 4
size uint32_t 4
architecture char size
feature_transformer - -
network - -
meta-data size uint32_t 4
meta-data char meta-data size

And the meta-data will be a text like:

author foo bar
date 2004-04-01T12:00+09:00
description something something.

The implementation of meta-data reader will be like:

std::istringstream iss(metadata_string);
std::map<string, string> metadata;
std::string key;
while (iss >> key) {
  iss.ignore();
  std::string value;
  std::getline(iss, value);
  metadata[key] = value;
}

The current format will be backward compatible. But the suggested format is not forward compatible.

@vondele Could you please review the format, and let me know if there are concerns?

nodchip avatar Aug 05 '20 15:08 nodchip

looks good, but I'm wondering if we should go for a 'free format / free size', or if it is better to introduce some restrictions.

E.g. should we use/require ascii or something else (even with a simple date, we once had problems in fishtest: https://groups.google.com/g/fishcooking/c/SOpgUxPYZa0 )

Should we restrict the value to the first 128/256 characters of the metadata (metadata[key] = value.substr(0,128)) to avoid long/malicious entries?

Probably we could also list the default keys (author, date, description sounds like a good start).

vondele avatar Aug 06 '20 15:08 vondele

E.g. should we use/require ascii or something else (even with a simple date, we once had problems in fishtest: https://groups.google.com/g/fishcooking/c/SOpgUxPYZa0 )

It sounds reasonable. Let's requisre ascii. It could be better to restrict the date and time format to ISO 8601.

Should we restrict the value to the first 128/256 characters of the metadata (metadata[key] = value.substr(0,128)) to avoid long/malicious entries?

It also sounds reasonable. Let's restrict the value to the first 128 characters of the metadata.

Probably we could also list the default keys (author, date, description sounds like a good start).

I agree. "author", "date" and "description" should be the default keys.

I will send a pull request to official Stockfish before I push them to the master branch in my repository to avoid issues that the official Stockfish can not read the new net format.

nodchip avatar Aug 06 '20 23:08 nodchip

The description field is quite flexible and supports longer strings, so I think we can repurpose it for this with a loose format.

Sopel97 avatar Apr 03 '21 14:04 Sopel97