sqlite-zstd

Training dictionary failed: Src size is incorrect

Open leoplusx opened this issue 3 years ago • 3 comments

That's the error I'm getting when running:

SELECT zstd_incremental_maintenance(null, 1);

Input:

SELECT zstd_enable_transparent('{"table": "absatz","column": "text", "compression_level": 19, "dict_chooser": "''a''"}');
SELECT zstd_incremental_maintenance(null, 1);

Output:

[2022-08-04T11:36:24Z WARN  sqlite_zstd::transparent] Warning: It is recommended to set `pragma auto_vacuum=full;`
[2022-08-04T11:36:24Z WARN  sqlite_zstd::transparent] Warning: It is recommended to set `pragma busy_timeout=2000;` or higher
[2022-08-04T11:44:13Z INFO  sqlite_zstd::transparent] absatz.text: Total 36006063 rows (16.46GB) to potentially compress (split in 1 groups).
Error: getting dict

Caused by:
    0: Training dictionary failed
       
       Caused by:
           Src size is incorrect
    1: Error code 1: SQL error or missing database

What causes this and how do I fix it?

Also, how can I change the settings for zstd_enable_transparent afterwards? Running the command again on the same column (even with different settings) gives me Error: Column text is already enabled for compression.

Thanks! 🙏

leoplusx avatar Aug 04 '22 11:08 leoplusx

I've not seen that error before. Could you set the environment variable SQLITE_ZSTD_LOG=debug and run it again, so we can see what the dictionary training params are? Maybe the target dict size is too large for zstd or something. If you could send me your file or find a more minimal example, that would also help.

Right now it's not easily possible to change the settings. What do you want to change exactly? For many settings you can change the config simply by editing the JSON in _zstd_config directly, but there's no built-in functionality to tell you whether it will work.

phiresky avatar Aug 05 '22 01:08 phiresky

Ah you're probably hitting this error case: https://github.com/facebook/zstd/blob/eadb6c874f9d0c9e90c835f8b0181da802361e4c/lib/dictBuilder/fastcover.c#L328

That check caps the max training size at 1GB or 4GB. Try setting "train_dict_samples_ratio": 5 in the config JSON.

phiresky avatar Aug 05 '22 01:08 phiresky

I think this worked for me: I used json_set to add the necessary field.

anacrolix avatar Sep 27 '22 03:09 anacrolix
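
For anyone landing here later, the json_set approach mentioned above could look roughly like the sketch below. It runs against an in-memory stand-in table, not a real sqlite-zstd database: the table name _zstd_config is taken from this thread, but the column names id and config are assumptions, so check the actual schema in your own database before running the UPDATE there. It also assumes your SQLite build includes the JSON functions.

```python
import json
import sqlite3

# In-memory stand-in for the extension's config table.
# Table name is from the thread; the columns "id" and "config" are ASSUMED.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE _zstd_config (id INTEGER PRIMARY KEY, config TEXT NOT NULL)"
)

# Same settings as in the original zstd_enable_transparent call.
original = {
    "table": "absatz",
    "column": "text",
    "compression_level": 19,
    "dict_chooser": "'a'",
}
conn.execute(
    "INSERT INTO _zstd_config (config) VALUES (?)", (json.dumps(original),)
)

# json_set adds the key if it is missing and overwrites it if it exists;
# all other keys in the JSON document are left untouched.
conn.execute(
    """
    UPDATE _zstd_config
    SET config = json_set(config, '$.train_dict_samples_ratio', 5)
    WHERE json_extract(config, '$.table') = 'absatz'
      AND json_extract(config, '$.column') = 'text'
    """
)
conn.commit()

updated = json.loads(
    conn.execute("SELECT config FROM _zstd_config").fetchone()[0]
)
print(updated)
```

On a real database only the UPDATE statement would be needed, run after loading the extension and before calling zstd_incremental_maintenance again. As noted earlier in the thread, nothing validates a hand-edited config, so back up the file first.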