Training dictionary failed: Src size is incorrect
That's the error I'm getting when running
SELECT zstd_incremental_maintenance(null, 1);
Input:
SELECT zstd_enable_transparent('{"table": "absatz","column": "text", "compression_level": 19, "dict_chooser": "''a''"}');
SELECT zstd_incremental_maintenance(null, 1);
Output:
[2022-08-04T11:36:24Z WARN sqlite_zstd::transparent] Warning: It is recommended to set `pragma auto_vacuum=full;`
[2022-08-04T11:36:24Z WARN sqlite_zstd::transparent] Warning: It is recommended to set `pragma busy_timeout=2000;` or higher
[2022-08-04T11:44:13Z INFO sqlite_zstd::transparent] absatz.text: Total 36006063 rows (16.46GB) to potentially compress (split in 1 groups).
Error: getting dict
Caused by:
0: Training dictionary failed
Caused by:
Src size is incorrect
1: Error code 1: SQL error or missing database
What causes this and how do I fix it?
Also, how can I change the settings for zstd_enable_transparent afterwards? Running the command again on the same column (even with different settings) gives me Error: Column text is already enabled for compression.
Thanks! 🙏
I've not seen that error before. Could you set the environment variable SQLITE_ZSTD_LOG=debug and run it again, so we can see what the dictionary training parameters are? Maybe the target dict size is too large for zstd or something. If you could send me your file or find a more minimal example, that would also help.
Right now there's no easy way to change the settings. What exactly do you want to change? For many settings you can change the config simply by editing the JSON in _zstd_config directly, but there's no integrated functionality to tell you whether it will work.
Ah, you're probably hitting this error case: https://github.com/facebook/zstd/blob/eadb6c874f9d0c9e90c835f8b0181da802361e4c/lib/dictBuilder/fastcover.c#L328
There, the maximum training size is 1GB or 4GB. Try setting "train_dict_samples_ratio": 5 in the config JSON.
I think this worked for me: I used json_set to add the necessary field.
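For anyone landing here later, this is roughly what the json_set approach could look like. It's only a sketch run against a stand-in in-memory table: the table name _zstd_config is taken from the comment above, while the column names (id, config) are assumptions, so check the actual schema in your database before running anything like this on real data. It also assumes your SQLite build includes the JSON functions.

```python
import json
import sqlite3

# Stand-in for the real database. The table name follows the thread; the
# column names (id, config) are assumptions, so verify the real schema first.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE _zstd_config (id INTEGER PRIMARY KEY, config TEXT NOT NULL)"
)
original = {
    "table": "absatz",
    "column": "text",
    "compression_level": 19,
    "dict_chooser": "'a'",
}
conn.execute(
    "INSERT INTO _zstd_config (config) VALUES (?)", (json.dumps(original),)
)

# json_set adds the key if it is missing and overwrites it if it exists.
# The WHERE clause limits the change to the config for absatz.text.
conn.execute(
    """
    UPDATE _zstd_config
    SET config = json_set(config, '$.train_dict_samples_ratio', 5)
    WHERE json_extract(config, '$.table') = 'absatz'
      AND json_extract(config, '$.column') = 'text'
    """
)
conn.commit()

# Read the config back to confirm the new field is present.
updated = json.loads(
    conn.execute("SELECT config FROM _zstd_config").fetchone()[0]
)
print(updated["train_dict_samples_ratio"])
```

The same UPDATE statement can be run directly in the sqlite3 shell. After that, running SELECT zstd_incremental_maintenance(null, 1); again should pick up the changed ratio, though as noted above nothing checks whether an edited config is valid.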