zstd icon indicating copy to clipboard operation
zstd copied to clipboard

Discrepancy in manual for dictionary loading

Open efbicief opened this issue 3 years ago • 1 comments

In zstd_manual.html:872, describing ZSTD_DCtx_loadDictionary:

The dictionary remains valid for all future frames, until explicitly invalidated

However, in ZSTD_CCtx_loadDictionary_advanced (zstd_compress.c:1123 which ZSTD_DCtx_loadDictionary wraps around), this seems not to be true:

size_t ZSTD_CCtx_loadDictionary_advanced(
        ZSTD_CCtx* cctx, const void* dict, size_t dictSize,
        ZSTD_dictLoadMethod_e dictLoadMethod, ZSTD_dictContentType_e dictContentType)
{
    RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong,
                    "Can't load a dictionary when ctx is not in init stage.");
    DEBUGLOG(4, "ZSTD_CCtx_loadDictionary_advanced (size: %u)", (U32)dictSize);
    ZSTD_clearAllDicts(cctx);  /* in case one already exists */

It seems that if adictionary is already used within a context, when loading a new one, the previous dictionary gets cleared from the context. I assumed that it would be retained in the context based on the manual, since by loading the new dictionary, I am not explicitly invalidating the old dictionary. Am I correct in thinking this?

efbicief avatar Aug 19 '22 10:08 efbicief

The context can only manage one dictionary. So, when a new dictionary get inserted, the old one get evicted.

There is also a difference between compression and decompression sides, which are quoted in this issue.

On the decompression side, it's possible, and was considered, to manage multiple dictionaries simultaneously. In which case, it would be possible to stack multiple DDict references in the context. The problem is that this capability has not been developed yet.

On the compression side though, this makes less sense, since one dictionary must be selected for a given compression job. Therefore it has always been expected that referencing a new dictionary would evict the previous one.

However, in ZSTD_CCtx_loadDictionary_advanced (zstd_compress.c:1123 which ZSTD_DCtx_loadDictionary wraps around),

This sentence puzzles me. Decompression code should never depend on compression code, and vice versa. Can it be double-checked ?

Cyan4973 avatar Aug 19 '22 15:08 Cyan4973

I'll clarify the docs slightly.

In which case, it would be possible to stack multiple DDict references in the context. The problem is that this capability has not been developed yet.

It is enabled if you set ZSTD_d_refMultipleDDicts on your `dctx.

terrelln avatar Dec 20 '22 01:12 terrelln