Sample data merging

Open 15596858998 opened this issue 3 years ago • 3 comments

Can we merge the sample data into a single buffer when training the dictionary? Many of the samples repeat a certain piece of data, but that data appears only once within each sample, and right now the trainer does not seem to pick it up for the dictionary.

15596858998 avatar Nov 26 '21 09:11 15596858998

> Many of the samples repeat a certain piece of data, but that data appears only once within each sample

This is the intended use case for the dictionary trainer. The trainer actually looks ONLY at repetitions between different samples, and ignores repetition within a single sample.
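To make the distinction concrete, here is a toy Python sketch of "repetition between samples" versus "repetition within a sample". It is only an illustration of the statement above, not zstd's actual training algorithm; the function name `cross_sample_counts`, the segment length `k`, and the sample contents are all made up for the example.

```python
from collections import Counter

def cross_sample_counts(samples, k=4):
    """For each k-byte segment, count how many distinct samples contain it.

    Hypothetical helper: a toy model, not the zstd trainer.
    """
    counts = Counter()
    for sample in samples:
        # A set collapses repeats inside one sample, so each sample
        # contributes at most 1 to a segment's count.
        segments = {sample[i:i + k] for i in range(len(sample) - k + 1)}
        counts.update(segments)
    return counts

# Made-up samples: "HEAD" appears once in every sample,
# while "aaaa" repeats only inside the first sample.
samples = [
    b"id=1;HEAD;aaaaaaaa",
    b"id=2;HEAD;bbbbbbbb",
    b"id=3;HEAD;cccccccc",
]
counts = cross_sample_counts(samples)
shared = counts[b"HEAD"]    # seen in all three samples
internal = counts[b"aaaa"]  # seen in only one sample, despite repeating there
```

Under this model, a segment that shows up once in each of many samples scores high, which matches the case described in the question.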

If you are finding that the dictionary isn't working well, please share more details:

  1. Include the training command, the number of samples, and the approximate size of the samples. With this, we may be able to help diagnose your problem.
  2. If possible, include the data. If you can provide the samples you are training on, we can certainly help.
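As a rough aid for collecting the numbers requested above, the following Python sketch reports the sample count and approximate sizes. It assumes one sample per file in a single directory; the function name `summarize_samples` and the `samples` directory name are hypothetical.

```python
import statistics
from pathlib import Path

def summarize_samples(sample_dir):
    """Return the count, total bytes, and median bytes of files in sample_dir.

    Hypothetical helper; assumes each regular file is one training sample.
    """
    sizes = [p.stat().st_size for p in Path(sample_dir).iterdir() if p.is_file()]
    if not sizes:
        raise ValueError(f"no sample files found in {sample_dir}")
    return {
        "count": len(sizes),
        "total_bytes": sum(sizes),
        "median_bytes": statistics.median(sizes),
    }

# Example call (directory name is an assumption):
# print(summarize_samples("samples"))
```

Posting this summary together with the exact training command gives enough context to start diagnosing a weak dictionary.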

terrelln avatar Dec 01 '21 19:12 terrelln

In fact, what I want is to merge all the sample data together, without trimming and aligning each sample, and to divide all of the data into small blocks.

15596858998 avatar Dec 10 '21 07:12 15596858998

Can you explain exactly what you want to do and why (an example may help)? I don't think that is what you want, but I can't be sure because I don't 100% understand the problem.

terrelln avatar Dec 10 '21 18:12 terrelln

Closing due to lack of activity. Please open a new issue if you have further questions.

terrelln avatar Dec 16 '22 00:12 terrelln