
[Not a bug] Dictionary building strategy

Open sreendra opened this issue 1 year ago • 6 comments

Hi @Cyan4973, sorry for reporting this as a bug; I don't have another way to reach you or the team to get this information.

I would like to build a single dictionary that I can use for all users. Some details about the data:

  1. The blob I want to compress varies by item type: documents, folders, and, within documents, multiple types such as pdf/docx/ppt/jpeg/video.

  2. It is a key-value pair object, but not a JSON object.

  3. Within a given type, the blob content varies from document to document, i.e., if I have 2 photos (photo1 and photo2), the content for photo1 differs from the content for photo2. There is a high chance that the keys are the same, though at times the keys might differ slightly.

With this setup:

  1. If I want to train a dictionary (using the ZDICT_trainFromBuffer API), is it enough to choose one document of each type, or do I have to train on multiple files of the same type?

  2. I tried training with about 700 samples totalling ~12 MB (sum of the sample sizes). Should I set dictBufferCapacity to 12 MB, or keep the default value of 110 KB?

Thank you
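
For reference, a minimal sketch of the input layout that ZDICT_trainFromBuffer takes: all samples concatenated into one buffer, plus a parallel array of per-sample sizes. The PackedSamples struct and the pack_samples / free_packed_samples helpers are hypothetical names introduced only for illustration; they are not part of zstd. Note that dictBufferCapacity is the capacity of the output dictionary buffer, not the total size of the samples, so the 110 KB figure quoted above and the ~12 MB of samples describe different things.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type (not part of zstd): a training set packed the way
 * ZDICT_trainFromBuffer() takes it, i.e. samples stored back to back in one
 * buffer, with a parallel array giving each sample's size in order. */
typedef struct {
    void    *buffer;     /* concatenated sample bytes */
    size_t  *sizes;      /* sizes[i] = length of sample i */
    size_t   totalSize;  /* sum of all sample sizes */
    unsigned nbSamples;  /* number of samples */
} PackedSamples;

/* Concatenate nbSamples separate blobs into one buffer.
 * Returns 0 on success, -1 on allocation failure. */
static int pack_samples(const void *const *samples, const size_t *sizes,
                        unsigned nbSamples, PackedSamples *out)
{
    size_t total = 0, offset = 0;
    unsigned i;

    assert(samples != NULL && sizes != NULL && out != NULL);
    for (i = 0; i < nbSamples; i++) total += sizes[i];

    /* Allocate at least one byte/element so empty inputs stay valid. */
    out->buffer = malloc(total ? total : 1);
    out->sizes  = malloc((nbSamples ? nbSamples : 1) * sizeof(size_t));
    if (out->buffer == NULL || out->sizes == NULL) {
        free(out->buffer);
        free(out->sizes);
        return -1;
    }

    for (i = 0; i < nbSamples; i++) {
        memcpy((char *)out->buffer + offset, samples[i], sizes[i]);
        out->sizes[i] = sizes[i];
        offset += sizes[i];
    }
    out->totalSize = total;
    out->nbSamples = nbSamples;
    return 0;
}

static void free_packed_samples(PackedSamples *p)
{
    free(p->buffer);
    free(p->sizes);
    p->buffer = NULL;
    p->sizes = NULL;
}

/* With <zdict.h> available and libzstd linked, training on a packed set p
 * would then look roughly like this (110 KB is the figure from the question
 * above, used here only as an example capacity):
 *
 *   size_t dictCapacity = 110 * 1024;
 *   void  *dict = malloc(dictCapacity);
 *   size_t dictSize = ZDICT_trainFromBuffer(dict, dictCapacity,
 *                                           p.buffer, p.sizes, p.nbSamples);
 *   if (ZDICT_isError(dictSize)) {
 *       // ZDICT_getErrorName(dictSize) describes the failure
 *   }
 */
```

On success, ZDICT_trainFromBuffer returns the number of bytes actually written to the dictionary buffer, which is at most dictBufferCapacity.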

sreendra • Jun 19 '24 17:06