Complete each library README.md

Open wargio opened this issue 7 months ago • 10 comments

The following READMEs are empty; each should instead contain a description of its library:

  • [ ] librz/arch/README.md (missing file which should be created)
  • [ ] librz/bin/README.md
  • [ ] librz/bp/README.md
  • [ ] librz/config/README.md
  • [ ] librz/cons/README.md
  • [ ] librz/core/README.md
  • [ ] librz/crypto/README.md
  • [ ] librz/debug/README.md
  • [ ] librz/demangler/README.md
  • [ ] librz/diff/README.md
  • [ ] librz/egg/README.md
  • [ ] librz/flag/README.md
  • [ ] librz/hash/README.md
  • [ ] librz/il/README.md
  • [ ] librz/lang/README.md
  • [ ] librz/magic/README.md
  • [ ] librz/main/README.md
  • [ ] librz/reg/README.md
  • [ ] librz/search/README.md
  • [ ] librz/sign/README.md
  • [ ] librz/socket/README.md
  • [ ] librz/syscall/README.md
  • [ ] librz/type/README.md

wargio avatar May 06 '25 10:05 wargio

Great issue – this would be very helpful!

PeiweiHu avatar May 08 '25 16:05 PeiweiHu

Defining what sections we want in each README.md would be great.

Right now there is a general section followed by a "## What can I expect here?" section.

Maijin avatar May 22 '25 03:05 Maijin

Personally, I'd like to see architectural details about the module: what it does, some examples of how it is used, etc. The problem is that this is something the people who worked on each module need to write.

Rot127 avatar May 22 '25 11:05 Rot127

I think if we can work out a couple of examples, we could generate the READMEs, at least partially, and the authors could then refine them as needed. For example, for the hash one, what do you think of the following? Is there anything we need to add, and if so, what?


RzHash

The Rizin hashing library, RzHash, offers a unified interface to various hashing algorithms, allowing other modules and users to easily compute and work with hash digests.

RzHash encapsulates the context for a specific hashing operation, configured for a chosen algorithm (e.g., MD5, SHA256). Once initialized, data can be fed into the hash context incrementally. Finally, the computed hash digest can be retrieved. This design allows for efficient hashing of large data sets or streaming data without requiring the entire content to be in memory at once.

What can I expect here?

  • Support for multiple hashing algorithms.
  • A primary context structure, RzHash, for managing hashing operations.
  • Core functions for the hashing lifecycle:
    • rz_hash_new(): To initialize a hash context with a specific algorithm.
    • rz_hash_update(): To feed data to the hash context.
    • rz_hash_final(): To compute the final hash digest.
    • rz_hash_free(): To release the hash context.
  • Plugin-based architecture for extending supported hash algorithms.
  • Helper functions for one-shot hashing of common algorithms like XXH32, entropy, and ssdeep.

Architecture

The RzHash library employs a plugin-based architecture to manage and provide various hashing algorithms.

  1. RzHash Context: The central element is the RzHash structure, which acts as a primary context for hashing operations. When you want to compute a hash, you typically start by creating and initializing an RzHash context for a specific algorithm using rz_hash_new(algorithm_flags, ...) or by using the more advanced RzHashCfg for multi-hash configurations.

  2. Algorithm Plugins (RzHashPlugin): Each hashing algorithm (like MD5, SHA256, CRC32, etc.) is represented by an RzHashPlugin. This structure defines a common interface for all algorithms, including functions to:

    • Create and free an algorithm-specific internal context.
    • Initialize the hashing process.
    • Update the hash with new data.
    • Finalize the computation and retrieve the digest.
    • Report digest size and block size.

     Plugins are registered with the main RzHash object, often at startup (for static plugins) or dynamically. The files in librz/hash/p/ (e.g., algo_md5.c, algo_sha256.c) are examples of such plugins. These plugins might wrap internal implementations (found in librz/hash/algorithms/) or utilize external libraries like OpenSSL.

  3. Core Workflow: The general process for hashing data using the library (focusing on the simpler direct RzHash usage) is:

    • Initialization: Create an RzHash context by specifying the desired algorithm (e.g., RZ_HASH_ALG_MD5, RZ_HASH_ALG_SHA256) using rz_hash_new(). This sets up the context for that particular algorithm.
    • Update: Feed the data to be hashed into the context using rz_hash_update(). This can be done in one go or in multiple chunks.
    • Finalization: Call rz_hash_final() to complete the hash computation. This function will populate a buffer with the resulting hash digest.
    • Cleanup: Release the resources used by the RzHash context by calling rz_hash_free().
  4. Advanced Configuration (RzHashCfg): For more complex scenarios, such as calculating multiple hashes simultaneously or using HMAC, the library provides RzHashCfg. This involves:

    • Creating a general RzHash object (rz_hash_new(), the one that takes no arguments).
    • Creating an RzHashCfg from the RzHash object.
    • Configuring one or more algorithms for this RzHashCfg (e.g., rz_hash_cfg_configure(), rz_hash_cfg_new_with_algo()).
    • Initializing, updating, and finalizing via rz_hash_cfg_init(), rz_hash_cfg_update(), and rz_hash_cfg_final().
    • Retrieving results using rz_hash_cfg_get_result() or rz_hash_cfg_get_result_string().
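
To make the plugin idea above concrete, here is a minimal, self-contained sketch of a table of hash plugins behind one common interface. It does not use Rizin's headers: DemoHashPlugin, demo_plugin_by_name() and demo_hash() are hypothetical names made up for illustration, and FNV-1a and Adler-32 are used only because they are short enough to fit inline. The real RzHashPlugin layout should be taken from rz_hash.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical plugin interface for illustration; NOT Rizin's RzHashPlugin. */
typedef struct {
	const char *name;
	size_t digest_size; /* digest size in bytes */
	void (*init)(uint32_t *state);
	void (*update)(uint32_t *state, const uint8_t *data, size_t len);
	uint32_t (*finish)(const uint32_t *state);
} DemoHashPlugin;

/* FNV-1a (32-bit): XOR each byte into the state, then multiply by the prime. */
static void fnv1a_init(uint32_t *s) { s[0] = 2166136261u; }
static void fnv1a_update(uint32_t *s, const uint8_t *d, size_t n) {
	for (size_t i = 0; i < n; i++) {
		s[0] ^= d[i];
		s[0] *= 16777619u;
	}
}
static uint32_t fnv1a_finish(const uint32_t *s) { return s[0]; }

/* Adler-32: two running sums modulo 65521. */
static void adler32_init(uint32_t *s) { s[0] = 1; s[1] = 0; }
static void adler32_update(uint32_t *s, const uint8_t *d, size_t n) {
	for (size_t i = 0; i < n; i++) {
		s[0] = (s[0] + d[i]) % 65521u;
		s[1] = (s[1] + s[0]) % 65521u;
	}
}
static uint32_t adler32_finish(const uint32_t *s) { return (s[1] << 16) | s[0]; }

/* Static "registry": adding an algorithm means adding one table entry. */
static const DemoHashPlugin demo_plugins[] = {
	{ "fnv1a32", 4, fnv1a_init, fnv1a_update, fnv1a_finish },
	{ "adler32", 4, adler32_init, adler32_update, adler32_finish },
};

static const DemoHashPlugin *demo_plugin_by_name(const char *name) {
	for (size_t i = 0; i < sizeof(demo_plugins) / sizeof(demo_plugins[0]); i++) {
		if (!strcmp(demo_plugins[i].name, name)) {
			return &demo_plugins[i];
		}
	}
	return NULL;
}

/* Hash a C string with the named algorithm; returns 0 if the name is unknown. */
static int demo_hash(const char *algo, const char *text, uint32_t *out) {
	assert(algo && text && out);
	const DemoHashPlugin *p = demo_plugin_by_name(algo);
	if (!p) {
		return 0;
	}
	uint32_t state[2] = { 0, 0 };
	p->init(state);
	p->update(state, (const uint8_t *)text, strlen(text));
	*out = p->finish(state);
	return 1;
}
```

A caller only needs the algorithm name, e.g. demo_hash("adler32", "Wikipedia", &digest); the lookup decides which init/update/finish functions run. That is the property the plugin-based design is meant to provide.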

Usage and Examples

Example: Calculating an MD5 Hash

This example shows how to calculate the MD5 hash of the string "Hello, world!".

#include <rz_hash.h>
#include <stdio.h>
#include <string.h>

int main_md5_example(void) {
  RzHash *ctx;
  ut8 hash[RZ_HASH_SIZE_MD5]; // an MD5 digest is 16 bytes
  const char *data = "Hello, world!";
  int i;

  // Initialize the hash context for MD5
  // Note: RZ_HASH_ALG_MD5 is a symbolic constant representing the MD5 algorithm choice.
  // The second argument to rz_hash_new can be flags, often 0 for default behavior.
  ctx = rz_hash_new(RZ_HASH_ALG_MD5, 0);
  if (!ctx) {
    fprintf(stderr, "Failed to initialize MD5 hash context.\n");
    return 1;
  }

  // Update the hash context with the data
  rz_hash_update(ctx, (const ut8 *)data, strlen(data));

  // Finalize the hash computation and get the digest
  rz_hash_final(ctx, hash);

  // Print the hash digest
  printf("MD5(\"%s\") = ", data);
  for (i = 0; i < RZ_HASH_SIZE_MD5; i++) {
    printf("%02x", hash[i]);
  }
  printf("\n");

  // Free the hash context
  rz_hash_free(ctx);
  return 0;
}

Maijin avatar May 26 '25 06:05 Maijin

The Rizin hashing library, RzHash, offers a unified interface to various hashing algorithms, allowing other modules and users to easily compute and work with hash digests.

RzHash encapsulates the context for a specific hashing operation, configured for a chosen algorithm (e.g., MD5, SHA256). Once initialized, data can be fed into the hash context incrementally. Finally, the computed hash digest can be retrieved. This design allows for efficient hashing of large data sets or streaming data without requiring the entire content to be in memory at once.

Here I would simplify the first part of the sentence "RzHash encapsulates the context for a specific hashing operation, ..." so that it is easier for non-native speakers to understand.

Otherwise I like it and would keep it.

What can I expect here?

Also OK, but people should be pointed to the header file to see which hash functions are available, etc.

Architecture

IMHO text is not the right way to communicate the architecture. Better to have a flow chart with short descriptions; otherwise, I suspect, people won't read it. There are many moving parts, and text is one-dimensional, so it is hard to follow.

Meaning: point 1 can stay in written form. Point 2 should be represented as a diagram, flow chart, or whatever makes it quick to comprehend.

Point 3 & 4 are examples. They should be in code as below.

Code example

Good one! But remove unnecessary lines. E.g., the if (!ctx) check can be replaced with a comment // Check ctx != NULL. The printing of the hash should also be simpler. The example should be minimal so that people focus directly on the functions that matter (rz_hash_new, rz_hash_update, etc.).

Also, remove any comments pointing out specifics of MD5 hashing. It is a general example of how to use the plugins, not an example of how to calculate MD5 hashes.

Rot127 avatar May 27 '25 13:05 Rot127

I'm OK with the example, but we have to be careful about some things, like the usage of io/core bindings.

wargio avatar May 27 '25 13:05 wargio

I've updated it. I'm not sure about the diagram, though; let me know if you have any ideas, @Rot127.

RzHash

The Rizin hashing library, RzHash, offers a unified interface to various hashing algorithms, allowing other modules and users to easily compute and work with hash digests.

The RzHash structure holds all information needed for a hashing operation with a specific algorithm. Once initialized, data can be fed into the hash context incrementally. Finally, the computed hash digest can be retrieved. This design allows for efficient hashing of large data sets or streaming data without requiring the entire content to be in memory at once.

What can I expect here?

  • For a comprehensive list of supported hash algorithms and their corresponding flags, please refer to the rz_hash.h header file.
  • A primary context structure, RzHash, for managing hashing operations.
  • Core functions for the hashing lifecycle:
    • rz_hash_new(): To initialize a hash context with a specific algorithm.
    • rz_hash_update(): To feed data to the hash context.
    • rz_hash_final(): To compute the final hash digest.
    • rz_hash_free(): To release the hash context.
  • Plugin-based architecture for extending supported hash algorithms.
  • Helper functions for one-shot hashing of common algorithms like XXH32, entropy, and ssdeep.

Architecture

The RzHash library employs a plugin-based architecture to manage and provide various hashing algorithms.

  • RzHash Context: The central element is the RzHash structure, which acts as a primary context for hashing operations. When you want to compute a hash, you typically start by creating and initializing an RzHash context for a specific algorithm using rz_hash_new(algorithm_flags, ...) or by using the more advanced RzHashCfg for multi-hash configurations.

  • RzHash Core Workflow

graph TD
    subgraph RzHash Core Workflow
        C[Initialize Context - rz_hash_new];
        C --> D{Feed Data Chunks};
        D --> E[Update Hash - rz_hash_update];
        E --> D;
        D --> F[Finalize Computation - rz_hash_final];
        F --> G[Get Hash Digest];
        G --> H[Cleanup - rz_hash_free];
    end
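
The loop in the diagram (feed a chunk, update, repeat, then finalize) can be sketched without any Rizin dependency. The snippet below is an illustration only: DemoCtx and the demo_* functions are hypothetical stand-ins, and 32-bit FNV-1a replaces a real digest so the example stays short. The point it shows is that feeding the data in chunks of any size gives the same result as feeding it in one call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical streaming context; a stand-in, not Rizin's RzHash. */
typedef struct {
	uint32_t state;
} DemoCtx;

/* Initialize: load the FNV-1a 32-bit offset basis. */
static void demo_init(DemoCtx *ctx) {
	ctx->state = 2166136261u;
}

/* Update: mix in one chunk; may be called any number of times. */
static void demo_update(DemoCtx *ctx, const uint8_t *data, size_t len) {
	for (size_t i = 0; i < len; i++) {
		ctx->state ^= data[i];
		ctx->state *= 16777619u;
	}
}

/* Finalize: FNV-1a has no padding step, so the state is the digest. */
static uint32_t demo_final(const DemoCtx *ctx) {
	return ctx->state;
}

/* Hash a buffer by feeding it in chunks of at most chunk_size bytes. */
static uint32_t demo_hash_chunked(const uint8_t *buf, size_t len, size_t chunk_size) {
	assert(buf && chunk_size > 0);
	DemoCtx ctx;
	demo_init(&ctx);
	for (size_t off = 0; off < len; off += chunk_size) {
		size_t remaining = len - off;
		size_t n = remaining < chunk_size ? remaining : chunk_size;
		demo_update(&ctx, buf + off, n);
	}
	return demo_final(&ctx);
}
```

With a file or socket, the loop body would read the next block into a fixed buffer and pass it to the update step, so memory use stays constant regardless of input size.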

Usage and Examples

Example: Calculating an MD5 Hash

#include <rz_hash.h>
#include <stdio.h>
#include <string.h>

int main_md5_example(void) {
  RzHash *ctx;
  ut8 hash[RZ_HASH_SIZE_MD5];
  const char *data = "Hello, world!";
  int i;

  // Initialize the hash context.
  // The second argument to rz_hash_new can be flags, often 0 for default behavior.
  ctx = rz_hash_new(RZ_HASH_ALG_MD5, 0);
  // Check ctx != NULL before using it.

  // Update the hash context with the data
  rz_hash_update(ctx, (const ut8 *)data, strlen(data));

  // Finalize the hash computation and get the digest
  rz_hash_final(ctx, hash);

  // Print the hash digest as lowercase hex
  for (i = 0; i < RZ_HASH_SIZE_MD5; i++) {
    printf("%02x", hash[i]);
  }
  printf("\n");

  // Free the hash context
  rz_hash_free(ctx);
  return 0;
}

Maijin avatar May 29 '25 13:05 Maijin

Looks perfect to me! Thanks!

Rot127 avatar May 29 '25 15:05 Rot127

Maybe we could add the example code under <repo-root>/examples/api/hash/md5.c and compile it when a certain build flag is given?

This would ensure our examples always build, give us additional integration tests, and let people copy and paste them. wdyt?

Rot127 avatar May 29 '25 15:05 Rot127

@Maijin maybe start opening a PR.

wargio avatar May 29 '25 19:05 wargio