ipfs-docs

Review how CIDs are explained, and cover common pitfalls.

Open rleddy opened this issue 3 years ago • 1 comments

So, last night as it got later and later I was playing around with creating CIDs and using the cool CID tool, which really only parses a CID if it fits the IPFS CID formats.

I had two concerns. (That is aside from not knowing where my next meal will come from and I am guessing nobody hires experienced programmers.)

  1. I could use a tool that helps me create a CID. Thoughts: Why don't I do this? Next thought: What? For free? (Yet again)

So, it could have pull-down menus for the base, the coding, etc. One reason is that the doc brushes over putting in the version number (it mentions it only in passing). It does say where the base encoding starts, but you still have to figure out the rest yourself.

ENCODE_IN_BASE( binary_concat(<number of bytes produced by the algorithm or bits/8><the bytes (bits) produced by the algorithm>))

For a while I was thinking 0x100 not 0x20 for length.
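A minimal sketch of how the whole byte layout fits together, using the values that appear in this thread (version 0x01, raw codec 0x55, SHA-256 code 0x12, length 0x20). The helper names `cidBytes` and `encodeCid` are made up for illustration, and the sketch assumes every field fits in a single byte, which holds for these values but not for every codec. `'base64url'` as a Buffer encoding needs a reasonably recent Node.js.

```javascript
// Assumed constants, matching the prefix bytes used in this thread.
const CID_VERSION = 0x01;   // CIDv1
const CODEC_RAW = 0x55;     // "raw" content codec
const HASH_SHA2_256 = 0x12; // multihash code for SHA-256

// Hypothetical helper: wrap a digest into CIDv1 bytes.
function cidBytes(digest) {
    // The length field counts bytes, so SHA-256 gives 0x20 (32), not 0x100 (256 bits).
    const header = Buffer.from([CID_VERSION, CODEC_RAW, HASH_SHA2_256, digest.length]);
    return Buffer.concat([header, digest]);
}

// Hypothetical helper: multibase-encode CID bytes as base16 ('f') or base64url ('u').
function encodeCid(bytes, base) {
    if (base === 'hex') return 'f' + bytes.toString('hex');
    if (base === 'base64url') return 'u' + bytes.toString('base64url');
    throw new Error('unsupported base: ' + base);
}

// The example digest from this issue.
const digest = Buffer.from(
    'a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e', 'hex');
const cid = cidBytes(digest);

console.log(encodeCid(cid, 'hex'));
console.log(encodeCid(cid, 'base64url'));
```

The base prefix character is added after encoding, so only the bytes behind it go through the base encoder.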

  2. CIDs are really screwed up. Why?

Because often, the front of the string is always the same. Yet, because of the way encoding is expected, the tail of the front can vary. Should it? I don't think so.

Here is a very simple node.js program:

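const crypto = require('crypto');

// Hash `text` with SHA-256 and return the digest in the given encoding ('hex', 'base64url', ...).
function do_hash(text, base) {
    const hash = crypto.createHash('sha256');
    hash.update(text);
    return hash.digest(base);
}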

Here are two outputs ('hex' and 'base64url'):

hex encoding of a sha256 hash (hex): a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e
base64 encoding of a sha256 hash (base64url): pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4

Now, to turn the hex into a legal CID, that is easy enough. Just put f01551220 in front of the hash:

f01551220a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e

And you can put that in the CID inspector.

Now, what about base64 (and related encodings)? A reasonable front of string, based on converting the same prefix from hex, is uAVUSIA. So, you might think that a good representation would be something like:

uAVUSIApZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4

or even

uAVUSIA|pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4

But straight IPFS CIDv1 is the following: uAVUSIKWRptQL9CBASgEXM8-3sZDWLGW_C82jK1eyd9mtnxRu

The first characters, uAVUSI, would be the same for all raw version 1 CIDs using SHA256. But the last two bits of the length byte share a base64 character with the first four bits of the digest, so the length ends up in the SHA hash, basically.

So, the string does not readily represent the hash.
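The mismatch comes down to bit counts: the fixed header is 4 bytes (32 bits), while each base64 character carries 6 bits, so the header does not end on a character boundary. A small sketch, assuming the header bytes and example digest from this thread (the helper `commonPrefixLength` is made up for illustration):

```javascript
// Header for CIDv1 + raw + SHA-256 + 32-byte length, and the example digest.
const header = Buffer.from([0x01, 0x55, 0x12, 0x20]);
const digest = Buffer.from(
    'a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e', 'hex');

// What the CID actually contains: concatenate the bytes, then encode once.
const joinedThenEncoded = Buffer.concat([header, digest]).toString('base64url');

// What one might hope for: encode each part separately, then glue the strings.
const encodedThenJoined = header.toString('base64url') + digest.toString('base64url');

// Hypothetical helper: length of the common prefix of two strings.
function commonPrefixLength(a, b) {
    let i = 0;
    while (i < a.length && i < b.length && a[i] === b[i]) i++;
    return i;
}

const shared = commonPrefixLength(joinedThenEncoded, encodedThenJoined);
const leftoverBits = (header.length * 8) % 6;

console.log(joinedThenEncoded.slice(0, shared)); // characters that depend only on the header
console.log(leftoverBits);                       // header bits that spill into the next character
```

With a header whose length is a multiple of 3 bytes, the two strings would line up, since 3 bytes map to exactly 4 base64 characters.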

One might say, "All well and good for a CID."

But, really, I would like to recover the hash without unscrambling the whole string. The hash is the only thing that differs between such CIDs anyway.
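For comparison, here is roughly what recovering the hash takes today. This is a sketch only: `digestFromCid` is a hypothetical helper, it handles just the base64url ('u') form, and it assumes the version, codec, hash code and length each occupy one byte, as they do for the CID in this thread.

```javascript
// Hypothetical helper: pull the digest out of a base64url ('u') CIDv1 string.
function digestFromCid(cidString) {
    if (cidString[0] !== 'u') {
        throw new Error('expected multibase prefix "u"');
    }
    // Decode everything after the multibase prefix character.
    const bytes = Buffer.from(cidString.slice(1), 'base64url');
    // Byte 3 is the digest length; the digest itself starts at byte 4.
    const length = bytes[3];
    return bytes.subarray(4, 4 + length);
}

const recovered = digestFromCid('uAVUSIKWRptQL9CBASgEXM8-3sZDWLGW_C82jK1eyd9mtnxRu');
console.log(recovered.toString('hex'));
```

The whole string has to go through the decoder before the digest bytes can be sliced out, which is the extra step being complained about.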

Also, if I were making a huge number of CIDs, I would want to have a bucket brigade of buffers with the front of the string already in place. Then, all the program has to do is pass a pointer to the end of the front of the string to the encoder, which then uses the buffer as output.

But, the way it is, the program will have to thrash with memory management to make anything other than hex (binary, yes). Less than desired anyway. So, why go slower?
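A rough sketch of the pre-filled buffer idea in Node.js, assuming the same 4-byte header as above. The names `cidBuffer` and `cidFor` are made up for illustration. Note that `digest()` still allocates its own Buffer here, so this only shows the shape of the approach, not an allocation-free implementation.

```javascript
const crypto = require('crypto');

// One reusable 36-byte buffer: 4 fixed header bytes plus room for a SHA-256 digest.
const cidBuffer = Buffer.alloc(4 + 32);
cidBuffer.set([0x01, 0x55, 0x12, 0x20], 0); // version, raw codec, sha2-256, length

// Hypothetical helper: hash `data` and return its CID string.
// `base` is expected to be 'hex' or 'base64url'.
function cidFor(data, base) {
    // Overwrite only the digest part; the header written above stays in place.
    crypto.createHash('sha256').update(data).digest().copy(cidBuffer, 4);
    const prefix = base === 'hex' ? 'f' : 'u';
    // For base64url the header bytes still have to be re-encoded every time,
    // because the header does not end on a 6-bit character boundary.
    return prefix + cidBuffer.toString(base);
}

console.log(cidFor('some content', 'hex'));
console.log(cidFor('some content', 'base64url'));
```

In hex the output prefix is literally constant, so an encoder could write only the digest characters after it; in base64url the full 36 bytes go through the encoder on every call.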

So, why Base64? Well, in my recent coding, I put in the base58btc code and fixed up a few other bases in C++. base64url keeps on being fairly simple and fast. I am not sure exactly why there is a love affair with base58. 'Looks cool' is not a good reason when thinking 'data center'. But maybe there is actually more to why someone wants to do long division.
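To make the long division point concrete, here is a minimal base58btc encoder sketch in JavaScript (not the C++ code mentioned above, and not taken from any IPFS library). Since 58 is not a power of two, each output character needs a division pass over the whole remaining number, whereas base64 can work on fixed 3-byte groups with shifts and masks.

```javascript
// The base58btc alphabet (no 0, O, I or l).
const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

// Sketch: encode bytes as base58btc by repeated long division by 58.
function base58btcEncode(bytes) {
    let number = Array.from(bytes); // big-endian base-256 digits

    // Each leading zero byte is encoded as a literal '1'.
    let zeros = 0;
    while (zeros < number.length && number[zeros] === 0) zeros++;
    number = number.slice(zeros);

    const digits = []; // base-58 digits, least significant first
    while (number.length > 0) {
        let remainder = 0;
        const quotient = [];
        // One schoolbook division of the whole number by 58.
        for (const byte of number) {
            const acc = remainder * 256 + byte;
            const q = Math.floor(acc / 58);
            remainder = acc % 58;
            // Skip leading zeros in the quotient so the loop terminates.
            if (quotient.length > 0 || q > 0) quotient.push(q);
        }
        digits.push(remainder);
        number = quotient;
    }

    return '1'.repeat(zeros) + digits.reverse().map((d) => ALPHABET[d]).join('');
}

console.log(base58btcEncode(Buffer.from([0x00, 0x00, 0x01])));
```

The work grows roughly with the square of the input length, which is the cost being questioned for bulk CID generation.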

So, that's it for CIDs. Might have to make my own CID system. Then, to use IPFS, I'll have to provide a converter and put up warnings for everyone.

rleddy, Aug 01 '21 19:08

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

welcome[bot], Aug 01 '21 19:08