js-libp2p
js-libp2p copied to clipboard
PeerId string identifier
Type:
Question
Description:
Currently, js-libp2p uses PeerId objects to identify peers, which are compared using equals method and printed with toB58String. In many cases, the b58 string is used to index peers (eg: in Maps, Objects, Sets, etc) and there, implicitly used for id equality.
Has any thought been given to using b58 string encoded peer-ids as the canonical peer identifier throught the codebase? In this case, the keystore would be more heavily relied on to retrieve public/private keys since they wouldn't be (optionally) attached as in PeerId objects. In many cases, checking the validity of a string identifier would still be required, but strings are convenient to use for indexing and equality checking. Perhaps this may also result in general speedups and reduced memory usage, as string equality checking is generally faster than Buffer equality checking and public keys can all be stored in one place.
I definitely haven't thought thru the intricacies, just more an idea after seeing lots of id.toB58String() in my own code, and seeing that peer.ID is also a string in go-libp2p.
Curious more than anything if this is being considered or on the roadmap, and if this has been discussed before.
This seems a good idea! Totally in favour of moving on a direction of using a string identifier and rely on the PeerStore.
We plan to move away from the id.toB58String() in favour of id.toString(). When we work on this change, I think we should definitely consider this suggestion.
I noticed, profiling lodestar, that requesting peers from the peer store (calling libp2p.peerStore.peers) is not very performant, taking roughly 500ms per call (with roughly 50 or so peers).
It appears to mainly be a result of creating PeerId objects.

I'd expect that resolving this issue in the direction of using string peer-ids directly will result in a significant performance improvement.
Thanks for the data @wemeetagain
So, once we use peerIdStr everywhere we will be able to return also the string in the peerStore.peers.
However, we should look into what is causing this performance degradation. The bottleneck should probably be in https://github.com/libp2p/js-peer-id/blob/master/src/index.js#L27 creating the b58string which we already have, or creating the cid in https://github.com/libp2p/js-peer-id/blob/master/src/index.js#L248 . Both have sync encodings/decodings.
@jacobheun do you think we can get rid of b58string usage by default in all libp2p modules for 0.31?
However, we should look into what is causing this performance degradation.
Yes, we need to get more benchmarks in place and run those regularly (not necessarily as PRs, perhaps daily crons). The fact that this is reassembling the data from the individual books each time is problematic, and the code is doing a lot of iteration on the data to reassemble it.
@jacobheun do you think we can get rid of b58string usage by default in all libp2p modules for 0.31?
@vasco-santos yes, we can internally change or base (likely base36 based on subdomain usefulness) and just start using toString(). If applications need to b58 for some reason, they can do the conversion before printing.
See this CPU profile showing the bottleneck in performance.


After refactoring peer managment in Lodestar I strongly believe moving from a class PeerId to a string as canonical identifier should be a priority. It would simplify the codebase significantly and improve performance omitting costly serialization and deserialization.
2022-06-28 conversation: in general, it has been intentional for js-libp2p to not just strings to avoid ambiguity around peerIds, multiaddrs, etc. There have been improvements in recent js-libp2p releases. Another look at the Lodestar performance impacts will be taken once https://github.com/ChainSafe/lodestar/pull/4114 is addressed.
I am not sure if related, but if I have code such as the following:
const node = await createLibp2p({
// ...
})
// ...
client.publish('peers', { id: node.peerId, addrs: listenAddrs })
const peers = []
const peerTopic = await client.subscribe('peers')
for (let i = 0; i < runenv.testInstanceCount; i++) {
const result = await peerTopic.wait.next()
peers.push(result.value)
}
peerTopic.cancel()
I can only compare the peerIds (to make sure I do not dial to my own node for example) by using == rather then the tripple variant (===):
peers[0].id == node.peerId // true
peers[0].id === node.peerId // false
I suppose because I might need to convert it back to a type from a string, when receiving it from over the network?
Related to the concerns about avoiding ambiguity around peer ids, multiaddrs, other string types:
Some discussion around "nominal types" that provides some insights: https://github.com/microsoft/TypeScript/issues/202
We may use "branded types" to differentiate between various string types and functionally achieve nominal types. As an example, see https://github.com/ChainSafe/ts-peer-id/blob/master/src/index.ts It achieves code that works like so:
import {PeerIdString, validateString} from '@chainsafe/ts-peer-id'
const str = '...'
function doWithPeerId(id: PeerIdString): void {...}
doWithPeerId(str) // type error, `string` doesn't satisfy `PeerIdString`
validateString(str) // asserts that the `str` is of type `PeerIdString`
doWithPeerId(str) // now there's no type error
const str2 = '...' as PeerIdString // alternatively, a string can be explicitly cast to `PeerIdString`
I'm doing some memory analysis of processes under heavy load, I think this might be worth experimenting with - and should be extended to multiaddrs and CIDs too. We keep a lot of Uint8Arrays in memory which are famously inefficient - JS heap size is under control but the overall RSS of the process increases until it dies. Storing these types as strings should help here, though CPU usage may go up when we need to operate on these types & it'll be a breaking change.
One thing we could do that might be a better middle ground, is to have the PeerId/Multiaddr objects store strings internally and not Uint8Arrays - at the moment we store the PeerId multihash (which contains a byte array) and the Multiaddr as a string/byte array/tuples/string tuples.
This would be a non-breaking change and should be lighter on memory use as we'd not be storing Uint8Arrays in long-lived objects. It still gives us type safety and is a lot less disruptive than changing everything to strings.