CAIDA's AS relationship
Import CAIDA AS relationship data. The crawler should be very similar to the BGPKIT as2rel crawler.
The data is available here: ~~https://publicdata.caida.org/datasets/as-relationships/serial-2/~~ https://publicdata.caida.org/datasets/as-relationships/serial-1/
The README at the link you provided states:
"The as-rel files contain p2p and p2c relationships.
The format is:
<provider-as>|<customer-as>|-1
<peer-as>|<peer-as>|0|<source>"
But I could only find data in the latter format (p2p) in the latest .txt file (20230801.as-rel2.txt.bz2), for example:
1|5467|0|bgp
It would be helpful if you could clarify this, @romain-fontugne.
Thanks @roopeshsn for looking at that. I just checked the latest file (20230801.as-rel2.txt.bz2) and the first few lines (after the long comments) seem OK to me:
1|5467|0|bgp
1|8641|0|bgp
1|50377|-1|bgp
1|51705|0|bgp
1|51728|0|bgp
1|59572|0|bgp
2|3999|-1|bgp
I think the README is wrong; the format is:
<provider-as>|<customer-as>|-1|<source>
<peer-as>|<peer-as>|0|<source>
I will report that to CAIDA, thanks!
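For reference, a minimal parsing sketch for the format discussed above. It assumes that comment lines start with `#` and that the `<source>` field may be missing (as in the serial-1 format described by the README). The names `ASRel` and `parse_as_rel` are hypothetical and not part of any existing crawler.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class ASRel(NamedTuple):
    """One parsed line of an as-rel file (hypothetical helper type)."""
    as1: int               # provider AS for rel == -1, otherwise a peer AS
    as2: int               # customer AS for rel == -1, otherwise a peer AS
    rel: int               # -1 for provider-customer, 0 for peer-peer
    source: Optional[str]  # e.g. 'bgp'; None if the field is absent


def parse_as_rel(lines: Iterable[str]) -> Iterator[ASRel]:
    """Yield one ASRel per data line, skipping comments and blank lines."""
    for line in lines:
        line = line.strip()
        # Assumption: header/comment lines start with '#'.
        if not line or line.startswith('#'):
            continue
        fields = line.split('|')
        if len(fields) < 3:
            # Not a relationship line; ignore it in this sketch.
            continue
        source = fields[3] if len(fields) > 3 else None
        yield ASRel(int(fields[0]), int(fields[1]), int(fields[2]), source)
```

Accepting both three- and four-field lines means the same function could be used regardless of which README variant turns out to be correct.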
I heard back from CAIDA: we should use the data in https://publicdata.caida.org/datasets/as-relationships/serial-1/ (not serial-2).
These are my blockers right now:
- I only need to process the file named 20230801.as-rel.txt.bz2, right?
- The README mentions two types of relationships, <provider-as>|<customer-as>|-1 and <peer-as>|<peer-as>|0. So the relationships will look like (:AS {asn: xxxx})-[:PEERS_WITH {rel: -1}]-(:AS {asn: xxxx}) and (:AS {asn: xxxx})-[:PEERS_WITH {rel: 0}]-(:AS {asn: xxxx}), right?
Ideally the crawler should fetch the latest version of the *.as-rel.txt.bz2 file, yes.
Your relationships are correct, although you will have to assign a direction when creating them. The direction can be arbitrary, as we always fetch them without direction.
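One possible way to pick the latest file, as a rough sketch: it assumes the directory listing has already been downloaded as text and that file names follow the YYYYMMDD.as-rel.txt.bz2 pattern seen above. The function name `latest_as_rel_file` is hypothetical.

```python
import re
from typing import Optional

# Matches e.g. 20230801.as-rel.txt.bz2, but not 20230801.as-rel2.txt.bz2.
AS_REL_FILE = re.compile(r'\b(\d{8})\.as-rel\.txt\.bz2\b')


def latest_as_rel_file(listing: str) -> Optional[str]:
    """Return the most recent *.as-rel.txt.bz2 file name found in the listing."""
    dates = {match.group(1) for match in AS_REL_FILE.finditer(listing)}
    if not dates:
        return None
    # YYYYMMDD strings sort chronologically when compared as text.
    return f'{max(dates)}.as-rel.txt.bz2'
```

The returned name can then be appended to the serial-1 base URL to download the file.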
I wonder if we should normalize the rel format with BGPKIT, since they use rel: 1 for customer-provider relationships instead of rel: -1. Any thoughts @romain-fontugne?
Actually, now I think we should not change the source data, because it gets confusing if you then compare with the corresponding README. I propose leaving the rel property as-is for now and maybe creating a new "parallel", but directed, relationship for the customer-provider case at some point (not now).
Yes, I think we can keep the data as it is. But note that this data contains directed links: for the provider-customer relationships, the direction is important.
Is it though if we add it as a PEERS_WITH relationship? As far as I am aware we always match these without direction (and it is also not intuitive on which end of a directed PEERS_WITH relationship the provider and on which end the customer should be).
Anyway, to be consistent with the BGPKIT crawler, you can parse the lines in their current order. The direction in the case of a provider-customer relationship is:
(Provider:AS)-[:PEERS_WITH {rel: -1}]->(Customer:AS)
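To illustrate keeping the file order as the link direction, here is a small sketch that turns one line into parameters for a generic Cypher statement. This is plain parameterized Cypher, not the crawler's actual helper API; `CREATE_LINK` and `link_params` are hypothetical names.

```python
# Generic Cypher: the first AS in the line is the start node, so for
# rel == -1 the link points from provider to customer.
CREATE_LINK = (
    'MERGE (a:AS {asn: $as1}) '
    'MERGE (b:AS {asn: $as2}) '
    'MERGE (a)-[:PEERS_WITH {rel: $rel}]->(b)'
)


def link_params(line: str) -> dict:
    """Map '<as1>|<as2>|<rel>[|<source>]' to query parameters, keeping file order."""
    as1, as2, rel = line.strip().split('|')[:3]
    rel = int(rel)
    if rel not in (-1, 0):
        raise ValueError(f'unexpected relationship type: {rel}')
    # rel is stored unchanged (-1 or 0), as in the source data.
    return {'as1': int(as1), 'as2': int(as2), 'rel': rel}
```

For peer-peer lines (rel: 0) the resulting direction carries no meaning, which matches matching them without direction in queries.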
There is at least one example where we use the PEERS_WITH direction:
https://github.com/InternetHealthReport/internet-yellow-pages/blob/main/documentation/iij.md#iijs-main-competitors
Yes. Anyway, let's just be consistent with whatever we are doing in the BGPKIT crawler.