Facebook's new rotating IDs break ReplayWeb.page
ReplayWeb.page cannot replay Facebook posts that use the new rotating IDs scheme (https://about.fb.com/news/2022/09/deterring-scraping-by-protecting-facebook-identifiers/).
The archive works as expected at first, but stops working after a week or more.
To reproduce, make an archive of a Facebook page while logged in. Then try to replay the archive again after 10 or so days.
Starting the timer on:
https://inkdroid.org/web-archives/archive/?source=https%3A%2F%2Fedsu-webarchives.s3.amazonaws.com%2Frandom.wacz#view=pages&urlSearchType=prefix&url=https%3A%2F%2Fwww.facebook.com%2Fgeorgeclintonpfunk&ts=20221018150735
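To keep track of how long the test archive has been aging, the capture time can be read back out of the replay link itself. This is only a minimal sketch: it assumes the `ts` and `url` parameters in the URL fragment are the capture timestamp (14 digits, taken here to be UTC) and the archived page, and `capture_info` is a hypothetical helper, not part of ReplayWeb.page.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

REPLAY_URL = (
    "https://inkdroid.org/web-archives/archive/"
    "?source=https%3A%2F%2Fedsu-webarchives.s3.amazonaws.com%2Frandom.wacz"
    "#view=pages&urlSearchType=prefix"
    "&url=https%3A%2F%2Fwww.facebook.com%2Fgeorgeclintonpfunk"
    "&ts=20221018150735"
)


def capture_info(replay_url):
    """Return (archived_url, capture_time) parsed from the link's fragment."""
    # The replay parameters live after '#', formatted like a query string.
    params = parse_qs(urlsplit(replay_url).fragment)
    # Assumption: the 14-digit timestamp is YYYYMMDDhhmmss in UTC.
    captured = datetime.strptime(params["ts"][0], "%Y%m%d%H%M%S")
    return params["url"][0], captured.replace(tzinfo=timezone.utc)


archived_url, captured_at = capture_info(REPLAY_URL)
age = datetime.now(timezone.utc) - captured_at
print(f"{archived_url} captured {captured_at:%Y-%m-%d}, {age.days} days ago")
```

Re-running this when retesting shows whether the 10 or so days mentioned above have actually passed.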
@halmos do you have an example that's currently breaking? I'm curious whether this is indeed the issue or something else, as the archive should not have any interaction with actual identifiers, and we're also updating the Date on replay to match the time of archive creation.
I'm not sure if this test will work. I think the problem will only occur on FB archives where the user was logged in at the time of archiving. I also don't see an indication that the post is using the new pfbid system. I think Facebook is rolling that new system out in stages, so not all posts currently use it.
I'm worried that there is a security issue with posting an archive from a logged-in FB session, since the auth tokens could be captured in the cookies. I'm trying to find a way to do this securely.
Yes, don't post it here, but you can upload it somewhere and send us a link at dev [at] webrecorder.net. I'm hoping that it's something that could be fixed with rewriting improvements.
Do the rotating IDs only work for logged in FB users?
That's a good question. I believe they are also used for public / signed-out posts, but I think the IDs on signed-in pages are used with authenticated API requests, which may have additional side effects. Unfortunately, this is a hard thing to test, but so far my experience is that the WACZ files seem to be affected only for logged-in sessions. More testing is needed, however.
@halmos can you try archiving and replaying with the latest versions? We may have fixed some issues related to this. I'm not sure the rotating ID is involved.
I am seeing fewer problems with the latest version of the extension. However, I do see at least one example where images are not loading. Facebook seems to be using some pretty obscure code to load images dynamically. For example, here is the markup for an image which is not loading in the archive, though I can see that the image does exist in the web archive:
<a href="/replay/w/n1h09qy637kujzpdcndwzs/20221018145215mp_/https://m.facebook.com/aalisarem/photos/pcb.167216991371363/167216948038034/?type=3&av=1498090096&eav=AfaXUKGW9nRapv2ODNFrM9HwoFikbQp2ymZmtVBrqUKedbbDenWGVsYjKDos9_vuUYc&source=48&__tn__=EH-R&paipv=0" class="_39pi _26ih" style="top:162px; left:162px; width: 158px; height: 158px;">
<div class="_50xr _403j" style="width:158px;height:158px;">
<i class="img _5sgi img _2sxw" style="top:-26px;background-image: url('https\3a //scontent-lga3-2.xx.fbcdn.net/v/t1.6435-9/86179586_167216951371367_1117819982437154816_n.jpg?stp\3d cp0_dst-jpg_e15_p320x320_q65\26 _nc_cat\3d 101\26 ccb\3d 1-7\26 _nc_sid\3d 110474\26 efg\3d eyJpIjoidCJ9\26 _nc_ohc\3d 2DPAfgJ8OVcAX--T7xA\26 tn\3d 2K8adAyEtjKShIqL\26 _nc_ht\3d scontent-lga3-2.xx\26 oh\3d 00_AT9uwRTs_EvIEYpqKKYaliFQAqixxEp7QFFdx_Ywm-HINg\26 oe\3d 63732404');background-repeat:no-repeat;background-size:100% 100%;-webkit-background-size:100% 100%;width:158px;height:211px;" aria-label="No photo description available." role="img"></i>
</div>
</a>
and here is how the image url is listed in the archive: https://scontent-lga3-2.xx.fbcdn.net/v/t1.6435-9/86179586_167216951371367_1117819982437154816_n.jpg?stp=cp0_dst-jpg_e15_p320x320_q65&_nc_cat=101&ccb=1-7&_nc_sid=110474&efg=eyJpIjoidCJ9&_nc_ohc=2DPAfgJ8OVcAX--T7xA&tn=2K8adAyEtjKShIqL&_nc_ht=scontent-lga3-2.xx&oh=00_AT9uwRTs_EvIEYpqKKYaliFQAqixxEp7QFFdx_Ywm-HINg&oe=63732404
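One thing worth checking is whether the two URLs above actually differ. The `\3a `, `\3d ` and `\26 ` sequences in the inline style are CSS hex escapes for `:`, `=` and `&`. The sketch below decodes them under the standard CSS escaping rules (a backslash, one to six hex digits, then one optional whitespace character) and compares the result with the URL recorded in the archive. `css_unescape` is a hypothetical helper written for this comparison, not something taken from ReplayWeb.page or wabac.js.

```python
import re

# A CSS escape is either a backslash + 1-6 hex digits + one optional
# whitespace character, or a backslash followed by any single character.
_CSS_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})[ \t\n\r\f]?|\\(.)", re.DOTALL)


def css_unescape(value):
    """Decode CSS escape sequences in a string or url() value."""
    def replace(match):
        hex_digits = match.group(1)
        if hex_digits is None:
            return match.group(2)  # simple escape such as \' or \(
        code_point = int(hex_digits, 16)
        if code_point == 0 or code_point > 0x10FFFF:
            return "\ufffd"  # CSS maps invalid code points to U+FFFD
        return chr(code_point)
    return _CSS_ESCAPE.sub(replace, value)


# URL exactly as it appears inside background-image: url('...') in the markup.
escaped = (
    r"https\3a //scontent-lga3-2.xx.fbcdn.net/v/t1.6435-9/"
    r"86179586_167216951371367_1117819982437154816_n.jpg"
    r"?stp\3d cp0_dst-jpg_e15_p320x320_q65\26 _nc_cat\3d 101\26 ccb\3d 1-7"
    r"\26 _nc_sid\3d 110474\26 efg\3d eyJpIjoidCJ9"
    r"\26 _nc_ohc\3d 2DPAfgJ8OVcAX--T7xA\26 tn\3d 2K8adAyEtjKShIqL"
    r"\26 _nc_ht\3d scontent-lga3-2.xx"
    r"\26 oh\3d 00_AT9uwRTs_EvIEYpqKKYaliFQAqixxEp7QFFdx_Ywm-HINg"
    r"\26 oe\3d 63732404"
)

# URL as listed in the archive.
archived = (
    "https://scontent-lga3-2.xx.fbcdn.net/v/t1.6435-9/"
    "86179586_167216951371367_1117819982437154816_n.jpg"
    "?stp=cp0_dst-jpg_e15_p320x320_q65&_nc_cat=101&ccb=1-7"
    "&_nc_sid=110474&efg=eyJpIjoidCJ9"
    "&_nc_ohc=2DPAfgJ8OVcAX--T7xA&tn=2K8adAyEtjKShIqL"
    "&_nc_ht=scontent-lga3-2.xx"
    "&oh=00_AT9uwRTs_EvIEYpqKKYaliFQAqixxEp7QFFdx_Ywm-HINg"
    "&oe=63732404"
)

print(css_unescape(escaped) == archived)
```

If the decoded value matches the archived URL, the resource itself was captured, and the failure is more likely in how the escaped `url()` value inside the inline style is handled on replay. That is only a hypothesis from comparing the strings, not a confirmed diagnosis.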
I'm not sure that the rotating ID is relevant anymore - 0.12.6+ includes various fixes for Facebook capture and replay. Closing this for now. Please try with the latest version to check whether the problem still occurs.