Facebook's new Rotating IDs break replayWeb.page

Open halmos opened this issue 3 years ago • 9 comments

ReplayWeb.page cannot replay facebook posts that use their new Rotating ids scheme (https://about.fb.com/news/2022/09/deterring-scraping-by-protecting-facebook-identifiers/).

The archive works as expected at first, but stops working after a week or more.

To reproduce, make an archive of a Facebook page while logged in, then try to replay the archive after 10 or so days.

halmos avatar Oct 18 '22 14:10 halmos

Starting the timer on:

https://inkdroid.org/web-archives/archive/?source=https%3A%2F%2Fedsu-webarchives.s3.amazonaws.com%2Frandom.wacz#view=pages&urlSearchType=prefix&url=https%3A%2F%2Fwww.facebook.com%2Fgeorgeclintonpfunk&ts=20221018150735

edsu avatar Oct 18 '22 15:10 edsu
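For anyone repeating this timer test, the link edsu posted can be unpacked to see which WACZ file, page URL, and capture time it points at. The following is a minimal sketch using only the Python standard library; the function name `describe_replay_link` is made up for illustration, and treating the 14-digit `ts` value as a UTC timestamp in `YYYYMMDDhhmmss` form is an assumption based on the usual web-archive convention, not something stated in this thread.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# The replay link from the comment above, split across lines for readability.
REPLAY_LINK = (
    "https://inkdroid.org/web-archives/archive/"
    "?source=https%3A%2F%2Fedsu-webarchives.s3.amazonaws.com%2Frandom.wacz"
    "#view=pages&urlSearchType=prefix"
    "&url=https%3A%2F%2Fwww.facebook.com%2Fgeorgeclintonpfunk"
    "&ts=20221018150735"
)


def describe_replay_link(link):
    """Hypothetical helper: pull the archive source, page URL and capture time out of a link."""
    parts = urlsplit(link)
    query = parse_qs(parts.query)        # ?source=... names the WACZ file
    fragment = parse_qs(parts.fragment)  # #view=...&url=...&ts=... names the page
    # Assumption: ts is YYYYMMDDhhmmss in UTC.
    captured = datetime.strptime(fragment["ts"][0], "%Y%m%d%H%M%S")
    return {
        "wacz": query["source"][0],
        "page": fragment["url"][0],
        "captured": captured.replace(tzinfo=timezone.utc),
    }


info = describe_replay_link(REPLAY_LINK)
days_elapsed = (datetime.now(timezone.utc) - info["captured"]).days
print(info["page"], info["captured"].isoformat(), f"{days_elapsed} days since capture")
```

Under that assumption the capture time decodes to 2022-10-18 15:07:35 UTC, so the elapsed-days figure gives a rough idea of whether the 10-day window mentioned in the report has passed.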

@halmos do you have an example that's currently breaking at the moment? I'm curious if this is indeed the issue or something else, as the archive should not have any interaction with actual identifiers, and we're also updating the Date on the replay to match the time of archive creation.

ikreymer avatar Oct 18 '22 23:10 ikreymer

> Starting the timer on: https://inkdroid.org/web-archives/archive/?source=https%3A%2F%2Fedsu-webarchives.s3.amazonaws.com%2Frandom.wacz#view=pages&urlSearchType=prefix&url=https%3A%2F%2Fwww.facebook.com%2Fgeorgeclintonpfunk&ts=20221018150735

I'm not sure if this test will work. I think the problem will only occur on FB archives where the user was logged-in at the time of archiving. I also don't see an indication that the post is using the new pfbid system. I think facebook is rolling that new system out in stages, so not all posts currently use it.

halmos avatar Oct 24 '22 16:10 halmos

> @halmos do you have an example that's currently breaking at the moment? I'm curious if this is indeed the issue or something else, as the archive should not have any interaction with actual identifiers, and we're also updating the Date on the replay to match the time of archive creation.

I'm worried that there is a security issue posting an archive from a logged-in FB session since the auth tokens could be captured in the cookies. I'm trying to find a way to do this securely.

halmos avatar Oct 24 '22 16:10 halmos

> I'm worried that there is a security issue posting an archive from a logged-in FB session since the auth tokens could be captured in the cookies. I'm trying to find a way to do this securely.

Yes, don't post it here, but you can upload it somewhere and send us a link at dev [at] webrecorder.net. I'm hoping that it's something that could be fixed with rewriting improvements.

ikreymer avatar Oct 24 '22 16:10 ikreymer

Do the rotating IDs only work for logged in FB users?

edsu avatar Oct 24 '22 17:10 edsu

> Do the rotating IDs only work for logged in FB users?

That's a good question. I believe they are also used for public / signed-out posts, but I think the IDs on signed-in pages are used with authenticated API requests, which may have additional side effects. Unfortunately, this is a hard thing to test, but so far my experience is that the WACZ files seem to be affected only on logged-in sessions. More testing is needed, however.

halmos avatar Oct 24 '22 21:10 halmos

@halmos can you try archiving and replaying with latest versions? We may have fixed some issues related to this. I'm not sure the rotating ID is involved.

ikreymer avatar Feb 25 '23 00:02 ikreymer

I am seeing fewer problems with the latest version of the extension. However, I do see at least one example where images are not loading. Facebook seems to be using some pretty obscure code to load images dynamically. For example, here is the markup for an image which is not loading in the archive, though I can see that the image does exist in the web archive:

<a href="/replay/w/n1h09qy637kujzpdcndwzs/20221018145215mp_/https://m.facebook.com/aalisarem/photos/pcb.167216991371363/167216948038034/?type=3&amp;av=1498090096&amp;eav=AfaXUKGW9nRapv2ODNFrM9HwoFikbQp2ymZmtVBrqUKedbbDenWGVsYjKDos9_vuUYc&amp;source=48&amp;__tn__=EH-R&amp;paipv=0" class="_39pi _26ih" style="top:162px; left:162px; width: 158px; height: 158px;">
  <div class="_50xr _403j" style="width:158px;height:158px;">
    <i class="img _5sgi img _2sxw" style="top:-26px;background-image: url('https\3a //scontent-lga3-2.xx.fbcdn.net/v/t1.6435-9/86179586_167216951371367_1117819982437154816_n.jpg?stp\3d cp0_dst-jpg_e15_p320x320_q65\26 _nc_cat\3d 101\26 ccb\3d 1-7\26 _nc_sid\3d 110474\26 efg\3d eyJpIjoidCJ9\26 _nc_ohc\3d 2DPAfgJ8OVcAX--T7xA\26 tn\3d 2K8adAyEtjKShIqL\26 _nc_ht\3d scontent-lga3-2.xx\26 oh\3d 00_AT9uwRTs_EvIEYpqKKYaliFQAqixxEp7QFFdx_Ywm-HINg\26 oe\3d 63732404');background-repeat:no-repeat;background-size:100% 100%;-webkit-background-size:100% 100%;width:158px;height:211px;" aria-label="No photo description available." role="img"></i>
  </div>
</a>

and here is how the image url is listed in the archive: https://scontent-lga3-2.xx.fbcdn.net/v/t1.6435-9/86179586_167216951371367_1117819982437154816_n.jpg?stp=cp0_dst-jpg_e15_p320x320_q65&_nc_cat=101&ccb=1-7&_nc_sid=110474&efg=eyJpIjoidCJ9&_nc_ohc=2DPAfgJ8OVcAX--T7xA&tn=2K8adAyEtjKShIqL&_nc_ht=scontent-lga3-2.xx&oh=00_AT9uwRTs_EvIEYpqKKYaliFQAqixxEp7QFFdx_Ywm-HINg&oe=63732404

halmos avatar Mar 28 '23 16:03 halmos
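A note on the markup halmos posted: the `background-image` URL is written with CSS hex escapes (`\3a ` for `:`, `\3d ` for `=`, `\26 ` for `&`), which is why it looks different from the URL listed in the archive. The sketch below decodes those escapes following the CSS Syntax rules (a backslash, one to six hex digits, then one optional whitespace character). The function name `css_unescape` is made up for illustration, the sample string is a truncated excerpt of the URL above, and nothing here reflects how ReplayWeb.page itself handles rewriting.

```python
import re

# A CSS escape is a backslash followed by either
#   1-6 hex digits and one optional whitespace character, or
#   any single character other than a newline (taken literally).
_CSS_ESCAPE = re.compile(r"\\(?:([0-9a-fA-F]{1,6})[ \t\n\r\f]?|(.))")


def css_unescape(text):
    """Hypothetical helper: decode CSS backslash escapes in a style value."""

    def replace(match):
        hex_digits, literal = match.groups()
        if hex_digits is None:
            return literal
        code_point = int(hex_digits, 16)
        # Zero, surrogates and out-of-range values map to U+FFFD per CSS Syntax.
        if code_point == 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return "\ufffd"
        return chr(code_point)

    return _CSS_ESCAPE.sub(replace, text)


# Truncated excerpt of the escaped URL from the style attribute above.
ESCAPED = (
    r"https\3a //scontent-lga3-2.xx.fbcdn.net/v/t1.6435-9/"
    r"86179586_167216951371367_1117819982437154816_n.jpg"
    r"?stp\3d cp0_dst-jpg_e15_p320x320_q65\26 _nc_cat\3d 101"
)

print(css_unescape(ESCAPED))
```

Decoded this way, the excerpt matches the start of the URL as it is listed in the archive, which would be consistent with the image having been captured and the mismatch arising somewhere in how the escaped form is handled on replay. That is an inference from these two strings only and is not confirmed in this thread.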

I'm not sure that the rotating ID is relevant anymore - 0.12.6+ includes various fixes for Facebook capture and replay. Closing this for now. Please try the latest version to check whether the problem still occurs.

ikreymer avatar Aug 14 '24 01:08 ikreymer