
Waybackpack + matchType

Open • uy5cu71 opened this issue 2 years ago • 1 comment

The Wayback CDX API has a matchType option, for example: https://web.archive.org/cdx/search/cdx?url=https://twitter.com/jack/statuses&matchType=prefix

It returns:

com,twitter)/jack/statuses/"/antarnisti/status/245078986827386880" 20121223123338 https://twitter.com/jack/statuses/%22/Antarnisti/status/245078986827386880%22 text/html 404 VNL4UHLBLX2UYNDIOZZ7ZR3CFYURIVND 5296
com,twitter)/jack/statuses/"/antarnisti/status/245078986827386880" 20130203195805 https://twitter.com/jack/statuses/%22/Antarnisti/status/245078986827386880%22 warc/revisit - VNL4UHLBLX2UYNDIOZZ7ZR3CFYURIVND 1042
com,twitter)/jack/statuses/"/antarnisti/status/245078986827386880" 20130312144230 https://twitter.com/jack/statuses/%22/Antarnisti/status/245078986827386880%22 warc/revisit - VNL4UHLBLX2UYNDIOZZ7ZR3CFYURIVND 1035
com,twitter)/jack/statuses/"/antarnisti/status/245078986827386880" 20130326132131 https://twitter.com/jack/statuses/%22/Antarnisti/status/245078986827386880%22 text/html 404 BMAXRTF3OVX3HL22WUMYLBYT2UJV3HT3 9317
com,twitter)/jack/statuses/"/antarnisti/status/245078986827386880" 20130402123359 https://twitter.com/jack/statuses/%22/Antarnisti/status/245078986827386880%22 warc/revisit - BMAXRTF3OVX3HL22WUMYLBYT2UJV3HT3 1030
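For reference, each line above is one capture in the CDX API's default space-separated format: urlkey, timestamp, original URL, MIME type, status code, content digest and length (the `warc/revisit` rows here show `-` for the status). The following is a minimal Python sketch, not part of waybackpack, that builds the same prefix query and splits a line into those fields. The helper names `build_cdx_query` and `parse_cdx_line` are hypothetical, and the sketch assumes the default seven-field output (no `fl=` or `output=json` options).

```python
from typing import NamedTuple
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


class CdxRecord(NamedTuple):
    """One line of CDX output, assuming the default seven fields."""
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    statuscode: str
    digest: str
    length: str


def build_cdx_query(url: str, match_type: str = "prefix") -> str:
    """Return a CDX search URL; match_type is one of exact, prefix, host, domain."""
    return f"{CDX_ENDPOINT}?{urlencode({'url': url, 'matchType': match_type})}"


def parse_cdx_line(line: str) -> CdxRecord:
    """Split a whitespace-delimited CDX line into its seven default fields."""
    parts = line.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {line!r}")
    return CdxRecord(*parts)


if __name__ == "__main__":
    print(build_cdx_query("https://twitter.com/jack/statuses"))
```

`urlencode` percent-encodes the `url` value (for example `https%3A%2F%2F...`), which the CDX server decodes like the unencoded form in the example above.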

Is it possible to download all of these URLs? I ask because waybackpack trims the URL based on the CLI input.

I tried adding a new matchType parameter to the cdx file and I get a valid response, but waybackpack still trims the URL the same way.
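One way to download everything a prefix query returns, without going through waybackpack, is to turn each CDX line into a Wayback replay URL of the form `https://web.archive.org/web/<timestamp>id_/<original>` and fetch those directly. This is a minimal sketch under stated assumptions: the `id_` replay modifier is used to request the archived bytes without the Wayback toolbar and link rewriting, and captures are deduplicated on (original URL, digest) so that `warc/revisit` rows, which repeat an earlier capture's digest, are not fetched twice. `snapshot_urls` is a hypothetical helper, not a waybackpack function.

```python
from typing import Iterable, List, Set, Tuple


def snapshot_urls(lines: Iterable[str]) -> List[str]:
    """Turn CDX lines into Wayback replay URLs, one per distinct (original, digest)."""
    seen: Set[Tuple[str, str]] = set()
    urls: List[str] = []
    for line in lines:
        parts = line.split()
        if len(parts) != 7:
            continue  # skip blank or malformed lines
        _urlkey, timestamp, original, _mimetype, _status, digest, _length = parts
        key = (original, digest)
        if key in seen:
            continue  # same content already queued (e.g. a warc/revisit row)
        seen.add(key)
        # "id_" asks the Wayback Machine for the originally archived bytes.
        urls.append(f"https://web.archive.org/web/{timestamp}id_/{original}")
    return urls
```

For the five lines in the response above, this yields two URLs: the 20121223123338 and 20130326132131 captures. Fetching them (for example with `urllib.request.urlopen`, with a pause between requests) is left out so the sketch stays offline.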

uy5cu71 • Sep 01 '22 19:09

Hi @uy5cu71, and thanks for your interest in this library. Unfortunately, I'm not sure I 100% understand your inquiry. But if it helps: waybackpack does not currently support the matchType parameter.

jsvine • Sep 06 '22 23:09