scrapemark
Pull Request
Some fixes and stuff.
This looks like a win. @arshaw: Merge?
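I don't know which update is the cause, but this code does not work with patterns involving lists of dictionaries: when each dictionary has more than one key, the values from all matches are concatenated into a single entry. For example, the following pattern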
{* <a href='{{ [links].url }}'>{{ [links].title }}</a> *}
yields this result when run on http://www.google.com/ :
{'links': [{'title': u'SearchImagesMapsPlayYouTubeNewsGmailDriveMore \u25bcCalendarTranslateMobileBooksOffersWalletShoppingBloggerReaderFinancePhotosVideosEven more \xbbWeb HistorySign inSearch settingsInstall Google ChromeAdvanced searchLanguage toolsAdvertising\xa0ProgramsBusiness Solutions+GoogleAbout GooglePrivacy & Terms',
'url': u'http://www.google.com/webhp?hl=en&tab=wwhttp://www.google.com/imghp?hl=en&tab=wihttp://maps.google.com/maps?hl=en&tab=wlhttps://play.google.com/?hl=en&tab=w8http://www.youtube.com/?tab=w1http://news.google.com/nwshp?hl=en&tab=wnhttps://mail.google.com/mail/?tab=wmhttps://drive.google.com/?tab=wohttp://www.google.com/intl/en/options/https://www.google.com/calendar?tab=wchttp://translate.google.com/?hl=en&tab=wThttp://www.google.com/mobile/?tab=wDhttp://books.google.com/bkshp?hl=en&tab=wphttps://www.google.com/offers/home?utm_source=xsell&utm_medium=products&utm_campaign=sandbar&tab=wG#!detailshttps://wallet.google.com/manage/?tab=wahttp://www.google.com/shopping?hl=en&tab=wfhttp://www.blogger.com/?tab=wjhttp://www.google.com/reader/?hl=en&tab=wyhttp://www.google.com/finance?tab=wehttp://picasaweb.google.com/home?hl=en&tab=wqhttp://video.google.com/?hl=en&tab=wvhttp://www.google.com/intl/en/options/http://www.google.com/history/optout?hl=enhttps://accounts.google.com/ServiceLogin?hl=en&continue=http://www.google.com//preferences?hl=en/chrome/index.html?hl=en&brand=CHNG&utm_source=en-hpp&utm_medium=hpp&utm_campaign=en/advanced_search?hl=en&authuser=0/language_tools?hl=en&authuser=0/intl/en/ads//services/intl/en/about.html/intl/en/policies/'}]}
With a pattern where the dictionaries only have one key each
{* <a href='{{ [links].url }}'></a> *}
it returns the expected result:
{'links': [{'url': u'http://www.google.com/webhp?hl=en&tab=ww'},
{'url': u'http://www.google.com/imghp?hl=en&tab=wi'},
{'url': u'http://maps.google.com/maps?hl=en&tab=wl'},
{'url': u'https://play.google.com/?hl=en&tab=w8'},
{'url': u'http://www.youtube.com/?tab=w1'},
{'url': u'http://news.google.com/nwshp?hl=en&tab=wn'},
{'url': u'https://mail.google.com/mail/?tab=wm'},
{'url': u'https://drive.google.com/?tab=wo'},
{'url': u'http://www.google.com/intl/en/options/'},
{'url': u'https://www.google.com/calendar?tab=wc'},
{'url': u'http://translate.google.com/?hl=en&tab=wT'},
{'url': u'http://www.google.com/mobile/?tab=wD'},
{'url': u'http://books.google.com/bkshp?hl=en&tab=wp'},
{'url': u'https://www.google.com/offers/home?utm_source=xsell&utm_medium=products&utm_campaign=sandbar&tab=wG#!details'},
{'url': u'https://wallet.google.com/manage/?tab=wa'},
{'url': u'http://www.google.com/shopping?hl=en&tab=wf'},
{'url': u'http://www.blogger.com/?tab=wj'},
{'url': u'http://www.google.com/reader/?hl=en&tab=wy'},
{'url': u'http://www.google.com/finance?tab=we'},
{'url': u'http://picasaweb.google.com/home?hl=en&tab=wq'},
{'url': u'http://video.google.com/?hl=en&tab=wv'},
{'url': u'http://www.google.com/intl/en/options/'},
{'url': u'http://www.google.com/history/optout?hl=en'},
{'url': u'https://accounts.google.com/ServiceLogin?hl=en&continue=http://www.google.com/'},
{'url': u'/preferences?hl=en'},
{'url': u'/chrome/index.html?hl=en&brand=CHNG&utm_source=en-hpp&utm_medium=hpp&utm_campaign=en'},
{'url': u'/advanced_search?hl=en&authuser=0'},
{'url': u'/language_tools?hl=en&authuser=0'},
{'url': u'/intl/en/ads/'},
{'url': u'/services/'},
{'url': u'/intl/en/about.html'},
{'url': u'/intl/en/policies/'}]}
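To make the difference concrete, here is a minimal sketch of the shape I would expect from the two-key pattern next to the shape actually returned. It does not use scrapemark at all: it relies on a plain regular expression, and the `HTML` sample and the names `ANCHOR`, `expected_links`, and `concatenated_links` are made up for illustration (written for Python 3, so the strings have no `u''` prefix).

```python
import re

# Hypothetical sample markup standing in for the real page.
HTML = """
<a href='http://www.google.com/imghp?hl=en&tab=wi'>Images</a>
<a href='http://maps.google.com/maps?hl=en&tab=wl'>Maps</a>
<a href='/intl/en/about.html'>About Google</a>
"""

# One match per <a> element: group 1 is the href, group 2 the link text.
ANCHOR = re.compile(r"<a href='([^']*)'>(.*?)</a>", re.S)


def expected_links(html):
    """Expected shape: one dictionary per matched link."""
    return {'links': [{'url': url, 'title': title}
                      for url, title in ANCHOR.findall(html)]}


def concatenated_links(html):
    """Reported shape: a single dictionary with every value joined together."""
    pairs = ANCHOR.findall(html)
    return {'links': [{'url': ''.join(url for url, _ in pairs),
                       'title': ''.join(title for _, title in pairs)}]}


if __name__ == '__main__':
    print(expected_links(HTML))
    print(concatenated_links(HTML))
```

The first function yields three separate dictionaries, which is what the single-key pattern already does for `url`; the second mimics the concatenated output shown for the two-key pattern.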
I know this thread is ancient, and I'm sorry for being so uninvolved with this project since its launch more than 3 years ago, but I just wanted to announce that Scrapemark is no longer being maintained. I wrote a blog post reflecting on it: http://blog.arshaw.com/1/post/2013/03/reflecting-on-scrapemark.html
@quink, looking back, this should have been test-driven development; you got that right. Looks like you got to know the code pretty well. Thanks for all these changes, and sorry I never merged them.