R Max Espinoza
> Your resources/recipes seem more geared towards volume than specificity...

That's my big failure in documenting the design. My goal was a tool to find and download datasets with a very...
Example: https://data.mendeley.com/datasets/c693yzczts/1
There is PokéAPI too: https://pokeapi.co/
Pokeapi uses data from veekun: https://github.com/veekun/pokedex/tree/master/pokedex/data/csv
This is a huge one, though. At this point I wonder if it may be better to use a micro-service for the search rather than local indexing and search capabilities....
We could follow the `brew tap` approach and move these big dataset aggregators to their own recipes repository, which users may opt into. A drawback is that the...
Dataset description HTML files can be linked via GitHub Pages, e.g.: http://vincentarelbundock.github.io/Rdatasets/doc/plm/EmplUK.html
> it's hacky, debugging it is kind of hard

I totally agree with that. :)
I use `CONCURRENT_ITEMS = 1` in these cases. I haven't verified how much it improves memory usage, though.
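For context, this is the kind of `settings.py` fragment I mean; only the `CONCURRENT_ITEMS = 1` line comes from this thread, the rest is a hedged sketch:

```python
# settings.py -- memory-conscious Scrapy settings (sketch).

# Process one item at a time in the item pipelines instead of
# Scrapy's default of 100, trading throughput for fewer in-flight items.
CONCURRENT_ITEMS = 1
```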
Looks like you are missing this setting:

```python
# Ensure all spiders share the same duplicates filter through Redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
```
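A minimal `settings.py` wiring scrapy-redis end to end might look like this; only the `DUPEFILTER_CLASS` line comes from the comment above, and the `REDIS_URL` value is an assumption (local default):

```python
# settings.py -- minimal scrapy-redis wiring (sketch).

# Share a single duplicates filter across all spiders through Redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Use the Redis-backed scheduler so requests are queued in Redis too.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Keep the Redis queues between runs instead of clearing them on close.
SCHEDULER_PERSIST = True

# Where the shared Redis instance lives (assumed local default).
REDIS_URL = "redis://localhost:6379"
```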