R Max Espinoza

Results 56 comments of R Max Espinoza

> Your resources/recipes seem more geared towards volume than specificity...

That is my failure in documenting the design. My goal was a tool to find and download datasets with a very...

Example: https://data.mendeley.com/datasets/c693yzczts/1

There is PokeAPI too: https://pokeapi.co/

Pokeapi uses data from veekun: https://github.com/veekun/pokedex/tree/master/pokedex/data/csv

This is a huge one, though. At this point I wonder if it may be better to use a micro-service for the search rather than local indexing and search capabilities....

We could follow the `brew tap` approach and move these big dataset aggregators to their own recipes repository, which users may choose to use. A drawback is that the...

Dataset description HTML files can be linked via GitHub Pages, e.g.: http://vincentarelbundock.github.io/Rdatasets/doc/plm/EmplUK.html
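A rough sketch of how such links could be built. The `doc/<package>/<item>.html` layout is inferred from the single example link above, and `rdatasets_doc_url` is a hypothetical helper name, not something from the project:

```python
# Assumption: the URL layout is inferred from the one example link in this comment.
RDATASETS_DOC_BASE = "http://vincentarelbundock.github.io/Rdatasets/doc"


def rdatasets_doc_url(package: str, item: str) -> str:
    """Build the GitHub Pages URL of a dataset's HTML description (hypothetical helper)."""
    return f"{RDATASETS_DOC_BASE}/{package}/{item}.html"


print(rdatasets_doc_url("plm", "EmplUK"))
```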

> it's hacky, debugging it is kind of hard

I totally agree with that. :)

I use `CONCURRENT_ITEMS = 1` in these cases. I haven't verified how much it improves memory usage, though.
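For context, a minimal sketch of where this would go, assuming a standard Scrapy project with a `settings.py` module. `CONCURRENT_ITEMS` caps how many items per response are processed in parallel in the item pipelines; the memory effect is unmeasured, as noted above:

```python
# settings.py (sketch; assumes a standard Scrapy project layout)

# Process one item at a time per response in the item pipelines.
# Scrapy's default is 100; lowering it trades throughput for fewer
# items held in flight at once.
CONCURRENT_ITEMS = 1
```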

Looks like you are missing this setting:

```
# Ensure all spiders share same duplicates filter through redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
```
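For reference, a sketch of how this setting typically sits next to the other scrapy-redis settings. The Redis URL is a placeholder assumption, and whether the project in question already configures the scheduler is not known from this thread:

```python
# settings.py (sketch; the Redis URL is a placeholder, not taken from the thread)

# Route request scheduling through Redis so all spiders share one queue.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Ensure all spiders share the same duplicates filter through Redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Assumed local Redis instance; adjust to the actual deployment.
REDIS_URL = "redis://localhost:6379/0"
```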