baleen

An automated ingestion service for blogs to construct a corpus for NLP research.

Results 22 baleen issues

Taking some lessons from Steven Lott's PyData presentation: http://pydata.org/dc2016/schedule/presentation/40/ https://twitter.com/s_lott https://slott56.github.io/no-sql-doesnt-mean-no-schema/assets/player/KeynoteDHTMLPlayer.html#0 We can formalize the Mongo schemas using JSON and rely on JSON validation to ensure that we never even...
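To make the schema idea concrete, here is a minimal sketch of checking a document against a declared schema before it is written to Mongo. Everything in it is an assumption for illustration: `POST_SCHEMA`, its field names, and the `validate` helper are hypothetical, and the hand-rolled check stands in for a real JSON Schema validator, which the issue does not name.

```python
# Hypothetical schema for a post document; the field names are illustrative
# and are not taken from Baleen's actual models.
POST_SCHEMA = {
    "required": ["title", "url"],
    "properties": {"title": str, "url": str, "tags": list},
}


def validate(document, schema):
    """Return a list of problems found in the document (empty if it is valid)."""
    errors = []
    # Every required field must be present.
    for field in schema["required"]:
        if field not in document:
            errors.append(f"missing required field: {field}")
    # Every declared field that is present must have the declared type.
    for field, expected in schema["properties"].items():
        if field in document and not isinstance(document[field], expected):
            errors.append(f"{field} should be {expected.__name__}")
    return errors


good = validate({"title": "Hello", "url": "http://example.com"}, POST_SCHEMA)
bad = validate({"title": 1}, POST_SCHEMA)
```

Rejecting a document when the returned list is non-empty keeps malformed records out of the collection, at the cost of one extra check per write.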

Timeout decorator introduced with https://github.com/bbengfort/baleen/commit/2e5d83767cfa3ceebfdada0680f713e73e10fbae
Acceptance criteria:
- Use the decorator for methods with potentially very long-running operations
- Properly handle BaleenTimeout errors at call sites for these methods
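The two acceptance criteria could look roughly like the sketch below. It does not reproduce the decorator from the linked commit: the `timeout` decorator here is a thread-based stand-in, the `BaleenTimeout` class is redefined locally so the example is self-contained, and `slow_fetch`, `fast_fetch`, and `fetch_all` are hypothetical names.

```python
import functools
import threading
import time


class BaleenTimeout(Exception):
    """Local stand-in for the timeout error named in the issue."""


def timeout(seconds):
    """Raise BaleenTimeout if the wrapped call has not returned within `seconds`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            outcome = {}

            def target():
                try:
                    outcome["value"] = func(*args, **kwargs)
                except Exception as exc:
                    outcome["error"] = exc

            # Daemon thread: an abandoned call cannot keep the process alive.
            worker = threading.Thread(target=target, daemon=True)
            worker.start()
            worker.join(seconds)
            if worker.is_alive():
                raise BaleenTimeout(f"{func.__name__} exceeded {seconds}s")
            if "error" in outcome:
                raise outcome["error"]
            return outcome["value"]
        return wrapper
    return decorator


@timeout(0.05)
def slow_fetch():
    time.sleep(0.5)  # simulates a download that hangs
    return "slow result"


@timeout(1)
def fast_fetch():
    return "fast result"


def fetch_all(fetchers):
    """Call each fetcher, skipping any that time out instead of crashing."""
    results, skipped = [], []
    for fetch in fetchers:
        try:
            results.append(fetch())
        except BaleenTimeout:
            skipped.append(fetch.__name__)
    return results, skipped


results, skipped = fetch_all([slow_fetch, fast_fetch])
```

One caveat of this thread-based approach is that the timed-out call keeps running in the background until it finishes on its own; it is only abandoned, not cancelled.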

type: technical debt
type: feature
priority: medium

The pymongo driver is very strict and if it can't decode a mongo document it raises an exception. This is turning up in export where apparently (after 12 minutes or...

type: bug
priority: high

This method was originally written to wrap HTML snippets so that they look like a real web page. Now we have the ability to fetch complete web pages from RSS feeds. However...

Add a timeout so that if a post or feed is having trouble being downloaded, we skip it and carry on.

type: feature
ready
priority: high

Update Quickstart documentation as we discover gaps at PyCon sprints.

type: technical debt
priority: medium
in progress

Baleen crashes when Mongo refuses a connection; not sure why that's happening though.

type: bug
ready
priority: high

The method `baleen.models.Feed.count_posts` is too slow on the deployment server. It seems that `Post.objects(feed=self).count()` is going through the entire collection and filtering, which is bad. Need to figure out a...

type: bug
priority: medium
ready

The status screen that is currently running got a bit wonky by accident: ![screenshot 2016-04-19 12 52 24](https://cloud.githubusercontent.com/assets/745966/14647900/b83afad8-062d-11e6-9982-9c5c31ba195e.png) I think this was just caused by us writing updates at the same...

type: bug
priority: low