maia issues

Results: 28 issues of maia

setting a key that conflicts with a built-in method Feedlr::Base#length defined in Hash

Using master, I'm seeing a number of warnings when paging through the result of a `stream_entries_contents` query (and actually it seems to happen inside `each`, as the warnings are spawned...

Ruby warning: forwarding to private method

Ruby 2.5.0 warns about forwarding to a private method, `Feedlr::Client#connection`, when calling the method `stream_entries_contents`, see here: /…/feedlr/request.rb:57: warning: Feedlr::Request#connection at /…/ruby/2.5.0/forwardable.rb:157 forwarding to private method Feedlr::Client#connection...
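A minimal sketch of the pattern behind this kind of warning, and one possible way around it. The classes `SketchClient` and `SketchRequest` are hypothetical stand-ins, not Feedlr's actual code; the only assumption taken from the warning text is that a `Forwardable` delegator targets a private method.

```ruby
require 'forwardable'

# Hypothetical stand-in for a client whose `connection` is private.
class SketchClient
  private

  def connection
    :connection_object
  end
end

# Hypothetical stand-in for a request object that needs the client's connection.
class SketchRequest
  extend Forwardable

  def initialize(client)
    @client = client
  end

  # A delegator such as the following targets a private method, which is the
  # situation the quoted warning describes:
  #   def_delegator :@client, :connection

  # One possible alternative: an explicit method that uses `send`, making the
  # private call deliberate instead of going through Forwardable.
  def connection
    @client.send(:connection)
  end
end

SketchRequest.new(SketchClient.new).connection
```

Another option, under the same assumptions, would be to make the target method public if it is meant to be used by collaborating objects.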

PageRank: NoMethodError: undefined method `-' for nil:NilClass

There's a problem with PageRank calculation: whenever you add a source that's not the destination of any other source, you end up with:

```
lib/graph-rank/page_rank.rb:67:in `block in convergence'
lib/graph-rank/page_rank.rb:66:in `each'
...
```
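To illustrate the case described (a node that is a source but never a destination), here is a small self-contained PageRank sketch. It is not the graph-rank implementation; `sketch_page_rank` and its parameters are hypothetical, and the point is only that looking up incoming links with an explicit empty default avoids doing arithmetic on `nil`.

```ruby
# Hypothetical, simplified PageRank over a list of [source, destination] pairs.
# Nodes without outgoing links simply leak rank here; this sketch does not
# redistribute it.
def sketch_page_rank(edges, damping: 0.85, iterations: 50)
  nodes = edges.flatten.uniq
  out_links = Hash.new { |hash, key| hash[key] = [] }
  in_links  = Hash.new { |hash, key| hash[key] = [] }

  edges.each do |src, dst|
    out_links[src] << dst
    in_links[dst] << src
  end

  ranks = nodes.map { |node| [node, 1.0 / nodes.size] }.to_h

  iterations.times do
    ranks = nodes.map do |node|
      # fetch with a default: a node that is never a destination gets [],
      # so the sum below is 0.0 rather than an operation on nil.
      incoming = in_links.fetch(node, [])
      sum = incoming.sum { |src| ranks[src] / out_links[src].size }
      [node, (1 - damping) / nodes.size + damping * sum]
    end.to_h
  end

  ranks
end

# :a is a source only; nothing links to it.
sketch_page_rank([[:a, :b]])
```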

multiple slashes within a string not properly processed

A string containing multiple slashes is split only at the last occurrence of a slash:

```
string = "Washington/London/Paris/Tokyo"
PragmaticTokenizer::Tokenizer.new().tokenize(string)
=> ["washington/london/paris", "tokyo"]
```

This is caused by the...
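The difference between the reported output and the presumably expected one can be reproduced with plain Ruby string methods. This is only an illustration of the two behaviours, not the tokenizer's code; the helper names are hypothetical.

```ruby
# Splits only at the last slash, mirroring the shape of the reported output.
def split_at_last_slash(string)
  before, _separator, after = string.downcase.rpartition("/")
  [before, after]
end

# Splits at every slash, which is presumably what the report expects.
def split_at_every_slash(string)
  string.downcase.split("/")
end

string = "Washington/London/Paris/Tokyo"
split_at_last_slash(string)  # => ["washington/london/paris", "tokyo"]
split_at_every_slash(string) # => ["washington", "london", "paris", "tokyo"]
```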

urls should not be downcased

While it's rare that a URL uses uppercase letters, some do. Because URLs are case-sensitive, they should not be transformed when using the option `downcase: true`, so the following...
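A rough sketch of the requested behaviour, written against a plain array of tokens. The pattern `SKETCH_URL_PATTERN` and the helper `downcase_except_urls` are hypothetical and deliberately simplistic (only `http`/`https` URLs are recognised); they are not part of pragmatic_tokenizer.

```ruby
# Assumption: a token counts as a URL if it starts with http:// or https://.
SKETCH_URL_PATTERN = %r{\Ahttps?://\S+\z}i

# Downcases every token except those that look like URLs.
def downcase_except_urls(tokens)
  tokens.map { |token| token.match?(SKETCH_URL_PATTERN) ? token : token.downcase }
end

downcase_except_urls(["Visit", "https://Example.com/Some/Path", "NOW"])
# => ["visit", "https://Example.com/Some/Path", "now"]
```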

bug

help wanted

feature overlap with pragmatic_segmenter?

Currently there is some overlap between pragmatic_tokenizer and pragmatic_segmenter, as both e.g. handle abbreviations. Should rules and constants (especially when language specific) that are shared between both gems be extracted...

question

mapping of similar characters (e.g. apostrophes)?

As I've encountered many variants of apostrophes, I wonder if pragmatic_tokenizer should normalize these (optionally) by mapping them all to a single character to ensure that they are treated as...
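One way such an optional normalization could look, as a sketch. The set of characters treated as apostrophe variants below is an assumption chosen for illustration, and `normalize_apostrophes` is a hypothetical helper, not an existing pragmatic_tokenizer option.

```ruby
# Assumed variants: right/left single quotation marks, modifier letter
# apostrophe, grave accent, acute accent.
SKETCH_APOSTROPHE_VARIANTS = /[\u2019\u2018\u02BC\u0060\u00B4]/

# Maps every assumed variant to the plain ASCII apostrophe.
def normalize_apostrophes(text)
  text.gsub(SKETCH_APOSTROPHE_VARIANTS, "'")
end

normalize_apostrophes("don\u2019t") # => "don't"
```

Mapping to a single character before tokenizing means contraction rules only need to know one apostrophe form.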

enhancement

option to require only specific languages?

Suggestion: as the language files will grow (abbreviations, contractions,…) and will use more memory, it would be nice to let users require only specific languages, so that when e.g. someone...

enhancement

return String instead of PragmaticSegmenter::Text

Currently pragmatic_segmenter returns an instance of `PragmaticSegmenter::Text`, which is a subclass of `String`. As pragmatic_tokenizer checks if `text.class == String` and also returning segmented objects of a different class than...
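The effect of a strict class comparison on a `String` subclass can be shown with a hypothetical stand-in class (`SketchText` below is not the real `PragmaticSegmenter::Text`):

```ruby
# Hypothetical stand-in for a String subclass returned by a segmenter.
class SketchText < String; end

# Strict comparison: false for instances of subclasses.
def strict_string?(text)
  text.class == String
end

# Converts any String subclass instance into a plain String.
def to_plain_string(text)
  String.new(text)
end

segment = SketchText.new("Hello world.")
strict_string?(segment)                  # => false
segment.is_a?(String)                    # => true
strict_string?(to_plain_string(segment)) # => true
```

Either returning plain strings from the producer or relaxing the consumer's check to `is_a?(String)` would remove the mismatch in this sketch.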

reduce memory usage by reusing segmenter

I just realised that for an `array` of 1000 strings, each 50-300 characters long (URL titles and descriptions generated by [gottfrois/link_thumbnailer](https://github.com/gottfrois/link_thumbnailer)), the following causes a much higher memory load…...