words_counted

Add critbit support

Open: rbotafogo opened this issue 8 years ago (4 comments)

May I suggest that you also use the critbit gem and add the words to a Critbit? That way you could also search by prefix.

rbotafogo, Sep 17 '15 20:09

Hi there. Thanks for the suggestion, but I'm not sure how this would be useful. Introducing the gem also seems like a heavy dependency, and I'm a bit reluctant to add any JRuby dependencies here. What do you think?

abitdodgy, Oct 24 '15 21:10

Hi Mohamad,

Using Critbit would also let the tokenizer list all tokens in sorted order and search tokens by prefix. It would probably be more space efficient too, since Critbit stores a shared prefix only once, as shown in the image below:

[image: Inline image 1]
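Since the image is not reproduced here, a minimal sketch of the "prefix stored once" idea may help. It inserts words into a plain character trie and counts the nodes. This is a simplified stand-in, not Critbit itself: a crit-bit tree is a compressed binary trie that branches on individual bits and skips runs of shared bits, but the intuition that a shared prefix is stored only once carries over. The class name `PrefixTrie` is hypothetical.

```ruby
# Hypothetical illustration: a plain character trie, NOT the critbit gem.
# Each node maps one character to a child node, so words that share a
# prefix also share the nodes for that prefix.
class PrefixTrie
  Node = Struct.new(:children, :terminal)

  attr_reader :node_count

  def initialize
    @root = Node.new({}, false)
    @node_count = 0
  end

  # Walks the word character by character, creating a node only when
  # no existing node covers that character at that position.
  def insert(word)
    node = @root
    word.each_char do |ch|
      node = (node.children[ch] ||= begin
        @node_count += 1
        Node.new({}, false)
      end)
    end
    node.terminal = true
    self
  end
end

words = %w[unindex unindew unindey]
trie = PrefixTrie.new
words.each { |w| trie.insert(w) }

puts words.sum(&:length) # characters when each word is a separate Hash key: 21
puts trie.node_count     # trie nodes, "uninde" stored once: 9
```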

I agree that you should not bring any JRuby dependencies into your gem. My idea is to let the user give you the storage class. By default your code would use a Hash; however, if the user is on JRuby and has Critbit installed, she could pass the Critbit class when instantiating a WordsCounted. Your code has only to consider that instead of working with Hash, it could work with any class that has the same Hash interface.
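A minimal sketch of that idea, assuming a hypothetical `TokenCounter` class that is not part of the actual words_counted API: the caller passes a storage class (`Hash` by default), and the counter relies only on `new`, `[]`, `[]=`, `fetch` and `each`, which the Critbit examples in this thread also use. On JRuby with the critbit gem installed, a user could pass `Critbit` instead; that last line is an untested assumption and is left as a comment.

```ruby
# Hypothetical sketch: TokenCounter is an illustrative name, not the
# words_counted API. Counting only uses methods that Hash provides and
# that Critbit is described as providing: new, [], []=, fetch and each.
class TokenCounter
  def initialize(tokens, store_class: Hash)
    @counts = store_class.new
    tokens.each do |token|
      @counts[token] = @counts.fetch(token, 0) + 1
    end
  end

  # Returns [token, count] pairs in whatever order the store iterates:
  # insertion order for a Hash, key order for a sorted store.
  def token_frequency
    pairs = []
    @counts.each { |token, count| pairs << [token, count] }
    pairs
  end

  def count(token)
    @counts.fetch(token, 0)
  end
end

tokens = %w[we are all in the gutter but some of us are looking at the stars]
counter = TokenCounter.new(tokens)
counter.count("the")          # => 2
counter.token_frequency.first # => ["we", 1]

# On JRuby with the critbit gem installed (assumption, not tested here):
#   TokenCounter.new(tokens, store_class: Critbit)
```

The design choice is plain dependency injection: the gem never references Critbit, so it gains no JRuby dependency, while any object with the same Hash-like methods can be plugged in.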

What do you think?

Cheers,

(Replying to Mohamad El-Husseini's message of 2015-10-24 19:37 GMT-02:00.)


— Reply to this email directly or view it on GitHub https://github.com/abitdodgy/words_counted/issues/20#issuecomment-150853009 .

Rodrigo Botafogo, Integrating IT into your business, 21-3010-4802/11-3010-1802

rbotafogo, Oct 26 '15 13:10

> Your code has only to consider that instead of working with Hash, it could work with any class that has the same Hash interface.

Thanks. I'm having a hard time visualizing what you mean, but that's because I'm not familiar with Critbit. Would you mind sharing a small example of what the interface would look like, perhaps some pseudo-code?

I've made a few changes to the gem in the last couple of weeks. Are you aware of those changes? I've decoupled the Tokeniser and the Counter classes.

abitdodgy, Oct 27 '15 23:10

Hi Mohamad,

The Hash API should be a subset of the Critbit API, so in principle you could use a Critbit wherever you use a Hash. The main difference is that Critbit keeps the data in sorted order, whereas hash.each cycles through a Hash's entries in the order they were inserted. Below I show Critbit used as a Hash:

```ruby
# Assumes JRuby with the critbit gem loaded, inside a Test::Unit test
# (assert_equal / assert_raise come from Test::Unit).
crit = Critbit.new

# add some key, value pairs to crit
# ("Essa é uma frase para armazenar" is Portuguese for "This is a sentence to store")
crit["hello"] = 0
crit["there"] = 1
crit["essa"] = 10
crit["Essa é uma frase para armazenar"] = 100
assert_equal(4, crit.size)
assert_equal(0, crit["hello"])
assert_equal(1, crit["there"])
assert_equal(100, crit["Essa é uma frase para armazenar"])

# fetch the key from crit
assert_equal(0, crit.fetch("hello"))

# remove a key, value pair from crit: given the key, delete returns
# the value and removes the entry
assert_equal(0, crit.delete("hello"))
assert_equal(3, crit.size)
assert_equal(nil, crit["hello"])
crit.delete("not there") { |k| p "#{k} is not there" }

assert_raise(KeyError) { crit.fetch("hello") }
assert_equal("NotFound", crit.fetch("hello", "NotFound"))

# crit also accepts complex objects
crit["works?"] = [10, 20, 30]
assert_equal([10, 20, 30], crit["works?"])
assert_equal(["works?", [10, 20, 30]], crit.assoc("works?"))

# check if keys are stored in crit
assert_equal(true, crit.has_key?("there"))
assert_equal(false, crit.has_key?("Not there"))

# crit stores data in sorted order, so we can call min and max on crit
assert_equal(["Essa é uma frase para armazenar", 100], crit.min)
assert_equal(["works?", [10, 20, 30]], crit.max)

# crit also allows checking value containment
assert_equal(true, crit.has_value?(100))
assert_equal(true, crit.has_value?([10, 20, 30]))
assert_equal(false, crit.has_value?("hello"))

# entries returns all entries in the Critbit, same as Hash
assert_equal([["Essa \u00E9 uma frase para armazenar", 100], ["essa", 10], ["there", 1], ["works?", [10, 20, 30]]], crit.entries)

# it is possible to change the value for a given key
crit["essa"] = 20
assert_equal([["Essa \u00E9 uma frase para armazenar", 100], ["essa", 20], ["there", 1], ["works?", [10, 20, 30]]], crit.entries)
```
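For comparison, a plain Ruby Hash answers the same calls used in the Critbit test. The sketch below (plain Ruby, no critbit gem) repeats the key operations and shows where the two differ: a Hash enumerates its entries in insertion order, while the sorted order that Critbit returns can be obtained from a Hash only by sorting on each call.

```ruby
# Plain Ruby Hash; no critbit gem required.
h = {}
h["hello"] = 0
h["there"] = 1
h["essa"] = 10
h["Essa é uma frase para armazenar"] = 100

h.delete("hello")             # => 0
h.fetch("hello", "NotFound")  # => "NotFound"
h["works?"] = [10, 20, 30]
h.assoc("works?")             # => ["works?", [10, 20, 30]]
h.key?("there")               # => true
h.value?(100)                 # => true

# Enumerable#min/max compare [key, value] pairs, so they work on a Hash too
h.min.first                   # => "Essa é uma frase para armazenar"
h.max.first                   # => "works?"

# Insertion order, unlike Critbit:
h.keys              # => ["there", "essa", "Essa é uma frase para armazenar", "works?"]
# Sorted order, as crit.entries returns it:
h.sort.map(&:first) # => ["Essa é uma frase para armazenar", "essa", "there", "works?"]
```

So code that only reads and writes counts works with either store; only code that depends on iteration order sees a difference.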

Critbit can also retrieve data by prefix. Let's add some data to a Critbit:

```ruby
crit = Critbit.new

# crit is space efficient: it stores each prefix only once and can be used
# to find only the strings that match a given prefix
items = ["u", "un", "unh", "uni", "unj", "unim", "unin", "unio", "uninc", "unind", "unine", "unindd", "uninde", "unindf", "unindew", "unindex", "unindey", "a", "z"]

# add items to the container
items.each do |item|
  crit[item] = item
end

# Let's now retrieve only the data that has prefix 'unin'
crit.prefix = "unin"

# each now iterates only over the elements in crit with prefix 'unin'
print "["
crit.each do |key, value|
  print "[#{key}, #{value}] "
end
print "]"
```

This is the result:

[[unin, unin] [uninc, uninc] [unind, unind] [unindd, unindd] [uninde, uninde] [unindew, unindew] [unindex, unindex] [unindey, unindey] [unindf, unindf] [unine, unine] ]
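Without Critbit, the same prefix query can be approximated in plain Ruby by keeping the keys in a sorted array, binary-searching for the first key that is not less than the prefix, and reading forward while keys still start with it. This is a stand-in technique for illustration, not how the critbit gem works internally; `prefix_search` is a hypothetical helper name.

```ruby
# Hypothetical helper, not part of critbit or words_counted.
# Returns the keys of +sorted_keys+ (which must already be sorted)
# that start with +prefix+, in sorted order.
def prefix_search(sorted_keys, prefix)
  # Find-minimum mode: index of the first key >= prefix, or nil if none.
  i = sorted_keys.bsearch_index { |key| key >= prefix }
  return [] if i.nil?

  # In sorted order, all keys sharing the prefix are contiguous,
  # so we read forward until the first key without the prefix.
  result = []
  while i < sorted_keys.length && sorted_keys[i].start_with?(prefix)
    result << sorted_keys[i]
    i += 1
  end
  result
end

items = ["u", "un", "unh", "uni", "unj", "unim", "unin", "unio", "uninc",
         "unind", "unine", "unindd", "uninde", "unindf", "unindew",
         "unindex", "unindey", "a", "z"]

p prefix_search(items.sort, "unin")
# => ["unin", "uninc", "unind", "unindd", "uninde", "unindew", "unindex",
#     "unindey", "unindf", "unine"]
```

Each lookup costs O(log n) plus the size of the result, but keeping an array sorted makes every insert O(n) because elements must shift; a crit-bit tree avoids that cost while keeping the same sorted, prefix-searchable view.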

Does that help?

(Replying to Mohamad El-Husseini's message of 2015-10-27 21:35 GMT-02:00.)


— Reply to this email directly or view it on GitHub https://github.com/abitdodgy/words_counted/issues/20#issuecomment-151675382 .


rbotafogo, Oct 28 '15 14:10