fullflu comments

Results: 14 comments of fullflu

Fixed a bug when processing data without score columns.

Hi @miyamamoto, thank you for submitting this pull request. I understand the bug that occurs when processing data without score columns, but I think there are still some bugs in the scripts....

Multi-hot encoding for ambiguous input

Thank you for your response. > It would be nice if the code included a reference to some article or a blog post that would illustrate on a trivial example...

WIP: Multi_hot encoder for ambiguous inputs

Thank you for your reviews. 1. The transformation test was based on `test_one_hot.py`. I inserted your suggestion into my test code. 2. Suffixes start with `1`, not `0` in the...

WIP: Multi_hot encoder for ambiguous inputs

> That's actually a mistake of mine in test_one_hot.py. I will fix it. LGTM. > I see. I was concerned about strings like "extra_10", which would not get captured. But...

WIP: Multi_hot encoder for ambiguous inputs

> I just attempt to keep the test results free of errors and warnings - once I allow one warning, additional warnings tend to lure in. I got it. I...

WIP: Multi_hot encoder for ambiguous inputs

> Just write somewhere that | assumes uniform distribution of the feature values. For example, when the data contain 1|2, the encoder assumes that there is 0.5 probability that the...
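
The uniform assumption in the quoted comment can be illustrated with a minimal sketch. `multi_hot_encode` is a hypothetical helper written for this illustration, not the encoder from the pull request; only the `or_delimiter` name comes from the thread. It assumes that an ambiguous value such as `1|2` splits its probability mass equally among its candidates.

```python
def multi_hot_encode(values, or_delimiter="|"):
    """Encode possibly ambiguous categorical values as probability rows.

    Hypothetical helper for illustration only: each candidate in an
    ambiguous value receives an equal share (uniform assumption).
    """
    # Split each raw value into its candidate categories.
    candidates = [str(v).split(or_delimiter) for v in values]
    # Sort the distinct categories so the column order is stable.
    categories = sorted({c for cand in candidates for c in cand})
    index = {c: i for i, c in enumerate(categories)}

    rows = []
    for cand in candidates:
        row = [0.0] * len(categories)
        # Uniform assumption: every candidate gets the same weight.
        weight = 1.0 / len(cand)
        for c in cand:
            row[index[c]] += weight
        rows.append(row)
    return categories, rows


categories, rows = multi_hot_encode(["1", "2", "1|2", "3"])
for value, row in zip(["1", "2", "1|2", "3"], rows):
    print(value, row)
```

Under this reading, an unambiguous value reduces to ordinary one-hot encoding, and every row sums to 1.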

WIP: Multi_hot encoder for ambiguous inputs

Thank you for your nice suggestion.

## This is the next plan:

* Coming soon (within a few weeks at the latest)
  - rename multiple_split_string to or_delimiter
  - rename and...

WIP: Multi_hot encoder for ambiguous inputs

> rename and fix prior-related options as your suggestion Although your suggestion is very appealing, I found that your prior-related options would sacrifice flexibility. That is why I want...

WIP: Multi_hot encoder for ambiguous inputs

> How is it going to work? Is it similar to TfidfVectorizer or CountVectorizer? I imagined something simpler than those. For each column, all rows where `and_delimiter` is included are...

Similarity search based on dot product

Thank you for your response. I understand your plan. I'll use other methods for a while, but I’m looking forward to your implementation of the dot product similarity (if possible).