practical-machine-learning-with-python issues

Very minor bug in contractions with "ain't"

The word "ain't" expands to "as not" because `expand_contractions` contains the following line: `expanded_contraction = first_char + expanded_contraction[1:]`. "Ain't" does not fit this general rule, because its expansion does not start with the same letter as the contraction.
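A minimal sketch of one possible fix, not the repository's actual code: instead of copying the first character of the matched contraction, carry over only its capitalisation. The helper name `match_case` is hypothetical, and the expansion "is not" for "ain't" is assumed from the reported output.

```python
def match_case(original, expansion):
    """Carry over only the capitalisation of the contraction's first letter."""
    if original[:1].isupper():
        return expansion[:1].upper() + expansion[1:]
    return expansion


# Behaviour described in the issue: the first character itself is copied,
# which only works when contraction and expansion share their first letter.
buggy = "ain't"[0] + "is not"[1:]
print(buggy)                          # as not

print(match_case("ain't", "is not"))  # is not
print(match_case("Ain't", "is not"))  # Is not
print(match_case("Can't", "cannot"))  # Cannot
```

With this approach the mapping values stay untouched, so entries such as "ain't" need no special case.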

bbookman

Ch07 use contractions_dict instead of import CONTRACTION_MAP

Hi, in the Chapter 7 notebook (https://github.com/dipanjanS/practical-machine-learning-with-python/blob/master/notebooks/Ch07_Analyzing_Movie_Reviews_Sentiment/Text%20Normalization%20Demo.ipynb), please use `contractions_dict` instead of `import CONTRACTION_MAP`. Also, please correct the spaCy load.

sony-git

Requirements file

I think this repository would benefit from a requirements file with pinned versions. As usual, I'm getting stuck on packages with conflicting or incorrect versions in my conda environment.
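As a rough illustration of what pinned entries look like, the sketch below builds `name==version` lines from whatever is installed in the current environment. It assumes Python 3.8+ (for `importlib.metadata`); the function name `pinned_requirements` and the package names passed to it are placeholders, not a list taken from the book.

```python
from importlib import metadata


def pinned_requirements(packages, version_of=metadata.version):
    """Return one 'name==version' line per installed package."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version_of(name)}")
        except metadata.PackageNotFoundError:
            # Keep a visible note instead of silently skipping the package.
            lines.append(f"# {name} is not installed in this environment")
    return lines


# Placeholder package names; replace with the ones the notebooks import.
print("\n".join(pinned_requirements(["pandas", "numpy", "spacy"])))
```

Running `pip freeze > requirements.txt` in a working environment produces the same kind of file for every installed package.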

vishnya

Index Error

Comment: Hi @gopinathankm, could you mention the particular notebook and chapter you are referring to. Also please mention your pandas and numpy versions.

Comment: No problem I managed the issue. Thanks

Comment: @gopinathankm sure. If there's a fix you could share, please do so for everyone else's benefit.

When I execute the following statement from the .ipynb file, `neg_idx = df[(df.news_category=='technology') & (df.sentiment_score == -15)].index[0]`, I get the following error: IndexError Traceback (most recent call last) in ----> 5 neg_idx = df[(df.news_category=='technology') &...

gopinathankm

Easier way to download "en_vectors_web_lg" model in spacy

Comment: The reason for that is two-fold:

1. spaCy's CLI wasn't mature yet at the time of the book's release.
2. In proxy and other internal environments, the CLI download might also fail sometimes.

However, in regular environments it's definitely a better approach to follow, as long as it works.

On Tue, Sep 10, 2019 at 11:33 AM Anamitra Musib wrote:

> The procedure for downloading the *"en_vectors_web_lg"* in spacy. by downloading and unzipping the file, and shifting it to the appropriate directory, as illustrated here <https://github.com/dipanjanS/practical-machine-learning-with-python/blob/master/bonus%20content/feature%20engineering%20text%20data/Feature%20Engineering%20Text%20Data%20-%20Advanced%20Deep%20Learning%20Strategies.ipynb> is long and cumbersome.
>
> Instead of the above procedure, we could simply do the following to load the model:
>
> import spacy
> import spacy.cli
> spacy.cli.download("en_vectors_web_lg")
> nlp = spacy.load('en_vectors_web_lg')

The procedure for downloading the **"en_vectors_web_lg"** model in spaCy, by downloading and unzipping the file and moving it to the appropriate directory, as illustrated [here](https://github.com/dipanjanS/practical-machine-learning-with-python/blob/master/bonus%20content/feature%20engineering%20text%20data/Feature%20Engineering%20Text%20Data%20-%20Advanced%20Deep%20Learning%20Strategies.ipynb), is long and cumbersome. Instead of...

Anacoder1

expand contractions is not working as expected

Comment: Yes sadly this is a known issue even with the contractions package on PyPI. The whole regex pattern framework used for this will need to be modified I am guessing. Will be taking a look at this some time in the future as I get some time.

When a contraction contains more than one apostrophe ('), such as "you'll've", `expand_contractions` in text_normalizer.py outputs "you willve", but the expected output is "you will have".
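One plausible cause, not confirmed in this thread, is that Python's regex alternation tries alternatives in order, so a shorter key such as "you'll" can match before "you'll've" and leave "'ve" behind. The sketch below reproduces the reported output under that assumption and shows that sorting the keys longest-first avoids it. The two-entry `MAPPING` and the `expand` helper are hypothetical stand-ins, not the code from text_normalizer.py.

```python
import re

# Hypothetical two-entry mapping; the shorter key deliberately comes first.
MAPPING = {"you'll": "you will", "you'll've": "you will have"}


def expand(text, keys):
    """Expand contractions, trying alternatives in the order given by `keys`."""
    pattern = re.compile(
        "(" + "|".join(re.escape(k) for k in keys) + ")", flags=re.IGNORECASE
    )
    expanded = pattern.sub(lambda m: MAPPING[m.group(0).lower()], text)
    return expanded.replace("'", "")  # drop leftover apostrophes


naive = expand("you'll've", list(MAPPING))  # shorter key tried first
fixed = expand("you'll've", sorted(MAPPING, key=len, reverse=True))

print(naive)  # you willve
print(fixed)  # you will have
```

Sorting by length is a small change to how the pattern is built; it does not address contractions that are missing from the mapping altogether.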

jonanem

word.lemma_ will lowercase the token

Comment: https://github.com/dipanjanS/practical-machine-learning-with-python/blob/master/bonus%20content/nlp%20proven%20approach/NLP%20Strategy%20I%20-%20Processing%20and%20Understanding%20Text.ipynb

Comment: That would be more of a spacy issue I think, as far as I know typically any lemmatizer returns text in lower case. You might consider checking `nltk` but I am pretty sure they do the same.

Only option for you is to check the case of each word beforehand and then manually convert the case of the lemma after lemmatization if needed. Otherwise would need to send this request upstream to the spacy or nltk devs.

Comment: Do we need to add some comments to explain this in your [practical-machine-learning-with-python](https://github.com/dipanjanS/practical-machine-learning-with-python/blob/master/bonus%20content/nlp%20proven%20approach/NLP%20Strategy%20I%20-%20Processing%20and%20Understanding%20Text.ipynb)?

Comment: People may be confused to see their results because no matter how they set the `do_lowercase`, they always get the lowercased text.

Comment: yes definitely, that would be a good point to mention I think

`text = ' '.join([word.lemma_ if word.lemma_ != '-PRON-' else word.text for word in text])` In the latest spaCy, running this code lowercases the text. Actually, I...
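One of the replies suggests checking the case of each word beforehand and converting the lemma's case afterwards. A minimal sketch of that idea in plain Python follows; `restore_case` is a hypothetical helper, and the (text, lemma) pairs are made-up stand-ins for `word.text` and `word.lemma_`, not real spaCy output.

```python
def restore_case(original, lemma):
    """Re-apply the original token's casing pattern to its lemma."""
    if len(original) > 1 and original.isupper():
        return lemma.upper()                  # all-caps token, e.g. an acronym
    if original[:1].isupper():
        return lemma[:1].upper() + lemma[1:]  # capitalised token
    return lemma                              # lowercase token: leave as is


# Made-up (token text, lowercased lemma) pairs for illustration.
pairs = [("Dogs", "dog"), ("NASA", "nasa"), ("running", "run")]
print([restore_case(text, lemma) for text, lemma in pairs])
# ['Dog', 'NASA', 'run']
```

In the list comprehension from the issue, this would mean wrapping `word.lemma_` as `restore_case(word.text, word.lemma_)`; mixed-case tokens such as "iPhone" are not handled by this sketch.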

HuaizhengZhang

enhancement

datasets

Is there a link for the datasets used in this book?

njahnemwaura

Bug-Fix Issue-#29

Comment: I'm a little late to the party, but I noticed you all aren't using a notebook review tool and wanted to invite you to review this pull request with GitNotebooks: https://gitnotebooks.com/dipanjanS/practical-machine-learning-with-python/pull/37

It lets you do things like comment on rendered markdown and code cells, so might be an easy win for your PR reviews.

#29 Issue

**Errors:**

1. AttributeError: 'Series' object has no attribute 'ix'
2. TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting `n`, use `n...

Gopi-Vamsi-Penaganti

Chapter 4 feature engineering numeric data notebook broken on GitHub

As of 2023-12-26, the notebook notebooks/Ch04_Feature_Engineering_and_Selection/Feature Engineering on Numeric Data.ipynb doesn’t render on the GitHub site, but instead results in an error page.

ArturKlauser
