nltk_data

Move the pickles to a special collection

Open ekaf opened this issue 1 year ago • 7 comments

Now that alternative data packages are available for all the pickles, the question arises: what to do with the old packages?

Simply removing them seems unsafe for users who are stuck with an old NLTK version that they cannot upgrade: they would be forced to look elsewhere for those packages, possibly ending up with dubious sources.

So, what about moving them to a special collection, named, for example, "Pickles"?

ekaf avatar Jul 22 '24 11:07 ekaf

Does this mean the old NLTK data will be renamed to something else, and that we can keep using punkt for downloads instead of punkt_tab? We ask because some other modules we depend on use an old NLTK library under the hood. If they do not move forward, they will keep using the old NLTK version, which downloads the data via punkt instead of punkt_tab.

hteeyeoh avatar Aug 19 '24 07:08 hteeyeoh

@hteeyeoh, the collections are XML files that provide thematic lists of NLTK packages. I am proposing to move the pickles to a new list while keeping their current HTTPS address, so that nothing breaks.

ekaf avatar Aug 19 '24 17:08 ekaf
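
To make the idea of a collection as a "thematic list" concrete, here is a minimal sketch of reading one with the Python standard library. It assumes the `<collection>` / `<item ref="..."/>` layout; the collection id `pickles`, its name, and the three members listed are hypothetical and only illustrate the proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection file: the id "pickles", the name, and the member
# list are assumptions for illustration, not a real nltk_data file.
PICKLES_COLLECTION = """\
<collection id="pickles" name="Legacy pickle packages">
  <item ref="punkt" />
  <item ref="averaged_perceptron_tagger" />
  <item ref="maxent_ne_chunker" />
</collection>
"""


def collection_members(xml_text):
    """Return the collection id and the package ids it references."""
    root = ET.fromstring(xml_text)
    refs = [item.get("ref") for item in root.findall("item")]
    return root.get("id"), refs


collection_id, packages = collection_members(PICKLES_COLLECTION)
print(collection_id, packages)
```

Because a collection only references packages by id, moving a package from one list to another would not, under this reading, change where the package itself is hosted.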

I see. So this means that modules that have not upgraded their NLTK version can still use the punkt package without triggering the security scan?

hteeyeoh avatar Aug 20 '24 00:08 hteeyeoh

Yes, @hteeyeoh, one purpose of this issue is to discuss how to handle the case when users cannot upgrade to a newer NLTK version.

ekaf avatar Aug 20 '24 05:08 ekaf

Thanks. May I know when this will be ready?

hteeyeoh avatar Aug 21 '24 01:08 hteeyeoh

@hteeyeoh this is not a time-critical issue, so no promises. I suggest you use whichever punkt package you need.

stevenbird avatar Aug 21 '24 02:08 stevenbird
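
One way to act on "use whichever punkt package you need" is to pick the package id from the installed NLTK version. The sketch below is only an illustration: the cutoff release (3.8.2) is an assumption about when NLTK started expecting `punkt_tab`, and the helper names `parse_version`, `punkt_package_for`, and `download_punkt` are hypothetical.

```python
import re

# Assumption: NLTK releases from about 3.8.2 onward expect "punkt_tab",
# while older releases expect the pickled "punkt". Adjust if this differs
# in your environment.
ASSUMED_PUNKT_TAB_CUTOFF = (3, 8, 2)


def parse_version(version):
    """Turn a string such as '3.8.1' or '3.9' into a (major, minor, patch) tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def punkt_package_for(version, cutoff=ASSUMED_PUNKT_TAB_CUTOFF):
    """Pick the tokenizer package id for a given NLTK version string."""
    return "punkt_tab" if parse_version(version) >= cutoff else "punkt"


def download_punkt():
    """Download the package matching the installed NLTK (requires nltk)."""
    import nltk  # imported lazily so the helpers above work without it

    nltk.download(punkt_package_for(nltk.__version__))


print(punkt_package_for("3.8.1"), punkt_package_for("3.9.1"))
```

Calling `download_punkt()` in an environment pinned to an old NLTK would then request `punkt`, and in an upgraded one `punkt_tab`, provided the assumed cutoff holds.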

Hi @stevenbird, yes, understood. Thanks.

hteeyeoh avatar Aug 21 '24 02:08 hteeyeoh