Support for German stemmer

Open thomasfr opened this issue 12 years ago • 8 comments

It would be great to have a stemmer for the German language. Maybe this is a good starting point? https://gist.github.com/2199965

thomasfr avatar Mar 25 '12 21:03 thomasfr

i agree! one of my highest priorities for natural before fall 2012 is non-English stemmers. i personally was going to look into doing French as I can likely handle that completely, but was hoping to get native speakers to help me at least verify my work with other languages.

would you be able to either handle the implementation or at least help me verify its accuracy?

the algorithm you've attached, have you played with it much? are you aware if there are any licensing restrictions with it?

chrisumbel avatar Mar 25 '12 22:03 chrisumbel

Hi, I could try to do a simple base implementation on top of the Gist I provided, but I have to check the license first. Otherwise I can help you in testing yours.

But great to hear that this is on your top priorities list. :)

thomasfr avatar Mar 26 '12 06:03 thomasfr

Feel free to take a stab at it!

chrisumbel avatar Mar 27 '12 10:03 chrisumbel

oops! i did not mean to close this.

chrisumbel avatar Mar 27 '12 11:03 chrisumbel

+1 for Dutch stemming. Hopefully I can help out in some sort of way in the future.

alfredwesterveld avatar Jul 26 '12 08:07 alfredwesterveld

You can use the JS Snowball port to do so:

https://github.com/fortnightlabs/snowball-js

It does change the capital letter U to lowercase though: http://code.google.com/p/urim/issues/detail?id=3

joscha avatar Sep 05 '12 07:09 joscha
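
One plausible explanation for the lowercasing mentioned above, assuming the port follows the published Snowball German algorithm: that algorithm temporarily marks `u` and `y` between vowels as uppercase `U` and `Y`, and turns every `U` and `Y` back into lowercase at the end, so a capital `U` in the input is lowercased too. The sketch below is a simplified illustration of those two steps only; the names `prelude` and `postlude` are hypothetical and are not the API of snowball-js.

```javascript
// Simplified sketch of the marking steps in the Snowball German algorithm.
// Function names are hypothetical; this is not the snowball-js API.
const VOWELS = 'aeiouyäöü';

// Replace ß by ss and mark u / y between vowels as uppercase U / Y.
function prelude(word) {
  const chars = Array.from(word.replace(/ß/g, 'ss'));
  for (let i = 1; i < chars.length - 1; i++) {
    const betweenVowels =
      VOWELS.includes(chars[i - 1]) && VOWELS.includes(chars[i + 1]);
    if (betweenVowels && chars[i] === 'u') chars[i] = 'U';
    else if (betweenVowels && chars[i] === 'y') chars[i] = 'Y';
  }
  return chars.join('');
}

// Turn every U / Y back into lowercase and drop the umlaut accents.
// This step cannot tell a marker from a capital letter in the input.
function postlude(word) {
  return word
    .replace(/U/g, 'u')
    .replace(/Y/g, 'y')
    .replace(/ä/g, 'a')
    .replace(/ö/g, 'o')
    .replace(/ü/g, 'u');
}

console.log(prelude('bauen'));          // baUen
console.log(postlude(prelude('UNO')));  // uNO
```

If case does not matter for the application, lowercasing the input before stemming sidesteps the inconsistency.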

Added a Porter stemmer for Dutch. Note that the Porter algorithm makes mistakes in Dutch, and my implementation fails in 305 of the 45669 cases in the Snowball file, which is less than 1% failure. The Snowball file also contains wrong examples: for instance, afvalstortplaats is stemmed as afvalstortplat, which is wrong; it should be afvalstortplaats.

Hugo

Hugo-ter-Doest avatar Apr 07 '18 19:04 Hugo-ter-Doest
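
For reference, 305 failures out of 45669 cases is roughly 0.67%. A count like this can be produced with a small harness that runs a stemmer over word/expected-stem pairs and collects the mismatches. The sketch below is a minimal illustration, not the test code used in natural: `evaluateStemmer`, `toyStem`, and the three sample pairs are made up for the example and are not taken from the Snowball file.

```javascript
// Compare a stemmer's output against expected stems and report mismatches.
// `stem` is any function (word) => stem; `pairs` is an array of
// [word, expectedStem] tuples.
function evaluateStemmer(stem, pairs) {
  const failures = [];
  for (const [word, expected] of pairs) {
    const actual = stem(word);
    if (actual !== expected) failures.push({ word, expected, actual });
  }
  const failureRate = pairs.length === 0 ? 0 : failures.length / pairs.length;
  return { total: pairs.length, failures, failureRate };
}

// Hypothetical toy stemmer: strips a trailing "en" and nothing else.
const toyStem = (word) => (word.endsWith('en') ? word.slice(0, -2) : word);

// Made-up pairs for illustration only, not real Snowball test data.
const samplePairs = [
  ['lopen', 'lop'],
  ['boek', 'boek'],
  ['huizen', 'huis'],
];

const result = evaluateStemmer(toyStem, samplePairs);
console.log(result.failures.length, 'of', result.total, 'failed');
console.log(result.failures);

// The figure reported in the comment above, as a percentage.
console.log((305 / 45669 * 100).toFixed(2) + '%'); // 0.67%
```

Keeping the list of failing words, rather than only the count, makes it easier to separate real stemmer bugs from questionable entries in the reference file.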

News?

webia1 avatar Nov 12 '20 18:11 webia1

> Hi, I could try to do a simple base implementation based on top of the Gist i provided. but i have to check the license first.

I checked the license:

/*
 * Original author: Joder Illi
 * 
 * Copyright (c) 2010, FormBlitz AG
 * All rights reserved.
 * Implementation of the stemming algorithm from http://snowball.tartarus.org/algorithms/german/stemmer.html
 * Copyright of the algorithm is: Copyright (c) 2001, Dr Martin Porter and can be found at http://snowball.tartarus.org/license.php
 *
 * Redistribution and use in source and binary forms, with or without 
 * modification, is covered by the standard BSD license. 
 * 
 */

As I see it, BSD-licensed code can be integrated into an MIT-licensed code base as long as the added code keeps its original (BSD) license.

Hugo-ter-Doest avatar Jan 01 '23 15:01 Hugo-ter-Doest

I am also considering jsSnowball, which is transpiled from the Java sources. It is licensed under the BSD 3-Clause license, which can be combined with the MIT license as well.

Source can be found here: https://github.com/mazko/jssnowball

Hugo-ter-Doest avatar Jan 01 '23 20:01 Hugo-ter-Doest

See #663 for progress

Hugo-ter-Doest avatar Jan 02 '23 08:01 Hugo-ter-Doest

#663 is merged.

Hugo-ter-Doest avatar Jan 02 '23 12:01 Hugo-ter-Doest