Turkish Lemmatizer finds the stem (root) form of Turkish words.
In morphologically complex languages such as Turkish, stemming is a difficult task, so using a lemmatizer is a sensible solution. Our approach to lemmatization is to use a predefined stem list and, for each word, pick the stem with the longest match starting from the beginning of the word.
Turkish Lemmatizer uses the "Longest Matched Stemming" algorithm. In addition, it handles some word-formation patterns that are common in Turkish.
In this library, lemmatization performance depends entirely on the supplied stem list.
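The longest-match idea described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the library's actual code or API: the class name `LongestMatchSketch`, its constructor, and the `lemmatize` method are hypothetical, and the extra word-formation handling mentioned above is not modeled.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of longest-match stemming; NOT the turkish-lemmatizer API.
final class LongestMatchSketch {
    // Turkish locale so that 'I' lowercases to dotless 'ı' and 'İ' to 'i'.
    private static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    private final Set<String> stems;

    // The stem list is supplied by the caller; results are only as good as this list.
    LongestMatchSketch(Set<String> stems) {
        this.stems = new HashSet<>(stems);
    }

    // Returns the longest known stem that is a prefix of the word,
    // or the lowercased word itself when no stem matches.
    String lemmatize(String word) {
        String w = word.toLowerCase(TURKISH);
        // Try the longest prefix first and shrink one character at a time.
        for (int end = w.length(); end > 0; end--) {
            String candidate = w.substring(0, end);
            if (stems.contains(candidate)) {
                return candidate;
            }
        }
        return w;
    }
}
```

With a stem list containing "el" and "elma", calling `lemmatize("elmalar")` on this sketch returns "elma" rather than "el", because the longer matching prefix is tried first. A word with no matching stem is returned unchanged (lowercased), which shows how much the result depends on the coverage of the stem list.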
turkish-lemmatizer-v0.0.2/
├── lib
│ └── turkish-lemmatizer-0.0.2.jar
├── LICENSE
└── README.md
For usage, please visit the wiki page.
If you use this library in a scientific project, please provide feedback. Such feedback can be used to improve the algorithm used in the lemmatizer.
This project is licensed under the Apache License v2.0.
Having trouble with the lemmatizer? You may email your problems to me at baturman (at) gmail.com. If you find an issue, please report it in the issues section of the GitHub project.