Turkish-lemmatizer by baturman

Introduction

Turkish Lemmatizer is a library for finding the stem/root form of Turkish words.

In morphologically complex languages such as Turkish, stemming is a difficult task, so using a lemmatizer is a sensible alternative. The approach taken during lemmatization is to use a predefined stem list and pick the longest stem that matches the beginning of the word.

Turkish Lemmatizer uses the "Longest Matched Stemming" algorithm. In addition, it handles some of the word-formation patterns that are commonly seen in Turkish words.

In this library, lemmatization performance depends entirely on the supplied stem list.
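
The longest-match idea described above can be illustrated with a minimal sketch. This is not the library's actual code or API: the class name `LongestMatchSketch`, the method `longestMatch`, and the tiny stem list are all hypothetical and exist only for illustration. The sketch covers plain prefix matching only and does not attempt the Turkish word-formation handling that the library adds on top.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of longest-match stemming; not the library's API.
class LongestMatchSketch {

    // Returns the longest prefix of `word` found in `stems`.
    // If no prefix is in the stem list, the word is returned unchanged.
    static String longestMatch(String word, Set<String> stems) {
        // Try the longest candidate first and shrink one character at a time.
        for (int end = word.length(); end > 0; end--) {
            String prefix = word.substring(0, end);
            if (stems.contains(prefix)) {
                return prefix;
            }
        }
        return word;
    }

    public static void main(String[] args) {
        // Assumed toy stem list; a real list would be far larger.
        Set<String> stems = new HashSet<>(Arrays.asList("kitap", "ev", "el", "elma"));

        System.out.println(longestMatch("kitaplar", stems));  // prints "kitap"
        System.out.println(longestMatch("elmalar", stems));   // prints "elma", not "el"
        System.out.println(longestMatch("evlerimiz", stems)); // prints "ev"
        System.out.println(longestMatch("araba", stems));     // prints "araba" (no stem matched)
    }
}
```

The "elmalar" case shows why the stem list matters so much: both "el" and "elma" are valid prefixes, and the result is only correct because the longer stem is present in the list.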

Obtaining the Library

If you want the source, you can get the latest snapshot by clicking the download buttons at the top of the page.
You can also download a specific release from the releases section on GitHub.
After you extract the compressed file, you will have a directory structure similar to the following:

turkish-lemmatizer-v0.0.2/
├── lib
│   └── turkish-lemmatizer-0.0.2.jar
├── LICENSE
└── README.md

Add the jar file in the lib/ folder to your project's build path.
That is all the setup required.

Lemmatizer Usage

For usage instructions, please visit the wiki page.

Personal Request from the Developer

If you use this library in a scientific project, please provide feedback. Such feedback can be used to improve the algorithm used in the lemmatizer.

License

This project is licensed under the Apache License v2.0.

Support or Contact

Having trouble with the lemmatizer? You can email me about your problem at baturman (at) gmail.com. If you find a bug, please report it in the issues section of the GitHub project.