JVnSegmenter: A Java-based Vietnamese Word Segmentation Tool


Copyright (c) 2006 - 2007 by

Cam-Tu Nguyen (ncamtu at gmail dot com), College of Technology, Vietnam National University, Hanoi

Xuan-Hieu Phan (pxhieu at gmail dot com), Graduate School of Information Sciences, Tohoku University

JVnSegmenter is an open-source, Java-based Vietnamese word segmentation tool. Its segmentation model was trained on about 8,000 labeled Vietnamese sentences using conditional random fields (FlexCRFs). Refer to our PACLIC 2006 paper for more information. We hope this tool will be useful to the Vietnamese NLP community. We greatly appreciate any bug reports, comments, and suggestions that help fix errors and improve segmentation accuracy.


Researchers who use this tool in their experiments should include the following citation:

Cam-Tu Nguyen and Xuan-Hieu Phan, "JVnSegmenter: A Java-based Vietnamese Word Segmentation Tool", 2007.

We would like to thank Trung-Kien Nguyen for spending a lot of time annotating the training data. We would also like to thank for hosting this project.

Last updated: March 24, 2007