Dynamic FM-index

Paolo Ferragina and Giovanni Manzini introduced the FM-index in 2000. It is a full-text compressed index meaning that the space requirement of the index is related to the compressibility of the text (ie. related to the k-th order empirical entropy of the text). Using this index, one can count the number of occurrences in optimal time and locate an occurrence in time depending on a sampling factor. The FM-index concept does not rely on any specific structure. Thus several implementations exist, based on different structures (see Pizza&Chili corpus).

In 2006, Veli Mškinen and Gonzalo Navarro proposed a solution for the text collection problem, where texts can be added or removed from a collection indexed by a FM-index. Wolfgang Gerlach implemented their method, being the first known implementation of a dynamic compressed full-text index.

In 2009, MikaŽl Salson, Thierry Lecroq, Martine Lťonard and Laurent Mouchard published a paper for updating a FM-index when any modification occurs in the text. On this site, you'll find an implementation, based on Gerlach's, of this method.

If you want to test the dynamic FM-index implementation, you need to download the source code. If you want to contribute to the code, please go to the project page or to the SVN repository.

For any information, feel free to contact me at mikael.salson (at) lifl.fr.