Exploring dimension reduction techniques for text dataset visualization

Large multidimensional data sets are hard to visualize. Most existing methods dedicate visual space either to multiple items or to multiple features. In this work, we explore dimensionality reduction methods that capture both properties. We show that self-organizing maps (SOM) are a good choice for screen and paper visualization. We use color to make multiple texts comparable in a single image. We discuss important properties of our visualization method and propose an optimal parameter set with respect to text vocabulary size. Our methods are implemented in Python and are available as an open-source visualization library.


Authors:

Dinar Zaiakhov (Innopolis University, d.zayahov@innopolis.university)


Stanislav Protasov (Innopolis University, s.protasov@innopolis.ru)

In Proceedings of the Third International Conference Nonlinearity, Information and Robotics 2022, August 24, 2022