Preliminary study: Exploring GitHub repository metrics

GitHub is the largest platform for code hosting and development and is currently a primary source of data on software projects. It is important to understand how repository properties affect code quality and project popularity, and to find dependencies between different repository metrics. In this paper, a qualitative and quantitative analysis of more than 700 repositories and 81 metrics was conducted. Descriptive statistics, statistical tests, and correlation analysis were applied, and the resulting descriptive statistics were analyzed. The correlation analysis highlighted strongly correlated metrics, and a theoretical justification is given for the dependencies obtained. In addition, the repositories were clustered, and the resulting groups are discussed.

Guzel Safiullina (Innopolis University,

Aidar Gumerov (Innopolis University,

Gcinizwe Dlamini (Innopolis University,

Giancarlo Succi (Innopolis University,

In Proceedings of the Third International Conference Nonlinearity, Information and Robotics 2022, August 24, 2022