This article summarizes the scientific work published by Sergii Poznyak and Yurii Kolyada.
The original source can be accessed via DOI 10.33111/nfmte.2023.067.
In a world of increasing economic interdependence and geopolitical volatility, modelling economic growth has never been more vital. Traditional forecasting methods often struggle with the complexity and nonlinearity of modern economies, where countless interrelated factors—such as innovation, infrastructure, trade, and technology—shape national development.
This study highlights how machine learning, and particularly clustering techniques, can revolutionize the way we analyze and understand economic growth. By grouping countries with similar macroeconomic characteristics, clustering allows for the identification of shared development trajectories and strategic insights.
The research combines theory and application: reviewing literature on economic growth modelling, assessing advanced AI tools like dimensionality reduction and clustering, and ultimately applying these techniques to real-world macroeconomic datasets. The goal is not only to map countries’ economic paths more precisely but also to benchmark and improve the performance of different clustering models using specialized metrics.
The result? A data-driven, scalable approach to understanding how nations grow—enabling policymakers, investors, and global organizations to develop more targeted, effective economic strategies.
Economic growth has been at the heart of global development since the earliest trade exchanges. Yet, in today’s competitive and interconnected world, the urgency to understand and forecast it has intensified. Traditional models—from the foundational Solow-Swan to the Romer model—have provided valuable insights by quantifying capital, labor, consumption, and technological change. However, these models are often constrained by assumptions (like closed economies or fixed savings-investment equality) that limit their real-world applicability.
In parallel, cyclical theories of economic growth have emerged—such as Kitchin, Juglar, Kuznets, and Kondratieff cycles—offering a temporal lens through which to understand structural changes, innovation bursts, and geopolitical shifts. While informative, these cycles don't always translate easily into actionable policy or strategy.
With the advent of computer-based modeling, economic research has entered a new phase—embracing artificial intelligence and machine learning to handle the complexity and nonlinearity of modern economies. One of the most promising tools in this transformation is cluster analysis, which groups countries by behavioral or structural similarities.
A wide array of algorithms—ranging from K-means and Spectral Clustering to DBSCAN and HDBSCAN—offer powerful clustering capabilities. However, much of the current research still relies heavily on subjective judgment to select the number of clusters or the appropriate algorithm. This undermines the potential of clustering as a truly objective decision-making tool.
What’s missing in many studies is a standardized, data-driven methodology for evaluating clustering performance—especially in the presence of outliers and noise, which are common in country-level economic data. Our study addresses this gap by integrating comprehensive clustering metrics and dimensionality reduction techniques to create a more reliable, transparent, and justifiable framework for modeling economic growth across nations.
As economic data becomes increasingly complex and abundant, cluster analysis is gaining recognition as a transformative tool in modern economic research and strategy. Unlike traditional methods that often rely on linear comparisons or fixed classifications, cluster analysis allows for the grouping of countries, regions, or enterprises into natural, data-driven subgroups based on shared characteristics. This analytical approach brings to light hidden patterns and structures in data, enabling a much deeper understanding of economic behaviors, similarities, and divergences across different systems.
In the realm of economic growth research, the value of cluster analysis is especially pronounced. It offers a lens through which analysts can examine similarities in development trajectories, identify common bottlenecks or accelerators of progress, and evaluate strategic differences in policy implementation across comparable economic environments. Rather than generalizing across vastly different contexts, researchers can now investigate how specific groups of economies respond to similar structural challenges or global trends, leading to more targeted insights and better-informed decisions.
However, the effectiveness of cluster analysis depends heavily on the preparatory stages that precede it. A sound analytical process begins with standardization, where all economic indicators are adjusted to a common scale. This step is essential because raw economic data often contains variables measured in very different units and magnitudes. Without standardization, certain variables—such as GDP or trade volume—might dominate the analysis purely due to their numerical size, distorting the outcome.
Equally crucial is dimensionality reduction. In a dataset filled with dozens or hundreds of indicators, not all inputs contribute equally to identifying meaningful clusters. Dimensionality reduction techniques help streamline the data by retaining only the most relevant and independent features. This serves a dual purpose: it lightens the computational load of clustering algorithms and improves the clarity and reliability of the insights produced. It also addresses the problem of multicollinearity, a condition where indicators are overly interdependent and can confuse or destabilize the clustering process.
By reducing the noise and focusing the data on what matters most, dimensionality reduction lays the groundwork for accurate and interpretable clustering. In turn, this allows policymakers, researchers, and investors to explore economic systems at a level of granularity and relevance that traditional models struggle to achieve. Cluster analysis, therefore, is more than a statistical method—it is a strategic framework for decoding the complexity of global development and crafting interventions that are both efficient and adaptive.
Before countries can be accurately grouped into economic clusters, the data describing them must be carefully refined. In complex economic datasets, hundreds of variables—ranging from GDP and trade balances to innovation indices and debt ratios—can overlap, contradict, or dilute the signal that decision-makers need. The process of dimensionality reduction solves this problem by condensing large datasets into their most informative components, making subsequent analysis more effective and interpretable.
This study benchmarked a wide selection of dimensionality reduction algorithms frequently cited in scientific literature. These included classical linear methods like Principal Component Analysis (PCA), its scalable variant Incremental PCA, and its more selective form, Sparse PCA. Also tested were nonlinear and kernel-based approaches such as Kernel PCA, as well as advanced mathematical tools like Singular Value Decomposition (SVD). Complementary methods like Sparse Random Projection, Independent Component Analysis (ICA), ISOMAP, Multidimensional Scaling (MDS), Local Linear Embedding (LLE), and t-SNE were also examined.
Each of these methods approaches the task differently. PCA, for instance, seeks orthogonal directions that explain the maximum variance in the data, making it highly effective for datasets with linear relationships. Kernel PCA extends this to complex, nonlinear structures by performing PCA in a feature space implicitly defined by a kernel function. SVD factorizes the data matrix into singular vectors and singular values, exposing its dominant directions; it is widely used in recommendation systems and other high-dimensional applications. Methods like ISOMAP and t-SNE are designed to preserve more nuanced structure: ISOMAP retains geodesic distances along the data manifold, while t-SNE retains local neighborhood similarities. This makes them particularly valuable when relationships between countries are subtle, nonlinear, and difficult to map using conventional techniques.
However, these techniques vary significantly in interpretability, computational cost, and robustness to outliers or noise. For example, while t-SNE and LLE can reveal highly detailed cluster shapes, they may also introduce artifacts or distortions if not tuned correctly. Sparse methods offer performance advantages for large datasets but often at the cost of oversimplifying the underlying data. Therefore, selecting a dimensionality reduction method is not just a technical choice—it is a strategic one that can influence the validity and utility of all downstream analyses.
To evaluate the effectiveness of each method, the study used cumulative explained variance as a benchmark metric. This measure indicates the proportion of the original data's variance that is preserved after dimensionality reduction. The higher this value, the more information is retained, which is crucial for maintaining the integrity of subsequent clustering. Mathematically, it is the ratio of the sum of the eigenvalues of the selected components to the sum of all eigenvalues, $\mathrm{CEV}_k = \sum_{i=1}^{k} \lambda_i \big/ \sum_{i=1}^{p} \lambda_i$, where the eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ are sorted in descending order and $k$ is the number of retained components. This provides a clear, quantifiable signal of how well each algorithm compresses the data without compromising its explanatory power.
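The ratio above can be computed directly from a list of eigenvalues. The sketch below is illustrative only: the eigenvalues are made-up numbers, and the helper names `cumulative_explained_variance` and `components_needed` are hypothetical. For PCA-type methods the eigenvalues come from the covariance matrix (or, for Kernel PCA, from the centered kernel matrix); the second helper finds the smallest number of components that reaches a chosen retention threshold.

```python
def cumulative_explained_variance(eigenvalues, k):
    """Share of total variance carried by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def components_needed(eigenvalues, threshold=0.95):
    """Smallest k whose cumulative explained variance reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)


# Illustrative eigenvalues only, not taken from the study.
eigenvalues = [4.0, 2.5, 1.5, 1.0, 0.6, 0.4]
print(cumulative_explained_variance(eigenvalues, 3))  # 8.0 / 10.0 = 0.8
print(components_needed(eigenvalues, 0.95))           # 5
```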
Ultimately, dimensionality reduction is more than a preparatory step. It is the backbone of effective cluster analysis. When applied correctly, it transforms a chaotic mass of economic indicators into a lean, coherent structure that reveals the hidden logic of global development. It enables the identification of meaningful clusters that reflect real economic affinities, not just statistical coincidences. As such, mastering these methods is essential for anyone looking to extract strategic insights from complex, multi-dimensional economic data.
With the dataset now streamlined and optimized, the next step in the process focuses on the cluster analysis itself. This involves selecting appropriate clustering algorithms, understanding their strengths and weaknesses, and applying a robust framework for validating the quality and reliability of the resulting clusters.
Choosing the right clustering algorithm for a specific dataset is often seen as part science, part art. While theoretical knowledge and practical experience can help narrow the options, relying solely on intuition can lead to biased or suboptimal results—especially when working with complex economic data across countries. That’s why this study integrates a rigorous, multi-metric evaluation framework that brings objectivity, consistency, and transparency to the selection process.
Three well-established clustering quality metrics form the foundation of this framework: the Davies-Bouldin Index (DBI), the Calinski-Harabasz Index (CH), and the Silhouette Coefficient (SC). Each of these indicators measures different aspects of clustering performance, offering complementary insights into the cohesiveness and separation of the resulting clusters.
The Davies-Bouldin Index evaluates clustering performance based on the compactness of individual clusters and the separation between them. For each cluster, it finds the most similar other cluster, measured as the ratio of their combined within-cluster scatter (the average distance of members to their centroid) to the distance between their centroids, and then averages these worst-case ratios across all clusters. A lower DBI score signals better-defined, more distinct clusters. It is especially effective at penalizing overlapping or loosely formed clusters, situations that could distort the interpretation of country-level groupings.
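To make the definition concrete, here is a minimal from-scratch sketch of the textbook Davies-Bouldin formula, using Euclidean distance (an assumption; the summary does not state the distance used). In practice a library routine such as scikit-learn's `sklearn.metrics.davies_bouldin_score` would normally be used; the toy points are invented.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def davies_bouldin(points, labels):
    """Davies-Bouldin index: lower values mean compact, well-separated clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    names = sorted(clusters)
    centers = {c: centroid(clusters[c]) for c in names}
    # Scatter: average distance of each cluster's members to its own centroid.
    scatter = {
        c: sum(math.dist(p, centers[c]) for p in clusters[c]) / len(clusters[c])
        for c in names
    }
    # For each cluster, keep the ratio against its most similar neighbor.
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in names if j != i
        )
        for i in names
    ]
    return sum(worst_ratios) / len(worst_ratios)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(davies_bouldin(points, [0, 0, 1, 1]))  # 0.1: two tight, distant groups
print(davies_bouldin(points, [0, 1, 0, 1]))  # 10.0: groups straddling the gap
```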
The Calinski-Harabasz Index offers a contrast by focusing on variance. It measures the dispersion between clusters relative to the dispersion within clusters, each normalized by its degrees of freedom. A higher CH score indicates that cluster centroids lie far apart relative to the spread of points inside each cluster, meaning the groups are both well separated and internally cohesive. This metric excels in capturing how well the clustering structure separates different economic profiles across the dataset.
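A minimal sketch of the standard Calinski-Harabasz ratio follows, with between-cluster and within-cluster dispersion computed as sums of squared Euclidean distances. It is an illustration on invented points, not the study's code; scikit-learn exposes the same quantity as `sklearn.metrics.calinski_harabasz_score`. It assumes at least two clusters and fewer clusters than points.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: higher values mean better-separated clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = 0.0  # size-weighted spread of cluster centroids around the overall mean
    within = 0.0   # spread of points around their own cluster centroid
    for members in clusters.values():
        center = centroid(members)
        between += len(members) * sq_dist(center, overall)
        within += sum(sq_dist(p, center) for p in members)
    # Each dispersion is divided by its degrees of freedom before taking the ratio.
    return (between / (k - 1)) / (within / (n - k))


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(calinski_harabasz(points, [0, 0, 1, 1]))  # 200.0
print(calinski_harabasz(points, [0, 1, 0, 1]))  # 0.02
```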
The Silhouette Coefficient adds a third dimension by evaluating how similar each point is to its own cluster compared to others. With values ranging from -1 to 1, a higher SC value indicates that the data points are strongly associated with their own cluster and poorly matched to others, suggesting that the clusters are both tight and well-separated.
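The per-point comparison behind the Silhouette Coefficient can be sketched as follows: `a` is a point's mean distance to the rest of its own cluster, `b` is its mean distance to the nearest other cluster, and the score is `(b - a) / max(a, b)`, averaged over all points. Euclidean distance and a score of 0 for singleton clusters are assumptions (the latter matches a common convention, e.g. in `sklearn.metrics.silhouette_score`). The sketch needs at least two clusters.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient over all points, in [-1, 1]."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is excluded by the divisor).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette(points, [0, 0, 1, 1]))  # about 0.90: tight, separated groups
print(silhouette(points, [0, 1, 0, 1]))  # negative: points sit in the wrong groups
```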
Rather than selecting a single metric or averaging their scores—which would be statistically unsound due to differing units and scales—the study introduces a composite performance indicator: an aggregate clustering quality index. This index combines the standardized scores of all three metrics, adjusting for their mean values and variability across all experiments.
The aggregate index is calculated in a way that rewards clustering configurations where both the Silhouette Coefficient and Calinski-Harabasz Index are high, and the Davies-Bouldin Index is low. Because each metric is first placed on a common scale and then entered with the appropriate sign, the result is a balanced evaluation in which better clustering results are consistently and fairly recognized across all algorithmic trials and parameter variations.
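The exact weighting of the paper's aggregate formula is not reproduced in this summary, so the sketch below rests on an explicit assumption: each metric is converted to z-scores across all experiments (population standard deviation), and the index is the equal-weighted sum `z(SC) + z(CH) - z(DBI)`. The configuration labels and metric values are invented for illustration, and `aggregate_index` is a hypothetical helper name.

```python
import statistics


def zscores(values):
    """Standardize a list of metric values across all experiments."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumes the values are not all identical
    return [(v - mean) / sd for v in values]


def aggregate_index(results):
    """Rank experiments given (label, sc, ch, dbi) tuples.

    Assumed form: z(SC) + z(CH) - z(DBI), equal weights.
    """
    sc_z = zscores([r[1] for r in results])
    ch_z = zscores([r[2] for r in results])
    dbi_z = zscores([r[3] for r in results])
    scored = [
        (r[0], s + c - d)  # DBI enters with a minus sign: lower is better
        for r, s, c, d in zip(results, sc_z, ch_z, dbi_z)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented metric values for three hypothetical configurations.
results = [
    ("kmeans_k8", 0.42, 310.0, 0.95),
    ("birch_k10", 0.35, 280.0, 1.10),
    ("ward_k12", 0.30, 250.0, 1.30),
]
ranking = aggregate_index(results)
print(ranking[0][0])  # kmeans_k8
```

Standardizing before summing is what makes the combination meaningful: a raw CH value in the hundreds would otherwise swamp an SC value below one.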
This methodology allows researchers and policymakers to move beyond fragmented, trial-and-error evaluations of clustering performance. It delivers a data-backed decision-making tool that reveals which algorithms and configurations yield the most meaningful economic groupings—critical for deriving accurate insights from macroeconomic data and for informing development strategies at the country level.
By grounding clustering selection in clear mathematical logic, the study elevates it from subjective preference to strategic science—helping stakeholders choose methods not just because they are popular or familiar, but because they are demonstrably the best for the problem at hand.
At the heart of any clustering analysis lies the data itself. While algorithms and evaluation metrics determine how clusters form and are assessed, it is the selection of variables that shapes what the clusters actually mean. In the context of this study, a wide-ranging set of macroeconomic, demographic, structural, and fiscal indicators was used to capture the multifaceted nature of national economies.
The variable set was curated to ensure both breadth and relevance. Indicators such as average GDP, GDP per capita, and their growth rates offer a traditional yet essential lens on economic scale and momentum. These metrics form the foundation for assessing development level and economic trajectory.
To understand the structural makeup of economies, the study incorporated variables representing the sectoral distribution of GDP—with separate values for agriculture, industry, and services. This enables a more nuanced understanding of how countries differ in terms of industrialization, labor distribution, and production specialization.
Investment dynamics are represented through measures like capital intensity, capital capacity, and capital private rate. These capture how efficiently capital is deployed, how it contributes to output, and the balance between public and private capital in national economies.
Demographic characteristics are also central to this analysis. Variables such as population size, growth, and population asymmetry—which contrasts the share of youth versus the elderly—offer insights into the labor force structure, dependency ratios, and long-term demographic pressures.
Trade, consumption, and savings behavior are captured through indicators like trade rate, trade asymmetry, consumption rate, savings rate, and consumption asymmetry. These allow the clustering process to account for external engagement and internal demand patterns, revealing how countries differ in terms of openness and household economic behavior.
Fiscal and price stability indicators such as tax burden, inflation, expense level, and revenue level give the clustering model insight into a country’s policy environment and financial management. Finally, resource outturn, which tracks rents from natural resources, helps distinguish resource-rich economies from more diversified or industrialized peers.
This diverse yet coherent set of variables ensures that clustering outcomes are rooted in real economic structures and behaviors—not arbitrary or overly narrow representations. It reflects not only how economies perform, but also how they are composed, governed, and exposed to structural forces like demography, trade, and capital formation.
Ultimately, the thoughtful selection of these variables enables the clustering algorithms to capture meaningful differences and similarities between countries. It ensures that the resulting groupings are not only statistically valid, but economically interpretable—creating a powerful foundation for strategic decision-making in policy, development, and investment.
While many of the macroeconomic and demographic indicators selected for the clustering model are available from 1960 onward, real-world constraints limit the feasibility of using the entire time span. Inconsistencies in statistical infrastructure, political instability, varying levels of data privacy, and national differences in recordkeeping create significant gaps in data completeness across countries. To ensure the integrity and reliability of the analysis, the study narrowed its focus to the period from 1991 to 2020.
The year 1991 was selected as the lower bound due to its geopolitical significance: the collapse of the Soviet Union fundamentally reshaped economic governance and data availability across dozens of countries. At the other end, 2020 marks the most recent year for which data coverage remains robust across most indicators and countries. In contrast, 2021 data, while partially available, shows a marked decline in completeness, making it unsuitable for inclusion.
This revised interval, spanning three decades of post-Cold War economic evolution, offers a balance between historical depth and data quality. During this period, the vast majority of countries consistently reported the core indicators necessary for advanced modelling and cluster analysis. The decision is supported by the comparative completeness chart below, which highlights the stark difference in data coverage between the full 1960–2021 period and the final 1991–2020 dataset.
To maintain the integrity of the study while ensuring broad international coverage, a filtering threshold was applied to the dataset. Countries with more than 15% missing data across selected indicators were excluded from the analysis. This refinement left a core group of 150 countries out of a possible 217, representing a diverse and globally relevant sample. As illustrated in the figure below, the excluded entities largely consisted of small island nations, dependent territories, and countries with significant data reporting limitations.
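The 15% exclusion rule can be expressed as a simple filter. This is a sketch under an assumed data layout: each country maps to a flat list of observations across all selected indicators and years, with `None` marking a gap. The country names and the helper `keep_countries` are hypothetical.

```python
def keep_countries(data, max_missing_share=0.15):
    """Keep countries whose share of missing observations is at most the threshold.

    data: {country: [value or None, ...]} (hypothetical layout)
    """
    kept = {}
    for country, values in data.items():
        missing_share = sum(v is None for v in values) / len(values)
        if missing_share <= max_missing_share:  # more than 15% missing -> excluded
            kept[country] = values
    return kept


data = {
    "Country A": [1.0] * 9 + [None],        # 10% missing: kept
    "Country B": [1.0] * 8 + [None, None],  # 20% missing: excluded
}
print(sorted(keep_countries(data)))  # ['Country A']
```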
Given the structure of clustering and dimensionality reduction algorithms, which require complete datasets, any remaining gaps needed to be addressed systematically. However, further reducing the country count in pursuit of full completeness would have substantially weakened the analytical base. Therefore, a strategic imputation approach was implemented to fill in the remaining missing values without distorting the integrity of the data.
The method used relies on temporal and peer-group approximation. When a value was missing for a particular period $i$, it was replaced by scaling the nearest available value $X_j$ from a neighboring year $j$, adjusted proportionally to reflect the macro-level trend within the same economic peer group (e.g., "Low-Income Asian Economies"). This ensures that the imputed value not only preserves the trajectory of the specific country but also remains consistent with broader regional and developmental dynamics.
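One plausible reading of this rule, stated here as an assumption rather than the paper's exact formula, is $X_i = X_j \cdot G_i / G_j$, where $G$ is the peer group's aggregate value of the same indicator. The sketch below applies that reading to a single country-indicator series; the helper name, the series layout, and the numbers are all hypothetical. Ties between equally near years go to the earlier year, and the peer-group values are assumed to be nonzero.

```python
def impute_series(series, group_series):
    """Fill None gaps using the nearest observed year scaled by the peer-group trend.

    series:       one country's values by year, None where missing
    group_series: peer-group aggregate for the same indicator and years
    Assumed rule: X_i = X_j * G_i / G_j, with j the nearest observed year.
    """
    known = [t for t, v in enumerate(series) if v is not None]
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        j = min(known, key=lambda t: abs(t - i))  # nearest observed year
        filled[i] = series[j] * group_series[i] / group_series[j]
    return filled


country = [100.0, None, 120.0, None]  # invented values
peer_group = [50.0, 55.0, 60.0, 66.0]  # invented peer-group aggregate
print(impute_series(country, peer_group))  # [100.0, 110.0, 120.0, 132.0]
```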
Once the variable set was finalized and data completeness restored, the next critical step involved data preparation—a prerequisite for high-quality clustering. First, standardization was performed to normalize variables measured in different units (e.g., GDP in dollars vs. consumption as a share of GDP), preventing scale discrepancies from skewing the analysis.
Next, dimensionality reduction was applied to address multicollinearity. For example, indicators like GDP per capita and the share of services in GDP are often positively correlated, both signaling a country’s level of development. Without reduction, these relationships could overweight certain characteristics and bias the clustering outcome. The goal was to condense the dataset into a more compact form while retaining the majority of its informational value, enabling faster and more effective processing in subsequent analytical stages.
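A quick way to see the kind of redundancy described here is a Pearson correlation between two indicators. The figures below are invented for illustration (GDP per capita in dollars, services as a percentage of GDP for five hypothetical countries) and do not come from the study's dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented values for five hypothetical countries.
gdp_per_capita = [1200, 3500, 8000, 15000, 42000]
services_share = [38, 47, 55, 62, 74]
r = pearson(gdp_per_capita, services_share)
print(round(r, 2))  # strongly positive, so the two indicators partly duplicate each other
```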
Through this methodical approach, the study preserves data integrity, ensures global representation, and lays the technical foundation for extracting valid and insightful economic clusters.
With a clean and complete dataset in place, the next phase involved preparing the data for clustering through dimensionality reduction. Given the high number of indicators and their potential interdependencies, reducing dimensionality is essential to mitigate multicollinearity, accelerate computations, and preserve the structural integrity of the data.
To begin, the dataset was standardized using Z-transformation, a common and reliable technique that centers the mean at zero and normalizes the standard deviation to one. This ensures that all indicators—regardless of their original scale—contribute equally to the analysis.
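The Z-transformation subtracts each indicator's mean and divides by its standard deviation. The sketch below shows that two indicators in very different units end up on the same scale. Whether the study used the population or the sample standard deviation is not stated; the population version is assumed here, and the input values are invented.

```python
import statistics


def z_standardize(column):
    """Z-transform one indicator: mean 0, (population) standard deviation 1."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(x - mean) / sd for x in column]


gdp_per_capita = z_standardize([1000.0, 5000.0, 9000.0])  # dollars (invented)
savings_rate = z_standardize([10.0, 20.0, 30.0])          # percent (invented)
print(gdp_per_capita)  # about [-1.22, 0.0, 1.22]
print(savings_rate)    # the same profile, despite the very different raw units
```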
For dimensionality reduction, the research team rigorously evaluated 11 core algorithms along with 5 customized kernel variants, covering a wide spectrum of linear and nonlinear techniques. These included Principal Component Analysis (PCA), Incremental PCA, Sparse PCA, Kernel PCA (with multiple kernels), Independent Component Analysis (ICA), Multidimensional Scaling (MDS), ISOMAP, t-SNE, and others.
Each method was assessed by its cumulative explained variance—the percentage of original data variance retained after reducing dimensions—which was calculated for every tested number of components, as illustrated below.
Among the tested algorithms, Kernel PCA with a sigmoid kernel emerged as the most effective method. It retained 95.76% of the original information using only 15 principal components, striking an optimal balance between dimensionality reduction and information preservation. While some information loss is inevitable in any reduction process, a commonly used rule of thumb is to accept at most about 5% loss (that is, to retain at least 95% of the variance), a criterion this method satisfies.
With the reduced dataset now primed for clustering, the next step was to refine the choice of clustering algorithms. The goal was twofold: to optimize resource usage and to eliminate methods that are less suited to the structure of the data. Algorithms that rely on automatic estimation of the number of clusters, such as Affinity Propagation and Mean Shift, were excluded from further testing. In this context, the automatic assignment of cluster count led to unreliable outputs due to the scattered distribution and subtle boundaries between economic profiles. Their inability to consistently define meaningful clusters made them unfit for the high-stakes modeling involved in this research.
By narrowing the algorithmic scope and applying a high-fidelity dimensionality reduction method, the study lays a robust technical foundation for the clustering phase—ensuring that the final groupings will be both interpretable and analytically sound.
In an era of growing global complexity, the search for effective methods of comparing national economies has taken on renewed significance. Traditional classifications—such as those based solely on income levels—often fall short in capturing the multifaceted realities of modern development. To move beyond these limitations, this study leverages machine learning, particularly clustering algorithms, to group countries by underlying economic similarities, thereby uncovering latent structures in global economic data.
Cluster analysis proves particularly valuable in this context. By using a broad spectrum of macroeconomic indicators—ranging from production structure and capital intensity to demographic profiles and consumption patterns—it becomes possible to segment countries into meaningful, homogeneous groups. These clusters serve as a foundation for cross-national comparisons, policy benchmarking, and more nuanced development analysis.
To build the dataset, we initially examined 217 countries, narrowing the sample to 150 based on a threshold of data completeness: only countries with at least 85% complete records were included. The selected period, 1991 to 2020, reflects both a high level of statistical coverage and a meaningful historical frame—1991 being a turning point in global politics with the dissolution of the Soviet Union and a shift in many countries’ economic strategies. Since clustering algorithms require complete data, gaps were filled using proportional imputation that leverages temporal and regional economic patterns, following World Bank classification groups.
Before clustering could begin, the data underwent a two-stage preprocessing: standardization and dimensionality reduction. Standardization—performed via Z-score normalization—ensured equal weighting for indicators measured in different units or magnitudes. Dimensionality reduction was then used to address multicollinearity and enhance computational efficiency. This step is critical, as correlated features—such as GDP per capita and services share in GDP—can disproportionately affect cluster structure if left uncorrected.
A total of 16 dimensionality reduction techniques were tested, including 11 base algorithms and 5 kernel modifications. Their performance was evaluated by calculating the cumulative explained variance for each method across an increasing number of components. As shown in Fig. 3, Kernel PCA with a sigmoid kernel emerged as the top performer, preserving 95.76% of the original information in just 15 components. This reduction formed the final input space for clustering.
At the next stage, 14 clustering algorithms were considered, but not all were deemed suitable for the objectives of the study. Since the methodology requires all countries—including Ukraine—to be assigned to a cluster, algorithms that allow for unclustered data points (like DBSCAN, HDBSCAN, and OPTICS) were excluded. Likewise, methods with automatic cluster selection, such as Affinity Propagation and Mean Shift, were omitted due to poor handling of dispersed data distributions and instability in cluster count.
To determine clustering effectiveness, three metrics were employed: the Davies-Bouldin Index (DBI), which penalizes loosely formed or overlapping clusters; the Calinski-Harabasz Index (CH), which favors well-separated and compact clusters; and the Silhouette Coefficient (SC), which evaluates intra-cluster cohesion versus inter-cluster separation. These indicators capture different facets of clustering quality and are not directly additive due to differing scales and directions. Therefore, a standardized composite index was introduced to aggregate the scores into a single performance measure. This index, calculated according to formula (9), rewards configurations with high CH and SC values and low DBI values.
To ensure meaningful segmentation, the number of clusters was restricted to the range of 7 to 20. Fewer clusters would blur distinctions between vastly different economies, while too many would compromise interpretability and produce excessively fragmented results. Clustering was applied to the 150-country dataset using the 15 principal components derived from dimensionality reduction.
The outcome of this multi-step process is a ranked list of the most effective algorithm and cluster number combinations, presented in Table 4. Notably, the K-means algorithm delivered the best results and was the most frequently represented method in the top configurations. It appears six times among the 20 best outcomes, indicating both strong average performance and adaptability across cluster counts. BIRCH, Agglomerative Clustering, and Ward’s Method also performed consistently well, each appearing four times.
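Since K-means came out on top, the sketch below shows the textbook Lloyd iteration it is built on: assign each point to its nearest center, then move each center to the mean of its members, and repeat until assignments stop changing. It is not the study's configuration. For simplicity it starts from the first `k` points; library implementations such as scikit-learn's `KMeans` use k-means++ seeding and several restarts (`n_init`) to avoid poor local optima. The toy points are invented.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm with a deterministic start (the first k points)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centers


points = [(0, 0), (10, 0), (0, 1), (10, 1), (1, 0), (11, 1)]
labels, centers = kmeans(points, 2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

In the study's setting, a loop of this kind would run for every cluster count from 7 to 20, with each result scored by the aggregate index.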
The reliability of these methods is further supported by simulation results from a synthetic test dataset that closely mirrors the real-world conditions of this study. These results are visualized in Fig. 4, where the clustering behavior of each algorithm is displayed across various sample types. The simulations confirm the robustness of K-means and its peers, while highlighting the weaknesses of algorithms prone to leaving data unclustered.
The combination of rigorous preprocessing, broad algorithm testing, and a composite evaluation metric has yielded a robust and reproducible method for classifying countries by economic similarity. More than just a technical exercise, this methodology provides a valuable lens for policymakers, economists, and international organizations. It enables smarter grouping, more accurate benchmarking, and clearer insights into economic convergence and divergence—laying the groundwork for data-driven development strategy rather than assumption-driven classification.
Clustering macroeconomic indicators over a 30-year horizon (1991–2020) reveals a major pitfall: long-term averages obscure critical transitions and group countries with little real economic similarity. In the aggregated clustering model, Ukraine appears alongside countries such as Belize, Venezuela, and South Africa—nations with markedly different developmental paths and structural contexts.
This misalignment occurs because extended averaging smooths over key fluctuations—such as crises, reforms, and regime changes—making the results less meaningful for policymaking or strategy development. To overcome this limitation, the researchers divided the overall period into six five-year intervals, applying the same K-means clustering approach to each segment. This allowed for a much more detailed view of economic trajectories and systemic change.
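Splitting 1991–2020 into six five-year intervals amounts to averaging each indicator within consecutive windows and then clustering each window separately. The sketch below assumes a simple arithmetic mean within each window (the summary does not specify the aggregation) and uses an invented series; `window_means` is a hypothetical helper.

```python
def window_means(series_by_year, start=1991, end=2020, width=5):
    """Average one indicator over consecutive fixed-width periods."""
    windows = {}
    for lo in range(start, end + 1, width):
        hi = min(lo + width - 1, end)
        values = [series_by_year[y] for y in range(lo, hi + 1)]
        windows[(lo, hi)] = sum(values) / len(values)
    return windows


# Invented series: the value 1.0 in 1991 rising by 1.0 each year.
series = {year: float(year - 1990) for year in range(1991, 2021)}
for period, mean in window_means(series).items():
    print(period, mean)  # (1991, 1995) 3.0 ... (2016, 2020) 28.0
```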
In the first period (1991–1995), Ukraine clustered with other post-Soviet economies experiencing transitional turmoil. This group is defined by high inflation, shrinking GDP, and declining household savings—typical signs of systemic restructuring from a planned to a market economy.
By the next interval (1996–2000), Ukraine shifted to a new cluster shared with countries from Latin America, parts of South-Eastern Europe, and select developing African states. These countries were characterized by fast economic growth driven by low-base effects and the initial gains from liberal market reforms.
This dynamic movement between clusters illustrates a key insight: economic identity is not static, and averaging over long periods risks concealing the very transformations that matter most. For decision-makers and investors, embracing time-sensitive clustering offers a clearer picture of developmental stages, vulnerabilities, and growth opportunities—especially in economies undergoing systemic change.
Continuing the time-segmented clustering analysis, the period from 2001 to 2005 reveals a new phase in Ukraine’s economic evolution. During this interval, Ukraine was grouped with Balkan, Baltic, Caucasian countries, and Kazakhstan—nations undergoing structural reforms and liberalization while increasingly engaging in global markets. This cluster reflects shared priorities such as regulatory adjustment, openness to international trade, and efforts to stabilize economic institutions.
Between 2006 and 2010, Ukraine transitions again—this time aligning with Eastern European countries pursuing European integration, while simultaneously responding to the economic shock of the 2008 global financial crisis. Common cluster characteristics during this period include intensified reforms, coordinated policy responses, and growing alignment with EU economic norms.
In the subsequent period, 2011 to 2015, Ukraine remained in a familiar grouping, this time predominantly alongside Balkan and Baltic states. These countries faced growing internal political pressures, and in Ukraine's case military conflict from 2014. Despite this, they demonstrated resilience through continued structural reforms and fiscal adjustments. Their shared experience of adapting to global economic disruptions and regional geopolitical tensions shaped this period's clustering outcome.
Across these three intervals, Ukraine’s shifting cluster membership illustrates not only its transitional status but also the responsiveness of clustering models when applied to shorter time spans. These snapshots capture how Ukraine’s economic behavior echoed broader regional movements—first liberalization, then integration, and finally stabilization amidst turbulence.
In the most recent interval, 2016 to 2020, Ukraine was part of a cluster dominated by Eastern European countries undergoing a new phase of economic acceleration. This group shared key structural characteristics: moderate inflation, rapid capital accumulation, high human capital, and a dominant service sector. These nations also experienced relatively fast GDP growth compared to developed economies, while coping with demographic challenges such as low or negative population growth.
For Ukraine, this period marked a critical rebound—emerging from the economic and political disruptions caused by the 2014 Russian invasion and reasserting itself through structural reforms. Improved governance, economic liberalization, and strategic investments helped restore macroeconomic stability and align Ukraine more closely with its Eastern European peers.
To derive an overarching perspective, researchers merged data across all six five-year intervals and applied dimensionality reduction followed by re-clustering. This composite view confirmed a strong and consistent alignment between Ukraine and other Eastern European countries across the entire 30-year period.
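The text does not spell out how the six intervals were merged. The following minimal sketch assumes the merge concatenates each country's six interval vectors into a single row. It then z-scores every resulting column before dimensionality reduction. Both choices, and the function names, are hypothetical.

```python
from statistics import mean, pstdev


def composite_features(interval_vectors):
    """Concatenate one country's interval vectors (oldest first) into one row.

    With k indicators per interval the row has 6 * k features, so each period
    keeps its own columns instead of being averaged away.
    """
    lengths = {len(v) for v in interval_vectors}
    if len(lengths) != 1:
        raise ValueError("every interval needs the same number of indicators")
    return [value for vector in interval_vectors for value in vector]


def standardize(rows):
    """Z-score every column so indicators on different scales weigh equally.

    Constant columns carry no information and are set to 0.0.
    """
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(x - m) / s if s else 0.0 for x, (m, s) in zip(row, stats)] for row in rows]


def composite_matrix(per_country):
    """Build the standardized country-by-feature matrix for reduction and re-clustering.

    `per_country` maps country -> list of interval feature vectors.
    """
    countries = sorted(per_country)
    rows = [composite_features(per_country[c]) for c in countries]
    return countries, standardize(rows)
```

Keeping each interval in its own columns means two countries end up close only if they resembled each other in most periods, which matches the notion of alignment "across the entire 30-year period".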
This cluster was shaped by key regional dynamics: transition from socialism to a market economy, privatization of state-owned enterprises, and an overarching trend of European integration. The EU-driven economic convergence process played a significant role, with countries adapting to common standards, fostering trade, and pursuing political harmonization. For Ukraine, this translated into both tangible reforms and deeper geopolitical positioning.
The final map (Fig. 13) shows that many clusters naturally group neighboring countries, underscoring the influence of geography, historical legacy, and shared economic ecosystems. Countries rich in natural resources often form specialized clusters—such as energy exporters or commodity-driven economies—while regional integration and policy convergence contribute to more uniform economic structures across contiguous states.
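A simple statistic can check whether clusters really follow geography. The sketch below is an illustrative assumption, not part of the study. It measures the share of bordering country pairs that fall into the same cluster. It then compares this with the probability that two randomly chosen countries share a cluster. The analyst would have to supply the border list and labels; the names here are hypothetical.

```python
from collections import Counter


def neighbour_agreement(labels, borders):
    """Share of border pairs whose two countries share a cluster.

    `labels` maps country -> cluster label; `borders` is an iterable of
    (country, country) pairs. Pairs with an unlabelled country are skipped.
    """
    pairs = [(a, b) for a, b in borders if a in labels and b in labels]
    if not pairs:
        return 0.0
    same = sum(labels[a] == labels[b] for a, b in pairs)
    return same / len(pairs)


def chance_agreement(labels):
    """Probability that two distinct, randomly chosen countries share a cluster."""
    n = len(labels)
    sizes = Counter(labels.values()).values()
    return sum(s * (s - 1) for s in sizes) / (n * (n - 1))
```

If neighbour agreement clearly exceeds the chance baseline, the map's visual impression of regional clustering is supported by the data rather than by how the eye groups colours.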
The clustering model built on macroeconomic indicators across six five-year periods from 1991 to 2020 reveals deep structural patterns in global development. To improve interpretability, the original 17 data clusters were grouped into broader types based on economic similarity, geopolitical context, and shared development trajectories.
One prominent group comprises countries in Eastern Europe and the post-Soviet space, including Ukraine, Poland, the Baltic states, and the Balkans. These nations—most of which are now EU members or candidates—have undergone a similar transformation: shifting from centrally planned to market-based economies, pursuing European integration, liberalizing trade, and modernizing infrastructure. Ukraine's consistent placement in this group over time underscores its alignment with regional peers striving to adapt to modern global economic systems.
Another major cluster includes advanced economies such as Germany, France, the UK, Japan, and the Nordics. These countries are defined by high living standards, strong innovation ecosystems, diversified industrial bases, and active participation in global financial and trade networks. They represent the global economic core in terms of influence, competitiveness, and resilience.
Resource-driven economies form another identifiable group. Countries like Saudi Arabia, Russia, Qatar, and Canada are clustered together based on their dependence on energy exports—primarily oil and gas—and their significant impact on global commodity markets. While their levels of economic diversification vary, their shared reliance on natural resource revenues shapes both domestic policy and international strategy.
In contrast, a wide swath of lower-income and developing countries from Sub-Saharan Africa, South Asia, and parts of Latin America appears in clusters defined by limited industrial capacity, high agricultural dependency, and persistent poverty and inequality. These countries face structural challenges, but many are pursuing reforms aimed at boosting productivity, enhancing education, and expanding access to global markets.
A small number of high-performing, innovation-driven Asian economies—such as Singapore, South Korea, and Taiwan—form a distinct cluster characterized by advanced infrastructure, strong export orientation, and a central role in technology supply chains. China and India also occupy a unique shared position: as the world’s most populous nations with large domestic markets, they combine rapid growth with rising global influence, driven by investments in infrastructure and digital technology.
Finally, the United States stands alone in its own cluster. As the world’s largest and most diversified economy, it is a global outlier in both scale and technological leadership. Its influence spans every sector from finance to defense to digital innovation, and its economic trajectory shapes global trends more than any other single nation.
This global clustering shows that countries do not evolve in isolation. Trade, geopolitical alignment, economic strategy, and regional integration create networks of similarity. Proximity still matters—but often because of shared policy choices, supply chains, or institutional convergence rather than mere geography. Exceptions exist, particularly where natural resources, political regimes, or demographic scale create unique economic identities.
Ukraine’s place in this landscape reflects a nation still in transition but firmly oriented toward convergence with its Eastern European neighbors. Through consistent reforms and strategic partnerships, it has moved alongside others on a path toward resilience, modernization, and integration with broader European and global systems.
Understanding the dynamics of economic growth in a globalized world requires more than isolated macroeconomic snapshots. Today’s challenges—ranging from structural inequality to uneven regional development—demand a multidimensional lens. Cluster analysis offers just that: a powerful and systematic approach to group countries or regions based on shared economic behaviors, enabling better insights into diversification patterns, regional disparities, policy effectiveness, and cooperation opportunities.
This article applies cluster analysis and dimensionality reduction techniques to a wide set of macroeconomic indicators across 150 countries over a thirty-year period (1991–2020), with a particular emphasis on Ukraine’s evolving economic identity. The goal was to uncover development patterns, identify peer groups, and ultimately deliver actionable insights for strategic economic positioning.
To achieve this, a tailored methodology was developed. The process began with a comprehensive literature review to define the problem space and identify best practices in clustering and dimensionality reduction. A comparative evaluation of clustering methods—including K-means, Agglomerative Clustering, BIRCH, and Ward’s method—was performed, along with various dimensionality reduction techniques. Each approach was assessed not only on algorithmic performance but also on interpretability and economic relevance.
What sets this research apart is its robust evaluation framework. Clustering quality was measured with an aggregate index built from three normalized performance metrics: the Davies-Bouldin Index, the Calinski-Harabasz Index, and the Silhouette Coefficient. Lower Davies-Bouldin values indicate better-separated clusters, while higher values are better for the other two. The metrics therefore have to be oriented in the same direction before they can be combined. Dimensionality reduction techniques were judged by cumulative explained variance. This allowed the authors to balance algorithmic precision with practical usability in real-world economic analysis.
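The exact weighting of the aggregate index is not given here. The following sketch assumes equal weights and min-max normalization across the candidate models. It inverts Davies-Bouldin so that, like the other two metrics, higher means better. The function names and metric values are hypothetical.

```python
def min_max(values):
    """Rescale values to [0, 1] across candidates.

    If all candidates score the same, the metric cannot discriminate between
    them, so a constant 0.0 is returned.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def aggregate_index(scores):
    """Rank candidate clusterings by an equal-weight composite score.

    `scores` maps model name -> (davies_bouldin, calinski_harabasz, silhouette).
    Returns (name, composite) pairs, best first.
    """
    names = list(scores)
    db, ch, sil = zip(*(scores[n] for n in names))
    db_n = [1.0 - v for v in min_max(db)]  # invert: lower Davies-Bouldin is better
    ch_n = min_max(ch)
    sil_n = min_max(sil)
    composite = {n: (d + c + s) / 3 for n, d, c, s in zip(names, db_n, ch_n, sil_n)}
    return sorted(composite.items(), key=lambda item: item[1], reverse=True)


# Hypothetical metric values for three candidate models.
candidates = {
    "kmeans": (0.8, 300.0, 0.45),
    "birch": (1.2, 250.0, 0.30),
    "agglo": (1.0, 280.0, 0.40),
}
ranking = aggregate_index(candidates)
```

With these made-up inputs the ranking comes out as `kmeans`, `agglo`, `birch`. Because min-max scaling is relative to the candidate set, adding or removing a model can shift the other models' composite scores.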
Kernel Principal Component Analysis (KPCA) with a sigmoid kernel emerged as the most effective dimensionality reduction technique. It preserved 95.76% of the information while reducing the dimensionality of the data set by 40%. For clustering, the K-means algorithm with 17 clusters delivered the most coherent and meaningful segmentation of countries. This framework was applied to the full 30-year data set and also to six discrete 5-year intervals, enabling dynamic tracking of how economies shifted over time.
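A pure-Python sketch can illustrate the kernel step and the variance criterion. It builds a sigmoid-kernel Gram matrix with entries tanh(gamma * <x, y> + coef0) and double-centres it, as kernel PCA requires. It then extracts eigenvalues with a small Jacobi routine and counts how many leading components reach a target share of cumulative explained variance. The parameter values, the Jacobi solver, and the decision to ignore non-positive eigenvalues are assumptions for illustration. A production pipeline would typically use a library eigensolver such as `numpy.linalg.eigvalsh`.

```python
import math


def sigmoid_kernel_matrix(rows, gamma, coef0):
    """Gram matrix with entries tanh(gamma * <x_i, x_j> + coef0)."""
    return [
        [math.tanh(gamma * sum(a * b for a, b in zip(x, y)) + coef0) for y in rows]
        for x in rows
    ]


def center_kernel(k):
    """Double-centre the Gram matrix, i.e. centre the data in feature space."""
    n = len(k)
    row_means = [sum(row) / n for row in k]
    col_means = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [k[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def jacobi_eigenvalues(a, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):  # rotate columns p and q
                    arp, arq = a[r][p], a[r][q]
                    a[r][p] = c * arp - s * arq
                    a[r][q] = s * arp + c * arq
                for r in range(n):  # rotate rows p and q
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r] = c * apr - s * aqr
                    a[q][r] = s * apr + c * aqr
    return sorted((a[i][i] for i in range(n)), reverse=True)


def components_for_variance(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative share reaches `threshold`.

    Only positive eigenvalues are used: the sigmoid kernel is not guaranteed to
    be positive semi-definite, so negative eigenvalues can appear.
    """
    positive = sorted((v for v in eigenvalues if v > 0), reverse=True)
    total = sum(positive)
    cumulative = 0.0
    for m, value in enumerate(positive, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return m, cumulative / total
    return len(positive), 1.0
```

The returned pair gives both the number of retained components and the share of variance they explain, the two quantities the comparison of reduction techniques rests on.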
In Ukraine's case, the transformation is especially notable. The country began the post-Soviet era in the early 1990s within a cluster defined by economic instability, high inflation, and systemic transition. Over three decades, despite recurring political and economic crises, Ukraine gradually aligned with Eastern European countries through structural reforms, trade liberalization, and EU-oriented integration. In the final interval (2016–2020), Ukraine appeared in a cluster of rapidly reforming, service-driven economies in Central and Eastern Europe.
More broadly, the results highlight how clusters tend to unite geographically proximate countries—often driven by common historical legacies, trade flows, labor market dynamics, and shared resource endowments. Large countries with abundant natural resources, for instance, often gravitate toward specialized clusters centered on energy or commodity exports. In contrast, small but innovation-driven nations may cluster together due to strong digital infrastructure or policy convergence.
Looking ahead, the research team plans to expand the scope of the methodology by introducing additional clustering algorithms, experimenting with alternative normalization techniques, and increasing the number of indicators. There is also a roadmap to use these clustering outcomes as the basis for predictive models—providing data-driven forecasts of economic growth trajectories within each cluster. The ultimate objective is to turn macroeconomic analysis into a strategic tool for governments, investors, and international institutions seeking to better understand and shape the future of economic development.
References