The contemporary information landscape is characterised by a vast volume of data that can be analysed with a variety of research tools and methods. Because individual models and methods have limitations, it is worth combining functional and logical autoregression methods to analyse trends and topics in the information space more accurately. This work therefore aims to develop an algorithm that uses autoregression methods to identify and analyse topics likely to be relevant in the future. The process begins with the quantification and normalisation of data, which significantly affect the quality of the analysis. The main focus of the study is to apply the autoregression method to analyse long-term trends and predict future developments in the selected data. The proposed algorithm evaluates the forecast of these developments and analyses graphical trends, allowing a more detailed study and modelling of future data dynamics. The regression coefficient is used as a quality criterion. The algorithm concludes with a polynomial function that helps identify topics that will be relevant in the future. Overall, the proposed algorithm can be considered an effective tool for analysing and predicting future trends from historical data, thus contributing to the identification of prospects for technological development.
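The steps named in the abstract (normalisation, autoregressive forecasting, a quality check, and a closing polynomial trend) can be illustrated with a minimal sketch. It is not the authors' implementation: the topic counts are invented, the AR order of 2 and the quadratic degree are arbitrary choices, the quality criterion is assumed here to be the coefficient of determination R², and the "rising topic" rule is a hypothetical reading of how the polynomial might be used. All function names are illustrative.

```python
from statistics import mean


def min_max_normalise(values):
    """Scale values linearly into [0, 1] (constant input maps to zeros)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(rows, targets):
    """Ordinary least squares through the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def ar_design(series, order):
    """Build rows [1, y_{t-1}, ..., y_{t-p}] paired with targets y_t."""
    rows = [[1.0] + [series[t - k] for k in range(1, order + 1)]
            for t in range(order, len(series))]
    return rows, series[order:]


def predict(rows, coeffs):
    """Evaluate the fitted linear model on each design row."""
    return [sum(c * x for c, x in zip(coeffs, row)) for row in rows]


def forecast_ar(series, coeffs, steps):
    """Iterate the fitted AR recursion `steps` periods past the end of the series."""
    order = len(coeffs) - 1
    history = list(series)
    for _ in range(steps):
        lags = [1.0] + [history[-k] for k in range(1, order + 1)]
        history.append(sum(c * x for c, x in zip(coeffs, lags)))
    return history[len(series):]


def r_squared(actual, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical counts of one topic's mentions over ten periods (illustrative only).
counts = [4, 6, 7, 11, 13, 18, 22, 29, 33, 41]
series = min_max_normalise(counts)

# AR(2) fit, its in-sample quality, and a three-step forecast.
ar_rows, ar_targets = ar_design(series, order=2)
ar_coeffs = least_squares(ar_rows, ar_targets)
ar_r2 = r_squared(ar_targets, predict(ar_rows, ar_coeffs))
ar_forecast = forecast_ar(series, ar_coeffs, steps=3)

# Quadratic trend a + b*t + c*t^2 over the observed periods.
poly_rows = [[1.0, float(t), float(t * t)] for t in range(len(series))]
poly_coeffs = least_squares(poly_rows, series)
poly_r2 = r_squared(series, predict(poly_rows, poly_coeffs))

# Assumed decision rule: the topic counts as rising if the fitted trend is still
# growing at the last observed period (positive derivative b + 2*c*t).
last_t = len(series) - 1
is_rising = poly_coeffs[1] + 2 * poly_coeffs[2] * last_t > 0

print(f"AR(2) R^2 = {ar_r2:.3f}, forecast = {[round(v, 3) for v in ar_forecast]}")
print(f"quadratic R^2 = {poly_r2:.3f}, rising = {is_rising}")
```

The sketch uses only the standard library so that each step stays visible. In practice, a numerical library would replace the hand-written least-squares solver, and the model order and polynomial degree would be chosen from the data rather than fixed in advance.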
The rapid growth in the volume and diversity of data has increased the importance of data analysis and interpretation. In this context, forecasting future trends is gradually becoming a key task for industries such as marketing, finance and technology. This process involves analysing current data and applying various methods to predict future events.
Other articles in this issue
Innovation processes are strongly influenced by changes in economic, political, technological and other external factors. For instance, economic instability and political uncertainty can both stimulate and limit innovative activity in organisations. Transmodern innovation is a concept that involves scientific and technological advancements that may remain unutilised until favourable changes occur in technological or economic conditions. The purpose of this study is to develop a conceptual model for transmodern innovation that takes into account the dynamics of innovation, including the intensity, economic prerequisites, external changes and degree of innovation adaptation. This model will help organisations to better understand and respond to the complexities of the innovation process. The resulting model is a comprehensive tool for analysing changes in innovation activity and the external environment over different time phases, including the initial state (t0), the transition to new conditions (t1) and the final state (tx). In this model, the ‘Final stage of tx’ block represents the final stage, which allows us to draw conclusions about the success of adaptation and innovation development. This is the basis for formulating strategic conclusions and recommendations for future development.
This article analyses the sustainability of China’s economic growth in light of global challenges, focusing on macroeconomic changes in recent decades and their impact on the country’s economy. The study covers the period 1962-2022 and uses data from various sources, including the World Bank, International Monetary Fund, Organisation for Economic Cooperation and Development, and national statistical data from the People’s Republic of China. Correlation analysis methods are used to assess the impact of socio-economic indicators on economic growth, revealing significant correlations between gross domestic product and various indicators such as external debt, urbanisation, technological development, and the standard of living. The main conclusion of the analysis is that economic diversification and investment in high-tech industries are crucial for maintaining sustainable growth in China. The findings indicate the need for future research assessing the potential for reducing the environmental impact of industrialisation and improving social policies in a changing global economy.
This article explores the integration of digital solutions to enhance the sustainable development of agribusiness by promoting the adoption of intellectual capital. The analysis takes into account various factors affecting yields, such as soil type, fertilizer use, market prices, employee education level, product demand, and automation level. The level of automation, the use of geographic information systems, access to big data, and hours of employee training were chosen as factors of intellectualization. Random forest, ARIMA, SARIMA, and LSTM models were used to predict yields. The data were taken from the statistical portals of Armenia and Georgia (137 observations). The results show that the LSTM model achieved the best prediction accuracy, with a mean absolute error of 8.30 and a standard error of 102.47. The random forest model showed a mean absolute error of 24.87 and a standard error of 828.23, while the ARIMA and SARIMA models did not show significant results. The study revealed significant correlations between agricultural land productivity and the digital solutions that characterize the level of intellectual capital in agricultural enterprises, including the level of automation and access to big data. The impact of intellectual capital on the sustainability of agribusiness was also analysed, including the effect of employees' education level and training hours. It is concluded that the integration of innovative technologies, such as big data and automation, contributes to improving the efficiency of agricultural production.
This article analyses the sustainability of the agro-industrial complex (AIC) in the Eurasian Economic Union (EAEU) countries with an emphasis on food security. The study covers challenges and threats to food security in Russia, Belarus, Armenia, Kazakhstan, and Kyrgyzstan, given the difficult geopolitical situation. The article examines data from the national statistical services of the EAEU countries, as well as international sources such as the FAO and the World Bank. Correlation and cluster analysis approaches are applied to assess the impact of socioeconomic indicators on the sustainability of the AIC. Significant correlations between indicators of food security and such factors as the volume of agricultural production, investments in the agricultural sector, the level of technological development, and government support are revealed. On average, for the period from 2015 to 2022, the added value of agriculture amounted to 8.2% of GDP, and the food production index was 104.1. The results of the cluster analysis showed that the EAEU countries can be grouped by levels of agricultural development and food security. Thus, K-means and GMM identified three clusters in which Russia found itself both in a separate cluster and in combination with other countries. Agglomerative and spectral clustering also showed similar results, distinguishing three main groups of countries. The average silhouette coefficient for agglomerative and spectral clustering was 0.41, which indicates a better clustering quality compared to K-means and GMM (0.38). It is confirmed that integration and coordination of efforts within the EAEU, as well as diversification of agricultural production and increased investment in innovation, determine the state of sustainability of the agro-industrial complex.
Publisher
- Publisher: СПбПУ (SPbPU)
- Region: Russia, Saint Petersburg
- Postal address: 195251, Saint Petersburg, ul. Politekhnicheskaya, 29
- Legal address: 195251, Saint Petersburg, ul. Politekhnicheskaya, 29
- Contact person: Rudskoy Andrey Ivanovich (Rector)
- E-mail: office@spbstu.ru
- Contact phone: +7 (812) 2972077
- Website: https://spbstu.ru