AI To Find New Chemical Reactions In Data Archives
Moscow, 20 March 2025
In a joint project between the Zelinsky Institute of Organic Chemistry (Russian Academy of Sciences) and Skoltech, a research group led by RAS Academician Valentin Ananikov has developed a unique machine learning-based search engine for analyzing vast amounts of high-resolution mass spectrometry data. The engine makes it possible to explore terabytes of already accumulated data without running new experiments. This approach accelerates the search for new compounds, reduces costs, and makes research more environmentally friendly. The study was published in Nature Communications.
In a typical laboratory, terabytes of data accumulate over several years, for example from high-resolution mass spectrometry measurements. Because manual analysis is slow, scientists examine only a small part of this information: up to 95% of the accumulated data remains unexplored, which means potentially important discoveries are lost. Processing such a volume of information manually would take hundreds of years, whereas new AI-based algorithms can complete the analysis in just a few days.
“Our work is based on an innovative algorithm combining machine learning and analysis of signal distribution in mass spectra, which has significantly reduced false positives when identifying chemical compounds. The new search algorithm has successfully verified historical data on the Mizoroki-Heck reaction and revealed not only known transformations but also completely new ones, including a unique cross-coupling process that has not been previously documented in the scientific literature,” commented Valentin Ananikov, the scientific supervisor of the study.
During organic synthesis, chemists select specific experimental conditions to optimize the reaction and obtain the best possible outcome. After the reaction and sample preparation, the chemical composition of the sample is determined and characterized by an analytical system. High-resolution mass spectrometry is often used for this purpose because of its speed, sensitivity, and ease of data accumulation. The method is widely used in analytical chemistry, organic and inorganic chemistry, proteomics, metabolomics, materials science, and many other fields.
The solution opens up new possibilities in chemical research. The search engine can analyze data from different fields of chemistry, which may lead to the discovery of new reactions, catalysts, and mechanisms. Reusing existing data not only accelerates scientific progress but also reduces experimental costs, making science more environmentally friendly.
The study was carried out at the Zelinsky Institute of Organic Chemistry of the Russian Academy of Sciences and at the Skoltech Energy Center.
About Skoltech
Skoltech is a private international university in Russia, cultivating a new generation of leaders in technology, science, and business. As a factory of technologies, it conducts research in breakthrough fields and promotes technological innovation to solve critical problems that face Russia and the world. Skoltech focuses on six priority areas: life sciences, health, and agro; telecommunications, photonics, and quantum technologies; artificial intelligence; advanced materials and engineering; energy efficiency and the energy transition; and advanced studies. Established in 2011 in collaboration with the Massachusetts Institute of Technology (MIT), Skoltech was listed among the world’s top 100 young universities by the Nature Index in both of its editions (2019, 2021). On Research.com, the Institute ranks No. 2 among Russian universities overall and No. 1 for genetics and materials science. In the recent SCImago Institutions Rankings, Skoltech placed first nationwide for computer science. Website:
https://www.skoltech.ru/