A machine learning workflow for molecular analysis: application to melting points

Sivaraman, Ganesh and Jackson, Nicholas E and Sanchez-Lengeling, Benjamin and Vázquez-Mayagoitia, Álvaro and Aspuru-Guzik, Alán and Vishwanath, Venkatram and de Pablo, Juan J (2020) A machine learning workflow for molecular analysis: application to melting points. Machine Learning: Science and Technology, 1 (2). 025015. ISSN 2632-2153

Abstract

Computational tools encompassing integrated molecular prediction, analysis, and generation are key for molecular design in a variety of critical applications. In this work, we develop a workflow for molecular analysis (MOLAN) that integrates an ensemble of supervised and unsupervised machine learning techniques to analyze molecular data sets. The MOLAN workflow combines molecular featurization, clustering algorithms, uncertainty analysis, low-bias dataset construction, high-performance regression models, graph-based molecular embeddings and attribution, and a semi-supervised variational autoencoder based on the novel SELFIES representation to enable molecular design. We demonstrate the utility of the MOLAN workflow in the context of a challenging multi-molecule property prediction problem: the determination of melting points solely from single molecule structure. This application serves as a case study for how to employ the MOLAN workflow in the context of molecular property prediction.

Item Type: Article
Subjects: Science Global Plos > Multidisciplinary
Depositing User: Unnamed user with email support@science.globalplos.com
Date Deposited: 30 Jun 2023 04:47
Last Modified: 10 Nov 2023 05:32
URI: http://ebooks.manu2sent.com/id/eprint/1259
