Machine learning for molecular property prediction
Developing a drug from the original idea to commercialization is long and costly, typically taking 12 to 15 years and costing up to billions of dollars. This is due to the need to experimentally test a large number of compounds to identify those with suitable characteristics for the drug. Predictive modeling offers a possible solution: its goal is to reduce the number of compounds that must be synthesized during drug development. Because the field is relatively new, it still lacks good benchmarks and comparisons between techniques. This project attempts to reproduce and benchmark different methods across different datasets. Our work involves translating molecules into mathematical representations and applying different machine learning algorithms to recognize patterns between those representations and the molecules' properties. The results indicate that our models perform on par with other state-of-the-art machine learning approaches and lay the groundwork for future study. If the performance of these models continues to improve, it will become possible to profile molecules and efficiently estimate their properties without physically synthesizing them in a laboratory.
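As a rough illustration of what "translating molecules into mathematical representations" and then learning from them can look like, the sketch below maps SMILES strings to character-count vectors and predicts a property with a nearest-neighbor rule. It is not the representation or model used in this project: the vocabulary `VOCAB`, the helpers `featurize`, `heavy_atoms`, and `predict_knn`, and the stand-in target (a heavy-atom count computed from the string itself, not a measured property) are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter

# Hypothetical feature vocabulary: a few SMILES characters (an assumption for illustration).
VOCAB = ["C", "N", "O", "=", "(", "1"]


def featurize(smiles):
    """Map a SMILES string to a fixed-length vector of character counts."""
    counts = Counter(smiles)
    return [float(counts[ch]) for ch in VOCAB]


def heavy_atoms(smiles):
    """Stand-in target: count letters in the string.

    This equals the heavy-atom count only for simple SMILES with
    single-letter atoms; it is a placeholder, not experimental data.
    """
    return float(sum(ch.isalpha() for ch in smiles))


def predict_knn(train, query, k=1):
    """Predict a property as the mean label of the k nearest training vectors."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    ranked = sorted(train, key=lambda pair: squared_distance(pair[0], query))
    nearest = ranked[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Toy training set: ethanol, acetic acid, ethylamine.
train_smiles = ["CCO", "CC(=O)O", "CCN"]
train = [(featurize(s), heavy_atoms(s)) for s in train_smiles]

# Query molecule: 1-propanol.
prediction = predict_knn(train, featurize("CCCO"), k=1)
print(prediction)
```

In practice, the character counts would be replaced by a chemically meaningful representation (for example, fingerprints or molecular graphs) and the nearest-neighbor rule by the learning algorithms being benchmarked, but the overall shape of the pipeline (featurize, fit, predict) stays the same.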