Extended-Connectivity Fingerprints (ECFPs) are refined to predict polymer properties.
ECFPs are circular topological fingerprints originally designed for substructure and similarity searching, as well as for structure-activity modeling, of finite molecules [1]. They have been applied successfully in cheminformatics [2]. However, their application to polymer informatics remains limited, despite demand from the chemical industry.
In this study, we develop a new type of polymer descriptor based on ECFPs. The descriptor uses number densities, that is, the substructure counts divided by the number of atoms in the polymer repeat unit. We find that this approach predicts the properties of infinite linear polymers more accurately than the conventional approach, in which the raw substructure counts are used as descriptors. In addition, feature selection using Least Absolute Shrinkage and Selection Operator (LASSO) regression improves prediction accuracy by eliminating insignificant variables. As a result, the new ECFP-based descriptor, combined with machine learning, predicts the refractive index with an accuracy comparable to that of ab initio density functional theory calculations for infinite linear polymers [3]. Results for other properties, such as the glass transition temperature, are also discussed.
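The number-density descriptor described above can be illustrated with a minimal sketch. The function name `number_densities`, the substructure labels, and the counts are hypothetical and chosen only for illustration; in practice the counts would come from an ECFP implementation. The example assumes an idealized periodic chain in which doubling the repeat unit doubles every substructure count.

```python
def number_densities(substructure_counts, n_atoms):
    """Divide each substructure count by the atom count of the repeat unit."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return {key: count / n_atoms for key, count in substructure_counts.items()}


# Hypothetical counts for a repeat unit containing 6 atoms.
single_counts = {"sub_a": 2, "sub_b": 1}
single_atoms = 6

# The same polymer described with a doubled repeat unit (12 atoms).
# Assumption: every substructure count doubles as well.
doubled_counts = {key: 2 * count for key, count in single_counts.items()}
doubled_atoms = 12

single_density = number_densities(single_counts, single_atoms)
doubled_density = number_densities(doubled_counts, doubled_atoms)

# Raw counts depend on the choice of repeat unit; number densities do not.
print(single_counts == doubled_counts)
print(single_density == doubled_density)
```

Under this assumption, the raw counts differ between the two descriptions of the same polymer, while the number densities coincide.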
[1] Rogers, D., Hahn, M. J. Chem. Inf. Model. 2010, 50, 742-754.
[2] Duvenaud, D., et al. arXiv:1509.09292v2.
[3] Maekawa, S., Moorthi, K. J. Phys. Chem. B 2016, 120, 2507-2516.