Daniel Davies¹, Keith Butler¹, Olexandr Isayev², Aron Walsh³

¹ University of Bath, Bath, United Kingdom
² University of North Carolina, Chapel Hill, North Carolina, United States
³ Imperial College London, London, United Kingdom

The discovery of earth-abundant, functional materials is critical for sustainable technological advancement. There is a concerted global effort to reduce the time it takes to realize such materials via databases, high-throughput screening, informatics, and mapping out the “materials genome.” But what fraction of theoretical chemical space is represented by the known compounds that have been thoroughly characterized to date? There are in excess of 10¹² ways to form a four-component compound from the first 103 elements. Such a search space is intractable for high-throughput experiments or first-principles calculations.
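As a rough sense of scale, the element-selection part of this count is easy to compute. The sketch below counts only sets of distinct elements; the much larger figure quoted above also depends on how many stoichiometric ratios (and oxidation states) are allowed for each element set, an assumption this sketch does not try to reproduce.

```python
from math import comb

N_ELEMENTS = 103  # the first 103 elements, as in the text

# Ways to choose 4 distinct elements, ignoring stoichiometry and oxidation states
element_quadruples = comb(N_ELEMENTS, 4)

# Quaternary oxides: oxygen is fixed, so choose the other 3 from the remaining 102
oxide_element_sets = comb(N_ELEMENTS - 1, 3)

print(f"{element_quadruples:,}")  # 4,421,275
print(f"{oxide_element_sets:,}")  # 171,700
```

Even before any stoichiometry is considered there are millions of element sets, which is why cheap filters have to come before any expensive calculation.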

We present a hierarchical screening approach that is capable of dealing with such a search space, consisting of three key stages. First, we employ an arsenal of simple chemical rules, the product of centuries of research, to filter out chemically implausible element combinations. This is implemented using the open-source SMACT package [1]. Second, we use supervised machine learning and data mining to rapidly screen for target properties and to suggest likely structures for the leading candidates. Finally, we apply density functional theory (DFT) calculations to verify stability, structure and target properties. At each stage the search space shrinks drastically, so that the number of candidate materials remains manageable as the computational cost per candidate increases.
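To illustrate the first stage, the sketch below applies two of the simplest chemical rules of this kind: charge neutrality and electronegativity ordering (every cation less electronegative than every anion). It is a standalone illustration, not the SMACT API; the function name `plausible_assignments` is hypothetical, and its small oxidation-state and electronegativity tables are assumed illustrative values covering only a handful of elements.

```python
from itertools import product

# Illustrative tables only (assumed values): a few common oxidation states
# and Pauling electronegativities for a handful of elements.
OXIDATION_STATES = {
    "Li": [1], "Ba": [2], "Ti": [2, 3, 4], "Sn": [2, 4], "O": [-2], "F": [-1],
}
PAULING_EN = {"Li": 0.98, "Ba": 0.89, "Ti": 1.54, "Sn": 1.96, "O": 3.44, "F": 3.98}


def plausible_assignments(elements, stoichs):
    """Hypothetical helper: return the oxidation-state assignments for a
    composition that pass both the charge-neutrality and the
    electronegativity-ordering rule."""
    results = []
    for states in product(*(OXIDATION_STATES[el] for el in elements)):
        # Rule 1: total charge must be zero
        if sum(q * n for q, n in zip(states, stoichs)) != 0:
            continue
        cation_en = [PAULING_EN[el] for el, q in zip(elements, states) if q > 0]
        anion_en = [PAULING_EN[el] for el, q in zip(elements, states) if q < 0]
        # Rule 2: the most electronegative cation must still be less
        # electronegative than the least electronegative anion
        if cation_en and anion_en and max(cation_en) < min(anion_en):
            results.append(states)
    return results


print(plausible_assignments(("Ba", "Ti", "O"), (1, 1, 3)))  # [(2, 4, -2)]
print(plausible_assignments(("Ba", "Ti", "O"), (1, 1, 1)))  # []
```

A composition with no surviving assignment is discarded outright, so these checks cost almost nothing per candidate compared with the later stages.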

We demonstrate the power of this approach by discovering new quaternary oxide materials for solar energy applications. SMACT is used to reduce the search space from billions of compositions to ~1 million chemically sensible ones. A gradient boosting regression model is trained on a database [2] of high-quality bandgap calculations and then used to identify the ~20,000 oxide compositions most likely to have useful bandgaps. A recent statistics-based approach to structure prediction, the probabilistic model proposed by Hautier et al. [3], is employed to suggest ~500,000 likely crystal structures, which are then fed to the AFLOW-ML model [4] to predict thermodynamic stability via machine-learnt DFT total energies. All of these steps can be carried out within hours to days using minimal computing resources. The result is a series of potential new energy materials; our methodology can be applied to materials design in a range of contexts and is an important new tool in the quest for accelerated materials discovery.
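The structure-suggestion step reuses known crystal structures as templates for new compositions when the ionic substitutions involved are likely. The sketch below is a simplified stand-in for that idea, not the model of Hautier et al.: it treats each site independently and multiplies per-site substitution probabilities. The published model is instead fitted to a database of known structures and also accounts for correlations between substitutions. The probability values, the one-entry structure "database", and the names `substitution_score` and `suggest_structures` are all hypothetical.

```python
from itertools import permutations

# Hypothetical substitution probabilities p(new ion replaces old ion).
# Values are invented for illustration; a real model learns them from data.
SUB_PROB = {
    ("Sr2+", "Ba2+"): 0.6,
    ("Ca2+", "Ba2+"): 0.3,
    ("Zr4+", "Ti4+"): 0.5,
    ("Sn4+", "Ti4+"): 0.4,
}

# Hypothetical mini-database: known structure -> ions on its distinct sites.
KNOWN_STRUCTURES = {
    "BaTiO3 perovskite": ("Ba2+", "Ti4+", "O2-"),
}


def substitution_score(target, source):
    """Probability-like score for placing `target` ions on the sites of
    `source`, treating each site independently (a simplification)."""
    score = 1.0
    for new, old in zip(target, source):
        if new != old:  # an unchanged ion contributes a factor of 1
            score *= SUB_PROB.get((new, old), 0.0)
    return score


def suggest_structures(target_ions, threshold=0.1):
    """Rank known structures whose ions can plausibly be replaced by target_ions."""
    suggestions = []
    for label, source in KNOWN_STRUCTURES.items():
        if len(source) != len(target_ions):
            continue
        # Try every assignment of target ions to the template sites; keep the best.
        best = max(substitution_score(p, source) for p in permutations(target_ions))
        if best >= threshold:
            suggestions.append((label, best))
    return sorted(suggestions, key=lambda s: s[1], reverse=True)


print(suggest_structures(("Sr2+", "Sn4+", "O2-")))  # perovskite, score 0.6 * 0.4
print(suggest_structures(("Li+", "Ti4+", "O2-")))   # []
```

Each suggested structure would then still need a stability estimate (here, from AFLOW-ML) before any DFT verification.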

1. D. W. Davies, K. T. Butler, A. J. Jackson, A. Morris, J. M. Frost, J. M. Skelton, A. Walsh, Chem, 2016, 1, 617.
2. I. E. Castelli, F. Hüser, M. Pandey, H. Li, K. S. Thygesen, B. Seger, A. Jain, K. A. Persson, G. Ceder, K. W. Jacobsen, Adv. Energy Mater., 2015, 5, 1400915.
3. G. Hautier, C. Fischer, V. Ehrlacher, A. Jain, G. Ceder, Inorg. Chem., 2011, 50, 656.
4. O. Isayev, C. Oses, C. Toher, E. Gossett, S. Curtarolo, A. Tropsha, Nat. Commun., 2017, 8, 15679.