2, Argonne National Laboratory, Chicago, Illinois, United States
3, Rutherford Appleton Laboratory, Harwell, Oxfordshire, United Kingdom
Large-scale data-mining workflows are increasingly able to successfully predict new materials that possess a targeted functionality [1]. The success of such materials-discovery approaches nonetheless depends on having the right database to mine. This presentation shows how to auto-generate tailor-made databases that can be searched for functional materials meeting the needs of a given device application.
The talk presents ChemDataExtractor, a 'chemistry-aware', open-source text- and table-mining software tool that uses natural language processing, optical character recognition and machine learning to extract large volumes of material-property data from the scientific literature [2]. Machine learning is then used to fill in any missing experimental data.
The role of this tool in accelerating materials discovery is illustrated.
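As a rough illustration of what "extracting material-property data from text" means, the toy sketch below turns sentences into structured (compound, property, value, unit) records. It is a deliberately simplified, rule-based stand-in: ChemDataExtractor's own pipeline is considerably more sophisticated, and the regular expression, the `PropertyRecord` type and the `extract_records` function are hypothetical names invented for this example, not part of ChemDataExtractor's API.

```python
import re
from dataclasses import dataclass


@dataclass
class PropertyRecord:
    """Hypothetical structured record for one extracted property value."""
    compound: str
    prop: str
    value: float
    unit: str


# Hypothetical toy pattern: "<Compound> has/shows/exhibits a(n) <property> of <number> <unit>".
# A real chemistry-aware tool would recognise chemical names and units far more robustly.
PATTERN = re.compile(
    r"(?P<compound>[A-Z][A-Za-z0-9\-]*)\s+(?:has|shows|exhibits)\s+an?\s+"
    r"(?P<prop>melting point|band gap)\s+of\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>°C|K|eV)"
)


def extract_records(text: str) -> list[PropertyRecord]:
    """Return one PropertyRecord for every sentence fragment matching PATTERN."""
    records = []
    for m in PATTERN.finditer(text):
        records.append(
            PropertyRecord(
                compound=m.group("compound"),
                prop=m.group("prop"),
                value=float(m.group("value")),
                unit=m.group("unit"),
            )
        )
    return records


if __name__ == "__main__":
    sample = "TiO2 has a band gap of 3.2 eV. Anthracene has a melting point of 216 °C."
    for record in extract_records(sample):
        print(record)
```

Records of this form can be accumulated into a database table keyed by compound, which is the kind of auto-generated, searchable resource the abstract describes; gaps in such a table are where a subsequent machine-learning step would supply missing values.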
[1] J. M. Cole, K. S. Low, H. Ozoe, P. Stathi, C. Kitamura, H. Kurata, P. Rudolf, T. Kawase, “Data Mining with Molecular Design Rules Identifies New Class of Dyes for Dye-Sensitised Solar Cells”, Phys. Chem. Chem. Phys., 2014, 16 (48), 26684–26690 (Communication).
[2] M. C. Swain, J. M. Cole, “ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature”, J. Chem. Inf. Model., 2016, 56, 1894–1904.