Alex Eggeman¹, Ben Martineau², Duncan Johnstone², Robert Krakow², Paul Midgley²

¹ University of Manchester, Manchester, United Kingdom
² University of Cambridge, Cambridge, United Kingdom

Data science approaches have been successfully applied to a variety of electron microscopy measurements, yielding highly detailed analyses of the composition, electronic structure, crystallography and atomic structure of complex systems. The goal of all of these approaches is to take a large set of measurements from a sample and utilise the redundancy in the data to recover a model, or a set of significant components, that accurately describes the real system.
Statistical decompositions are one widely used method for this. They broadly rely on matrix factorisation to isolate the signals that describe most of the structured part of the data as efficiently as possible. This has proven a valuable approach in the analysis of scanning diffraction data, especially in the fairly common situation where an individual diffraction pattern can contain contributions from more than one overlapping phase in the microstructure being studied. This approach will be validated with both simulated and experimental data.
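The abstract does not name a specific factorisation. As an illustrative assumption, the minimal sketch below uses non-negative matrix factorisation (NMF) with the Lee-Seung multiplicative updates, a common choice for diffraction intensities because they are non-negative. It unfolds a 4D scanning-diffraction array (scan_y, scan_x, det_y, det_x) into a matrix with one diffraction pattern per row, factorises it, and folds the result back into real-space loading maps and component patterns. The helper names `nmf` and `decompose_4dstem` and the toy two-phase data are hypothetical.

```python
import numpy as np


def nmf(X, n_components, n_iter=500, eps=1e-10, seed=0):
    """Factorise non-negative X (n_samples x n_features) as X ~ W @ H using
    Lee-Seung multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    # Random non-negative initialisation of loadings W and component patterns H.
    W = rng.random((n_samples, n_components))
    H = rng.random((n_components, n_features))
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative; eps avoids 0/0.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def decompose_4dstem(data, n_components, **kwargs):
    """Unfold a 4D dataset into one pattern per row, factorise, and fold back."""
    sy, sx, dy, dx = data.shape
    X = data.reshape(sy * sx, dy * dx)
    W, H = nmf(X, n_components, **kwargs)
    loadings = W.reshape(sy, sx, n_components)  # one real-space map per component
    factors = H.reshape(n_components, dy, dx)   # one diffraction pattern per component
    return loadings, factors


# Synthetic toy data (not real measurements): two phases with distinct spot
# patterns whose fractions overlap across the central scan columns.
rng = np.random.default_rng(1)
pattern_a = np.zeros((8, 8))
pattern_a[2, 2] = pattern_a[5, 5] = 1.0
pattern_b = np.zeros((8, 8))
pattern_b[2, 5] = pattern_b[5, 2] = 1.0
frac_a = np.clip(np.linspace(1.5, -0.5, 10), 0, 1)  # phase-A fraction along scan x
mix = (frac_a[None, :, None, None] * pattern_a
       + (1 - frac_a)[None, :, None, None] * pattern_b)
data = np.broadcast_to(mix, (6, 10, 8, 8)) + 0.01 * rng.random((6, 10, 8, 8))
loadings, factors = decompose_4dstem(data, n_components=2)
```

In the overlap region each pattern is described as a weighted sum of the two component patterns, which is exactly the mixed-phase situation described above; with real data the number of components would have to be chosen rather than known in advance.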
One major issue with any decomposition lies in determining how many significant components are needed to produce the most robust model of the measured data. With too few, real structural differences can be merged with similar signals, so a significant change may be missed. With too many, the efficiency of the decomposition is lost and individual noise features start to appear as significant components. To make the decomposition more automatic, data clustering has been developed alongside it. Clustering uses the structure of the data (or at least of a suitably projected, lower-dimensional version of it) to guide the choice of the number of components in the model. In some circumstances, analysing this structure in the reduced-dimensional space can even provide physical insight into the structure and crystallography of the sample.
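The abstract does not specify the clustering algorithm or the criterion used to pick the number of components. As one illustrative possibility, assuming k-means clustering and the mean silhouette score (both our choices, not necessarily the authors'), the sketch below projects each diffraction pattern onto its leading principal components, clusters the projected points for several values of k, and reports the k whose clusters are most distinct. All function names are hypothetical and the data are synthetic.

```python
import numpy as np


def pca_scores(X, n_dims):
    """Project mean-centred rows of X onto their first n_dims principal axes."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_dims] * S[:n_dims]


def kmeans_pp_init(Z, k, rng):
    """k-means++ seeding: pick each new centre with probability ~ squared distance."""
    centres = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(Z[rng.choice(len(Z), p=d2 / d2.sum())])
    return np.array(centres)


def kmeans(Z, k, n_iter=100, n_init=10, seed=0):
    """Lloyd's k-means with restarts; returns the labels of the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centres = kmeans_pp_init(Z, k, rng)
        for _ in range(n_iter):
            d2 = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Keep a centre in place if its cluster became empty.
            new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
            converged = np.allclose(new, centres)
            centres = new
            if converged:
                break
        inertia = ((Z - centres[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def mean_silhouette(Z, labels):
    """Mean silhouette coefficient: (b - a) / max(a, b) averaged over points."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(Z))
    for i in range(len(Z)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters contribute s = 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance, excluding self
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s.mean()


def suggest_n_components(X, k_max=6, n_dims=3):
    """Cluster PCA scores for k = 2..k_max; return the k with the best silhouette."""
    Z = pca_scores(X, n_dims)
    scores = {k: mean_silhouette(Z, kmeans(Z, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores


# Synthetic toy data: 90 flattened 64-pixel "patterns" drawn from three phases.
rng = np.random.default_rng(0)
patterns = np.zeros((3, 64))
patterns[0, [3, 40]] = patterns[1, [10, 50]] = patterns[2, [20, 60]] = 1.0
true_phase = np.repeat(np.arange(3), 30)
X = patterns[true_phase] + 0.05 * rng.standard_normal((90, 64))
k_best, scores = suggest_n_components(X)
```

For this toy data the three phases form three tight groups in the projected space, so the silhouette score should peak at k = 3. On real data such a score is only a guide: overlapping phases produce points lying between clusters, and the choice of projection dimension (`n_dims` here) also affects the result.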