Machine-learning-accelerated simulations to enable automatic surface reconstruction | Nature Computational Science

Nature Computational Science volume 3, pages 1034–1044 (2023)

A preprint version of the article is available at arXiv.

Understanding material surfaces and interfaces is vital in applications such as catalysis or electronics. By combining energies from electronic structure with statistical mechanics, ab initio simulations can, in principle, predict the structure of material surfaces as a function of thermodynamic variables. However, accurate energy simulations are prohibitive when coupled to the vast phase space that must be statistically sampled. Here we present a bi-faceted computational loop to predict surface phase diagrams of multicomponent materials that accelerates both the energy scoring and statistical sampling methods. Fast, scalable and data-efficient machine learning interatomic potentials are trained on high-throughput density-functional-theory calculations through closed-loop active learning. Markov chain Monte Carlo sampling in the semigrand canonical ensemble is enabled by using virtual surface sites. The predicted surfaces for GaN(0001), Si(111) and SrTiO3(001) are in agreement with past work and indicate that the proposed strategy can model complex material surfaces and discover previously unreported surface terminations.
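As a rough illustration of sampling site occupations at fixed chemical potentials, the sketch below applies a textbook Metropolis criterion to a list of virtual sites. It is not the VSSR-MC implementation: the function names, the move set (change one site's occupant) and the toy energy function are assumptions made for this example, and the ensemble definition used by the authors may differ in detail.

```python
import math
import random

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def acceptance_probability(delta_energy, delta_counts, chemical_potentials, temperature):
    """Metropolis acceptance for an occupation change at fixed chemical potentials.

    delta_energy: E_new - E_old in eV (from any energy model).
    delta_counts: {species: change in number of atoms}.
    chemical_potentials: {species: mu in eV}.
    """
    # Change in the grand-potential-like quantity: dE - sum_i mu_i * dN_i
    delta_omega = delta_energy - sum(
        chemical_potentials[s] * dn for s, dn in delta_counts.items()
    )
    if delta_omega <= 0.0:
        return 1.0
    return math.exp(-delta_omega / (K_B * temperature))


def mc_step(occupations, energy_fn, species, chemical_potentials, temperature, rng):
    """Propose a new occupant for one virtual site; accept or reject in place."""
    site = rng.randrange(len(occupations))
    old = occupations[site]
    # Candidate occupants: any species or None (empty site), excluding the current one.
    new = rng.choice([s for s in list(species) + [None] if s != old])

    delta_counts = {}
    if old is not None:
        delta_counts[old] = delta_counts.get(old, 0) - 1
    if new is not None:
        delta_counts[new] = delta_counts.get(new, 0) + 1

    e_old = energy_fn(occupations)
    occupations[site] = new
    e_new = energy_fn(occupations)

    p = acceptance_probability(e_new - e_old, delta_counts, chemical_potentials, temperature)
    if rng.random() < p:
        return True
    occupations[site] = old  # reject: restore the previous occupant
    return False


def toy_energy(occupations):
    # Hypothetical stand-in for an ML potential: each adsorbed atom lowers E by 1 eV.
    return -1.0 * sum(o is not None for o in occupations)


rng = random.Random(0)
sites = [None] * 8  # eight initially empty virtual sites
for _ in range(200):
    mc_step(sites, toy_energy, ["Ga"], {"Ga": -0.5}, 300.0, rng)
```

In this toy setting adsorption lowers the sampled quantity by 0.5 eV per atom, so the sites fill up; in practice `energy_fn` would be replaced by a trained interatomic potential, typically followed by a relaxation of the surface atoms.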

The trained models, DFT data and Jupyter notebooks used for data analysis are available on Zenodo at https://doi.org/10.5281/zenodo.7758174 (ref. 72). Source data are provided with this paper.

The VSSR-MC algorithm reported in this work is available on GitHub at https://github.com/learningmatter-mit/surface-sampling. The version of code used in this work is available on Zenodo at https://doi.org/10.5281/zenodo.10086398 (ref. 73).

Shi, R., Waterhouse, G. I. & Zhang, T. Recent progress in photocatalytic CO2 reduction over perovskite oxides. Solar RRL 1, 1700126 (2017).

Sumaria, V., Nguyen, L., Tao, F. F. & Sautet, P. Atomic-scale mechanism of platinum catalyst restructuring under a pressure of reactant gas. J. Am. Chem. Soc. 145, 392–401 (2023).

Fabbri, E. et al. Dynamic surface self-reconstruction is the key of highly active perovskite nano-electrocatalysts for water splitting. Nat. Mater. 16, 925–931 (2017).

Zhang, Z., Wei, Z., Sautet, P. & Alexandrova, A. N. Hydrogen-induced restructuring of a Cu(100) electrode in electroreduction conditions. J. Am. Chem. Soc. 144, 19284–19293 (2022).

Sha, Z., Shen, Z., Cali, E., Kilner, J. A. & Skinner, S. J. Understanding surface chemical processes in perovskite oxide electrodes. J. Mater. Chem. 11, 5645–5659 (2023).

Jung, S.-K. et al. Understanding the degradation mechanisms of LiNi0.5Co0.2Mn0.3O2 cathode material in lithium ion batteries. Adv. Energy Mater. 4, 1300787 (2014).

Han, B. et al. From coating to dopant: how the transition metal composition affects alumina coatings on Ni-rich cathodes. ACS Appl. Mater. Interfaces 9, 41291–41302 (2017).

Xu, C. et al. Bulk fatigue induced by surface reconstruction in layered Ni-rich cathodes for Li-ion batteries. Nat. Mater. 20, 84–92 (2021).

Hirata, A., Saiki, K., Koma, A. & Ando, A. Electronic structure of a SrO-terminated SrTiO3(100) surface. Surf. Sci. 319, 267–271 (1994).

Castell, M. R. Scanning tunneling microscopy of reconstructions on the SrTiO3(001) surface. Surf. Sci. 505, 1–13 (2002).

Erdman, N. et al. The structure and chemistry of the TiO2-rich surface of SrTiO3(001). Nature 419, 55–58 (2002).

Heifets, E., Piskunov, S., Kotomin, E. A., Zhukovskii, Y. F. & Ellis, D. E. Electronic structure and thermodynamic stability of double-layered SrTiO3(001) surfaces: ab initio simulations. Phys. Rev. B 75, 115417 (2007).

Li, H., Jiao, Y., Davey, K. & Qiao, S.-Z. Data-driven machine learning for understanding surface structures of heterogeneous catalysts. Angew. Chem. Int. Ed. 135, e202216383 (2023).

Merte, L. R. et al. Structure of an ultrathin oxide on Pt3Sn(111) solved by machine learning enhanced global optimization. Angew. Chem. Int. Ed. 61, e202204244 (2022).

Foiles, S. M., Baskes, M. I. & Daw, M. S. Embedded-atom-method functions for the fcc metals Cu, Ag, Au, Ni, Pd, Pt, and their alloys. Phys. Rev. B 33, 7983–7991 (1986).

Nord, J., Albe, K., Erhart, P. & Nordlund, K. Modelling of compound semiconductors: analytical bond-order potential for gallium, nitrogen and gallium nitride. J. Phys. Condensed Matter 15, 5649 (2003).

Kolpak, A. M., Li, D., Shao, R., Rappe, A. M. & Bonnell, D. A. Evolution of the structure and thermodynamic stability of the BaTiO3(001) surface. Phys. Rev. Lett. 101, 036102 (2008).

Wexler, R. B., Qiu, T. & Rappe, A. M. Automatic prediction of surface phase diagrams using ab initio grand canonical Monte Carlo. J. Phys. Chem. C 123, 2321–2328 (2019).

Zhou, X.-F., Oganov, A. R., Shao, X., Zhu, Q. & Wang, H.-T. Unexpected reconstruction of the α-boron (111) surface. Phys. Rev. Lett. 113, 176101 (2014).

Timmermann, J. et al. IrO2 surface complexions identified through machine learning and surface investigations. Phys. Rev. Lett. 125, 206101 (2020).

Wales, D. J. & Doye, J. P. K. Global optimization by basin-hopping and the lowest energy structures of Lennard–Jones clusters containing up to 110 atoms. J. Phys. Chem. A 101, 5111–5116 (1997).

Panosetti, C., Krautgasser, K., Palagin, D., Reuter, K. & Maurer, R. J. Global materials structure search with chemically motivated coordinates. Nano Lett. 15, 8044–8048 (2015).

Obersteiner, V., Scherbela, M., Hörmann, L., Wegner, D. & Hofmann, O. T. Structure prediction for surface-induced phases of organic monolayers: overcoming the combinatorial bottleneck. Nano Lett. 17, 4453–4460 (2017).

Egger, A. T. et al. Charge transfer into organic thin films: a deeper insight through machine-learning-assisted structure search. Adv. Sci. 7, 2000992 (2020).

Bauer, M. N., Probert, M. I. J. & Panosetti, C. Systematic comparison of genetic algorithm and basin hopping approaches to the global optimization of Si(111) surface reconstructions. J. Phys. Chem. A 126, 3043–3056 (2022).

Wang, Q., Oganov, A. R., Zhu, Q. & Zhou, X.-F. New reconstructions of the (110) surface of rutile TiO2 predicted by an evolutionary method. Phys. Rev. Lett. 113, 266101 (2014).

Schusteritsch, G. & Pickard, C. J. Predicting interface structures: from SrTiO3 to graphene. Phys. Rev. B 90, 035424 (2014).

Meldgaard, S. A., Mortensen, H. L., Jørgensen, M. S. & Hammer, B. Structure prediction of surface reconstructions by deep reinforcement learning. J. Phys. Condensed Matter 32, 404005 (2020).

Hess, F. & Yildiz, B. Polar or not polar? The interplay between reconstruction, Sr enrichment, and reduction at the La0.75Sr0.25MnO3(001) surface. Phys. Rev. Mater. 4, 015801 (2020).

Unke, O. T. et al. Machine learning force fields. Chem. Rev. 121, 10142–10186 (2021).

Axelrod, S. et al. Learning matter: materials design with machine learning and atomistic simulations. Acc. Mater. Res. 3, 343–357 (2022).

Bisbo, M. K. & Hammer, B. Efficient global structure optimization with a machine-learned surrogate model. Phys. Rev. Lett. 124, 086102 (2020).

Bisbo, M. K. & Hammer, B. Global optimization of atomic structure enhanced by machine learning. Phys. Rev. B 105, 245404 (2022).

Timmermann, J. et al. Data-efficient iterative training of Gaussian approximation potentials: application to surface structure determination of rutile IrO2 and RuO2. J. Chem. Phys. 155, 244107 (2021).

Rønne, N. et al. Atomistic structure search using local surrogate model. J. Chem. Phys. 157, 174115 (2022).

Han, Y. et al. Prediction of surface reconstructions using MAGUS. J. Chem. Phys. 158, 174109 (2023).

Xu, J., Xie, W., Han, Y. & Hu, P. Atomistic insights into the oxidation of flat and stepped platinum surfaces using large-scale machine learning potential-based grand-canonical Monte Carlo. ACS Catal. 12, 14812–14824 (2022).

Bernardin, F. E. & Rutledge, G. C. Semi-grand canonical Monte Carlo (SGMC) simulations to interpret experimental data on processed polymer melts and glasses. Macromolecules 40, 4691–4702 (2007).

Damewood, J., Schwalbe-Koda, D. & Gómez-Bombarelli, R. Sampling lattices in semi-grand canonical ensemble with autoregressive machine learning. npj Comput. Mater. 8, 61 (2022).

Carrete, J., Montes-Campos, H., Wanzenböck, R., Heid, E. & Madsen, G. K. H. Deep ensembles vs committees for uncertainty estimation in neural-network force fields: comparison and application to active learning. J. Chem. Phys. 158, 204801 (2023).

Tan, A. R., Urata, S., Goldman, S., Dietschreit, J. C. B. & Gómez-Bombarelli, R. Single-model uncertainty quantification in neural network potentials does not consistently outperform model ensembles. Preprint at https://arxiv.org/abs/2305.01754 (2023).

Schwalbe-Koda, D., Tan, A. R. & Gómez-Bombarelli, R. Differentiable sampling of molecular geometries with uncertainty-based adversarial attacks. Nat. Commun. 12, 5104 (2021).

Fu, X. et al. Forces are not enough: benchmark and critical evaluation for machine learning force fields with molecular simulations. Transactions on Machine Learning Research https://openreview.net/forum?id=A8pqQipwkt (2023).

Damewood, J. et al. Representations of materials for machine learning. Annu. Rev. Mater. Res. 53, 399–426 (2023).

Stephenson, P. C. L., Radny, M. W. & Smith, P. V. A modified Stillinger–Weber potential for modelling silicon surfaces. Surf. Sci. 366, 177–184 (1996).

Northrup, J. E., Neugebauer, J., Feenstra, R. M. & Smith, A. R. Structure of GaN(0001): the laterally contracted Ga bilayer model. Phys. Rev. B 61, 9932–9935 (2000).

Štich, I., Payne, M. C., King-Smith, R. D., Lin, J.-S. & Clarke, L. J. Ab initio total-energy calculations for extremely large systems: application to the Takayanagi reconstruction of Si(111). Phys. Rev. Lett. 68, 1351–1354 (1992).

Smeu, M., Guo, H., Ji, W. & Wolkow, R. A. Electronic properties of Si(111)-7×7 and related reconstructions: density functional theory calculations. Phys. Rev. B 85, 195315 (2012).

Herger, R. et al. Surface of strontium titanate. Phys. Rev. Lett. 98, 076102 (2007).

Hong, C. et al. Anomalous intense coherent secondary photoemission from a perovskite oxide. Nature 617, 493–498 (2023).

Szot, K. & Speier, W. Surfaces of reduced and oxidized SrTiO3 from atomic force microscopy. Phys. Rev. B 60, 5909–5926 (1999).

Kubo, T. & Nozoye, H. Surface structure of SrTiO3(100). Surf. Sci. 542, 177–191 (2003).

Winter, G. & Gómez-Bombarelli, R. Simulations with machine learning potentials identify the ion conduction mechanism mediating non-Arrhenius behavior in LGPS. J. Phys. Energy 5, 024004 (2023).

Millan, R., Bello-Jurado, E., Moliner, M., Boronat, M. & Gomez-Bombarelli, R. Effect of framework composition and NH3 on the diffusion of Cu+ in Cu-CHA catalysts predicted by machine-learning accelerated molecular dynamics. ACS Cent. Sci. 9, 2044–2056 (2023).

Thompson, A. P. et al. LAMMPS—a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales. Comput. Phys. Commun. 271, 108171 (2022).

Larsen, A. H. et al. The atomic simulation environment—a Python library for working with atoms. J. Phys. Condensed Matter 29, 273002 (2017).

Boes, J. R., Mamun, O., Winther, K. & Bligaard, T. Graph theory approach to high-throughput surface adsorption structure generation. J. Phys. Chem. A 123, 2281–2285 (2019).

Ong, S. P. et al. Python Materials Genomics (pymatgen): a robust, open-source python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013).

Momma, K. & Izumi, F. VESTA 3 for three-dimensional visualization of crystal, volumetric and morphology data. J. Appl. Crystallogr. 44, 1272–1276 (2011).

Jain, A. et al. The Materials Project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).

Schütt, K., Unke, O. & Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In Proc. 38th International Conference on Machine Learning, Proc. Machine Learning Research Vol. 139 (eds Meila, M. & Zhang, T.) 9377–9388 (PMLR, 2021).

Martinez-Cantin, R., Tee, K. & McCourt, M. Practical Bayesian optimization in the presence of outliers. In Proc. Twenty-First International Conference on Artificial Intelligence and Statistics, Proc. Machine Learning Research Vol. 84 (eds Storkey, A. & Perez-Cruz, F.) 1722–1731 (PMLR, 2018).

Ramachandran, P., Zoph, B. & Le, Q. V. Searching for activation functions. Preprint at https://arxiv.org/abs/1710.05941 (2017).

Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proc. 3rd International Conference on Learning Representations, ICLR 2015 (eds Bengio, Y. & LeCun, Y.) (2015).

Gasteiger, J., Giri, S., Margraf, J. T. & Günnemann, S. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. Machine Learning for Molecules Workshop, NeurIPS 2020 https://ml4molecules.github.io/papers2020/ML4Molecules_2020_paper_35.pdf (2020).

Reuter, K. & Scheffler, M. Composition, structure, and stability of RuO2(110) as a function of oxygen pressure. Phys. Rev. B 65, 035406 (2001).

Heifets, E., Ho, J. & Merinov, B. Density functional simulation of the BaZrO3(011) surface structure. Phys. Rev. B 75, 155431 (2007).

Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169–11186 (1996).

Kresse, G. & Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758–1775 (1999).

Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).

Tadmor, E. B., Elliott, R. S., Sethna, J. P., Miller, R. E. & Becker, C. A. The potential of atomistic simulations and the knowledgebase of interatomic models. JOM 63, 17 (2011).

Du, X. Data for: Machine-learning-accelerated simulations to enable automatic surface reconstruction. Zenodo https://doi.org/10.5281/zenodo.7758174 (2023).

Du, X. learningmatter-mit/surface-sampling. Zenodo https://doi.org/10.5281/zenodo.10086398 (2023).

We thank G. Winter, J. Peng, N. Frey and M. Liu for helpful discussions. We also appreciate editing by J. Peng and A. Hoffman. X.D. acknowledges support from the National Science Foundation Graduate Research Fellowship under grant no. 2141064. J.K.D. was supported by the Department of Defense through the National Defense Science and Engineering Graduate Fellowship Program. We are grateful for computation time allocated on the MIT SuperCloud cluster, the MIT Engaging cluster and the NERSC Perlmutter cluster. This material is based on work supported by the Under Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Under Secretary of Defense for Research and Engineering. Delivered to the US Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (February 2014). Notwithstanding any copyright notice, US Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the US Government may violate any copyrights that exist in this work.

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

Center for Computational Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

Xiaochen Du & James K. Damewood

Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

James K. Damewood, Jaclyn R. Lunger, Reisel Millan, Bilge Yildiz & Rafael Gómez-Bombarelli

Department of Nuclear Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

Microsystems Technology Laboratories, Massachusetts Institute of Technology, Cambridge, MA, USA

Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, USA

X.D. implemented the sampling algorithm, performed surface modeling, ran DFT calculations, trained the neural networks and carried out surface stability analysis. J.K.D. assisted with sampling algorithm implementation and provided guidance with surface modeling. J.R.L. provided guidance with surface modeling and ran DFT calculations. R.M. provided guidance with neural network training and active learning. B.Y. provided guidance with the choice of surfaces and surface stability analysis. L.L. supervised the research and contributed to securing funding. R.G.-B. conceived the project, supervised the research and contributed to securing funding. All authors contributed to results discussion and paper writing.

Correspondence to Rafael Gómez-Bombarelli.

The authors declare no competing interests.

Nature Computational Science thanks Mie Andersen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The surface reconstructions are rotated in comparison with the reference structure from ref. 46 but contain the same hexagonal pattern.

a–c, Structures shown were obtained from constant-composition (canonical) VSSR-MC sampling using the SRS modified Stillinger–Weber potential (ref. 45) with 3×3 (a), 5×5 (b) and 7×7 (c) unit cells. The SRS energies were obtained from the depicted structures while the DFT energies came from structures further relaxed at the DFT level. * Further relaxation using DFT resulted in the 3×3 DAS structure.

At each AL generation, an ensemble of just three NFF models was able to estimate force s.d. that correlated strongly with force error. Each individual data point represents a sampled structure. Each blue ‘X’ represents a binned average and a best-fit line is drawn through the binned averages. The binned average is calculated by dividing both the force s.d. and force MAE into equal-sized bins. The average force MAE is then plotted against the median force s.d. for each corresponding bin.
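The binning described in this caption can be sketched with NumPy as follows. This is one possible reading rather than the authors' analysis code: "equal-sized" is interpreted here as an (almost) equal number of structures per bin after sorting by force s.d., and the per-structure s.d. is taken as the ensemble s.d. averaged over atoms and Cartesian components. The function names and the synthetic data are assumptions for illustration.

```python
import numpy as np


def ensemble_force_sd(force_predictions):
    """Per-structure force s.d. across ensemble members.

    force_predictions: array of shape (n_models, n_structures, n_atoms, 3).
    Returns shape (n_structures,): s.d. over models, averaged over atoms and components.
    """
    preds = np.asarray(force_predictions, dtype=float)
    return preds.std(axis=0).mean(axis=(1, 2))


def binned_averages(force_sd, force_mae, n_bins=10):
    """Median s.d. and mean MAE per bin, with structures sorted by s.d.

    Assumption: bins hold an (almost) equal number of structures.
    Requires at least n_bins structures so that no bin is empty.
    """
    force_sd = np.asarray(force_sd, dtype=float)
    force_mae = np.asarray(force_mae, dtype=float)
    order = np.argsort(force_sd)
    bins = np.array_split(order, n_bins)
    sd_median = np.array([np.median(force_sd[idx]) for idx in bins])
    mae_mean = np.array([force_mae[idx].mean() for idx in bins])
    return sd_median, mae_mean


# Synthetic example: error roughly proportional to the predicted s.d.
rng = np.random.default_rng(0)
sd = rng.uniform(0.01, 1.0, 500)
mae = 2.0 * sd * (1.0 + 0.1 * rng.standard_normal(500))

sd_median, mae_mean = binned_averages(sd, mae, n_bins=10)
# Best-fit line through the binned averages.
slope, intercept = np.polyfit(sd_median, mae_mean, 1)
```

A slope well above zero with little scatter around the line is what "correlated strongly" would look like in such a plot.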

The majority of high-force structures were added in AL generations 1, 2 and 6, which correspond either to random structures or structures obtained through adversarial attack. The three VSSR-MC AL generations produced structures with low force values mostly around 50 eV Å⁻¹ or less.

As described in the main paper, the test data is obtained from VSSR-MC runs using the sixth-generation NFF model.

a,b, Comparison of limited fixed on-lattice sites (a) and denser algorithmically-generated virtual surface sites that can overlap (b). c, Off-lattice reconstructions can be obtained following VSSR-MC discrete sampling at virtual sites and continuous relaxation of surface atoms and adsorbates. d, Amorphous reconstructions with many local minima, however, will likely be difficult for VSSR-MC to sample.

a–d, Pymatgen (a) and CatKit (b) virtual sites for GaN(0001) against the contracted Ga monolayer reconstruction, two-layer pymatgen sites for Si(111) against the 5×5 DAS reconstruction (c), and pymatgen virtual sites for SrTiO3(001) against the double-layer TiO2 reconstruction (d). The dashed lines are a guide for the eye.

a, Clustering of VSSR-MC structures in the NFF latent space visualized in the first three principal components. In the VSSR-MC with clustering AL method, the surface from each cluster with the highest force s.d. is selected for DFT evaluation. b, PCA of training data and the dominant terminations (term.) in the latent space of the sixth-generation model.
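The selection rule in panel a (one structure per cluster, chosen by the highest force s.d.) can be sketched as below. The clustering algorithm itself is not specified in this caption, so the example takes cluster labels as given; the function names and the small input arrays are hypothetical, and the PCA helper is a plain SVD projection rather than the authors' code.

```python
import numpy as np


def first_principal_components(embeddings, n_components=3):
    """Project latent embeddings onto their leading principal components."""
    x = np.asarray(embeddings, dtype=float)
    x = x - x.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T


def select_most_uncertain_per_cluster(cluster_labels, force_sd):
    """Return, for each cluster, the index of the structure with the highest force s.d."""
    labels = np.asarray(cluster_labels)
    force_sd = np.asarray(force_sd, dtype=float)
    selected = []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        selected.append(int(members[np.argmax(force_sd[members])]))
    return selected


# Hypothetical labels and uncertainties for six sampled structures.
labels = [0, 0, 1, 1, 1, 2]
force_sd = [0.1, 0.3, 0.2, 0.05, 0.4, 0.9]
picked = select_most_uncertain_per_cluster(labels, force_sd)  # -> [1, 4, 5]

# Projection of (random, illustrative) 5-dimensional embeddings to 3 components.
projected = first_principal_components(np.random.default_rng(0).normal(size=(6, 5)))
```

Picking one structure per cluster keeps the DFT batch diverse, while the s.d. criterion targets the structures the ensemble is least sure about.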

Supplementary Sections: (1) abbreviations used; and (2) surface stability analysis.

Comparison of AutoSurfRecon with existing computational methods for surface reconstruction. AutoSurfRecon automatically samples across many surface compositions and configurations while training an accurate NFF for low-cost energy prediction.

Statistical source data: Typical GaN(0001) VSSR-MC run profile.

Statistical source data: (b) force error and predicted force s.d. for the sixth-generation model; (c) latent space embedding PCA of surfaces acquired at each AL generation; (d) force and energy predictions of the model at each AL generation on the final test set.

Statistical source data: (b) predicted surface free energies for each dominant termination across Sr and O chemical potentials; (c-e) predicted surface free energies of sampled structures at Sr chemical potentials of −10, −7 and −4 eV and O chemical potential of 0 eV.

Statistical source data: force error and predicted force s.d. over six AL generations.

Statistical source data: distribution of force magnitudes over six AL generations.

Statistical source data: predictions of the sixth-generation AL model on final test data.

Statistical source data: (a) PCA of test data in the latent space of the sixth-generation model; (b) PCA of the sixth-generation training data and dominant terminations in the latent space of the sixth-generation model.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Du, X., Damewood, J.K., Lunger, J.R. et al. Machine-learning-accelerated simulations to enable automatic surface reconstruction. Nat Comput Sci 3, 1034–1044 (2023). https://doi.org/10.1038/s43588-023-00571-7

DOI: https://doi.org/10.1038/s43588-023-00571-7

Nature Computational Science (Nat Comput Sci) ISSN 2662-8457 (online)
