
For more than 100 years, scientists have used a method called crystallography to determine the atomic structure of materials. The method works by shining an X-ray beam through a material sample and observing the pattern it produces. From this pattern, called a diffraction pattern, it is theoretically possible to calculate the exact arrangement of atoms in the sample. The challenge, however, is that this technique only works well when researchers have large, pure crystals. When they have to settle for a powder of minuscule pieces, called nanocrystals, the method only hints at the unseen structure.
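To give a rough sense of why nanocrystals yield such degraded patterns, the sketch below uses the textbook Scherrer equation, which relates crystallite size to the width of a diffraction peak. It is only an illustration and is not taken from the study: the Cu K-alpha wavelength (1.5406 Å), the shape factor of 0.9, and the function name `scherrer_fwhm_deg` are all assumptions chosen for the example.

```python
import math


def scherrer_fwhm_deg(size_angstrom, two_theta_deg,
                      wavelength_angstrom=1.5406, shape_factor=0.9):
    """Approximate peak width (FWHM, in degrees 2-theta) via the Scherrer equation.

    FWHM = K * wavelength / (size * cos(theta)), with theta half the scattering angle.
    The default wavelength (Cu K-alpha) and shape factor K are illustrative assumptions.
    """
    theta = math.radians(two_theta_deg / 2.0)
    fwhm_rad = shape_factor * wavelength_angstrom / (size_angstrom * math.cos(theta))
    return math.degrees(fwhm_rad)


# Compare a 10 A nanocrystal with progressively larger crystallites,
# all for a peak located at 2-theta = 30 degrees.
for size in (10.0, 100.0, 1000.0):
    width = scherrer_fwhm_deg(size, 30.0)
    print(f"{size:7.0f} A crystallite -> peak FWHM of about {width:.2f} deg")
```

Because the width scales inversely with crystallite size, peaks from the smallest particles smear into one another, which is one way to picture how much structural information is lost in a nanocrystalline powder pattern.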
We have created a machine learning algorithm that can observe the pattern produced by nanocrystals to infer the material’s atomic structure, as described in a new study published in Nature Materials. In many cases, the algorithm achieves near-perfect reconstruction of the atomic-scale structure from the highly degraded diffraction information — a feat unimaginable just a couple of years ago.

We trained a generative AI model on 40,000 known atomic structures to develop a system that is able to make sense of these inferior X-ray patterns. The machine learning technique, called diffusion generative modeling, emerged from statistical physics and recently gained prominence for enabling generative AI programs such as Midjourney and Sora.
To apply the technique to crystallography, we began with a dataset of 40,000 crystal structures and jumbled the atomic positions until they were indistinguishable from random placement. Then, we trained a deep neural network to connect these almost randomly placed atoms with their associated X-ray diffraction patterns. The network used these observations to reconstruct the crystal. Finally, we put the AI-generated crystals through a procedure called Rietveld refinement, which essentially “jiggles” crystals into the closest optimal state, based on the diffraction pattern.
Although early versions of this algorithm struggled, it eventually learned to reconstruct crystals far more effectively than we had expected. The algorithm was able to determine the atomic structure from nanometer-sized crystals of various shapes, including samples that had proven too difficult for previous experiments to characterize.
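The "jumbling" step described above can be pictured with a minimal sketch. This is not the code used in the study: the toy four-atom structure, the noise levels, and the function name `jumble_positions` are hypothetical, and the sketch assumes atoms are stored as fractional coordinates in a periodic unit cell. It shows only the forward (noising) half of a diffusion process; the trained network that reverses it, conditioned on the diffraction pattern, is not shown.

```python
import numpy as np


def jumble_positions(frac_coords, noise_scale, rng):
    """Add Gaussian noise to fractional coordinates and wrap them back into the unit cell."""
    noise = rng.normal(0.0, noise_scale, size=frac_coords.shape)
    return (frac_coords + noise) % 1.0


rng = np.random.default_rng(0)

# Hypothetical toy structure: four atoms on face-centered-cubic sites.
clean = np.array([
    [0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
])

for noise_scale in (0.01, 0.1, 1.0):
    noisy = jumble_positions(clean, noise_scale, rng)
    # Shortest displacement under periodic boundaries, per coordinate.
    delta = (noisy - clean + 0.5) % 1.0 - 0.5
    print(f"noise scale {noise_scale:4.2f}: mean displacement {np.abs(delta).mean():.3f}")
```

At small noise levels the atoms stay near their original sites; at large noise levels the wrapped positions become close to uniformly random, which corresponds to the "indistinguishable from random placement" starting point that the network learns to undo.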
We believe that this discovery is a sign of more to come. Many people think that AI cannot really discover new things, that it only makes incremental discoveries. But this research shows how, with relatively little background knowledge in physics or geometry, AI was able to learn to solve a puzzle that has baffled human researchers for a century. Many other fields facing long-standing challenges could see similar advances.
This century-old powder crystallography puzzle is particularly meaningful to Hod Lipson, the grandson of Henry Lipson CBE FRS (1910–1991), who pioneered early computational crystallography methods. In the 1930s, Henry Lipson worked with Sir Lawrence Bragg and other contemporaries to develop some of the first computational techniques that were broadly used to solve the structures of the first complex molecules, such as penicillin, leading to the 1964 Nobel Prize in Chemistry. However, a solution to the powder crystallography problem eluded him for decades.
Technical Abstract
A major challenge in materials science is the determination of the structure of nanometer-sized objects. Here we present a novel approach that uses a generative machine learning model based on diffusion processes that is trained on 45,229 known structures. The model factors in both the measured diffraction pattern and relevant statistical priors on the unit cell of atomic cluster structures. Conditioned only on the chemical formula and the information-scarce finite-size broadened powder diffraction pattern, we find that our model, PXRDNET, can successfully solve simulated nanocrystals as small as 10 Å across 200 materials of varying symmetry and complexity, including structures from all seven crystal systems. We show that our model can successfully and verifiably determine structural candidates four out of five times, with average error among these candidates being only 7% (as measured by post-Rietveld refinement R-factor). Furthermore, PXRDNET is capable of solving structures from noisy diffraction patterns gathered in real-world experiments. We suggest that data-driven approaches, bootstrapped from theoretical simulation, will ultimately provide a path towards determining the structure of previously unsolved nanomaterials.
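For readers unfamiliar with the R-factor used as the error measure above, the following sketch computes one common form, the weighted profile R-factor, which compares an observed diffraction profile with one calculated from a candidate structure. The abstract does not state which R-factor definition was used, so the formula choice, the uniform default weights, the synthetic Gaussian peaks, and the function name `weighted_profile_r_factor` are all assumptions made for illustration.

```python
import numpy as np


def weighted_profile_r_factor(y_obs, y_calc, weights=None):
    """R_wp = sqrt( sum(w * (y_obs - y_calc)^2) / sum(w * y_obs^2) ).

    Uniform weights are used when none are given (an illustrative default).
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)
    if weights is None:
        weights = np.ones_like(y_obs)
    numerator = np.sum(weights * (y_obs - y_calc) ** 2)
    denominator = np.sum(weights * y_obs ** 2)
    return float(np.sqrt(numerator / denominator))


# Synthetic example: two broad Gaussian peaks stand in for a nanocrystal pattern.
two_theta = np.linspace(10.0, 80.0, 701)


def gaussian_peak(center, width):
    return np.exp(-0.5 * ((two_theta - center) / width) ** 2)


observed = gaussian_peak(30.0, 2.0) + 0.5 * gaussian_peak(50.0, 2.0)
# A candidate whose first peak is shifted slightly from the observed position.
candidate = gaussian_peak(30.2, 2.0) + 0.5 * gaussian_peak(50.0, 2.0)

print(f"R_wp for the shifted candidate: {weighted_profile_r_factor(observed, candidate):.3f}")
```

A value of zero means the calculated profile matches the observed one exactly, and smaller values indicate a closer fit between the candidate structure and the measured pattern.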
Selected Press
TBD
Project participants
Gabe Guo, Tristan Luca Saidi, Maxwell W. Terban, Michele Valsecchi, Simon J. L. Billinge, and Hod Lipson
Related Publications
Gabe Guo, Tristan Luca Saidi, Maxwell W. Terban, Michele Valsecchi, Simon J. L. Billinge, and Hod Lipson (2025) "Ab Initio Structure Solutions from Nanocrystalline Powder Diffraction Data via Diffusion Models", Nature Materials
See also an earlier system that uses a different approach: Guo, Gabe, Judah Goldfeder, Ling Lan, Aniv Ray, Albert Hanming Yang, Boyuan Chen, Simon JL Billinge, and Hod Lipson. "Towards end-to-end structure determination from x-ray diffraction data using deep learning." npj Computational Materials 10, no. 1 (2024): 209.