This AI Software Nearly Predicted Omicron’s Tricky Structure
The way predictions raced ahead of experiments on Omicron’s spike protein reflects a recent sea change in molecular biology brought about by AI. The first software capable of accurately predicting protein structures became widely available only months before Omicron appeared, thanks to competing research teams at Alphabet’s UK-based AI lab DeepMind and at the University of Washington.
Ford used both packages, but because neither was designed or validated for predicting small changes caused by mutations like those of Omicron, his results were more suggestive than definitive. Some researchers treated them with suspicion. But the fact that he could easily experiment with powerful protein prediction AI illustrates how the recent breakthroughs are already changing the ways biologists work and think.
Subramaniam says he received four or five emails from people proffering predicted Omicron spike structures while working towards his lab’s results. “Quite a few did this just for fun,” he says. Direct measurements of protein structure will remain the ultimate yardstick, Subramaniam says, but he expects AI predictions to become increasingly central to research—including on future disease outbreaks. “It’s transformative,” he says.
Because a protein’s shape determines how it behaves, knowing its structure can help all kinds of biology research, from studies of evolution to work on disease. In drug research, figuring out a protein structure can help reveal potential targets for new treatments.
Determining a protein’s structure is far from simple. Proteins are complex molecules assembled from instructions encoded in an organism’s genome to serve as enzymes, antibodies, and much of the other machinery of life. They are made from strings of molecules called amino acids that can fold into complex shapes that behave in different ways.
Deciphering a protein’s structure traditionally involved painstaking lab work. Most of the roughly 200,000 known structures were mapped using a tricky process in which proteins are formed into a crystal and bombarded with x-rays. Newer techniques like the electron microscopy used by Subramaniam can be faster, but the process is still far from easy.
In late 2020, the long-standing hope that computers could predict protein structure from an amino acid sequence suddenly became real, after decades of slow progress. DeepMind software called AlphaFold proved so accurate in a contest for protein prediction that the challenge’s cofounder John Moult, a professor at the University of Maryland, declared the problem solved. “Having worked personally on this problem for so long,” Moult said, DeepMind’s achievement was “a very special moment.”
The moment was also frustrating for some scientists: DeepMind did not immediately release details of how AlphaFold worked. “You’re in this weird situation where there’s been this major advance in your field, but you can’t build on it,” David Baker, whose lab at the University of Washington works on protein structure prediction, told WIRED last year. His research group used clues dropped by DeepMind to guide the design of open source software called RoseTTAFold, released in June, which was similar to but not as powerful as AlphaFold. Both are based on machine learning algorithms honed to predict protein structures by training on a collection of more than 100,000 known structures. The next month, DeepMind published details of its own work and released AlphaFold for anyone to use. Suddenly, the world had two ways to predict protein structures.
Minkyung Baek, a postdoctoral researcher in Baker’s lab who led work on RoseTTAFold, says she has been surprised by how quickly protein structure predictions have become standard in biology research. Google Scholar reports that UW’s and DeepMind’s papers on their software have together been cited by more than 1,200 academic articles in the short time since they appeared.
Although predictions haven’t proven crucial to work on Covid-19, she believes they will become increasingly important to the response to future diseases. Pandemic-quashing answers won’t spring fully formed from algorithms, but predicted structures can help scientists strategize. “A predicted structure can help you put your experimental effort into the most important problems,” Baek says. She’s now trying to get RoseTTAFold to accurately predict the structure of antibodies and invading proteins when bound together, which would make the software more useful to infectious disease projects.
Despite their impressive performance, protein predictors don’t reveal everything about a molecule. They spit out a single static structure for a protein, and don’t capture the flexes and wiggles that take place when it interacts with other molecules. The algorithms were trained on databases of known structures, which are more reflective of the proteins easiest to map experimentally than of the full diversity of nature. Kresten Lindorff-Larsen, a professor at the University of Copenhagen, predicts the algorithms will be used more frequently and will be useful, but says, “We also as a field need to learn better when these methods fail.”