MoML 2024 Open Notes
Our team recently attended the MoML Conference in Boston. We've curated a selection of our notes to push out into the open. Enjoy!
Foreword
This piece includes a mixture of notes and opinions on several selected talks from last month’s Molecular Machine Learning (MoML) Conference put on by the Jameel Clinic at MIT. We hope it’s useful! We look forward to posting more of our notes following this week’s bio-related workshops at NeurIPS!
The State of Antibody Engineering
Dr. Karl Dane Wittrup, co-founder of antibody pioneer Adimab and professor at MIT's Koch Institute, set the day’s tone with his view of the current state of antibody engineering.
My interpretation of Dr. Wittrup’s talk was that he views monoclonal antibody (mAb) discovery as a mostly solved problem and that the impact of machine learning (ML) on this domain is exaggerated.
The argument was convincing. Wittrup noted that cost, time, and failure risk accumulate throughout a drug campaign and don’t solely emanate from the discovery phase. Specifically, he called out the fact that target identification and clinical trials are disproportionately large contributors to cost, time, and failure risk. This is objectively the case.
Using the iron triangle analogy—where you must choose two from the set of good, fast, and cheap outcomes—Wittrup broke down his view further:
If you want a good mAb quickly, you need to go to a specialized development partner (e.g., Adimab) and part with single-digit royalties.
If you want a mAb cheaply and quickly, you can go to a commoditized CRO, but your molecule may not be that good.
If you want a good mAb cheaply, you can leverage an internal development team, which may not work that quickly.
ML-driven mAb discovery is positioned to break the iron triangle. There may be a near-future where you can have a good, cheap, and fast mAb. Wittrup’s point seemed to be that this isn’t as impactful on balance as it may seem. He claimed that:
We can already create good mAbs. The quality bar set by ML-discovered mAbs will be just as good, but perhaps not better than what traditional discovery techniques already get us.
Antibody discovery is already fast. Push-button mAb discovery shaves just a few months off of a 5-10 year discovery cycle dominated by target selection and clinical translation.
Virtually free mAbs are great, but they may only help you dodge a royalty payment to a development partner. You're still on the hook for $50-100M in clinical trial costs.
The talk concluded with the notion that antibody discovery is a mature field. Wittrup noted that elite mAb discovery platforms can almost always discover viable molecules. He seemed to view ML-driven approaches as highly relevant instruments in the mAb discovery armamentarium, but not as revolutionary tools that would turn antibody engineering on its head.
Given the audience's devotion to molecular discovery, this opening session seemed to take folks by surprise. In my view, Dr. Wittrup offered a pragmatic perspective on not reinventing the wheel. I’m confident his intent was not to discourage the ML scientists in the room from focusing on molecular discovery. While I agree with Dr. Wittrup’s central thesis, I may have a few contrarian views:
As I wrote in a recent piece, I think ML-designed antibodies will break the iron triangle in the very near future and that this will disintermediate traditional antibody design services.
I’m not confident that existing service providers can gradually integrate ML. My sense is that ML infrastructure and strategy need to be in place from day one, which could allow a new wave of in silico service providers to establish themselves.
Given the nascency of traditional screening techniques for emerging biologics categories (e.g., bispecifics), I see real opportunity for ML techniques to enable differentiated design capabilities for these next-generation medicines.
As indications and targets become crowded, fast-following has become a legitimate clinical strategy. Every month matters here and ML can help bridge the gap.
ML excels at multiparameter optimization in a way that standard mAb screening doesn’t. If we’re able to leverage data on developability properties, feeding these back into the design process could reduce the likelihood of downstream failure.
This first presentation was excellent. Dr. Wittrup put into perspective the real problems that matter in drug discovery and development. I share many of his views, though I may skew slightly more bullish on the applicability and impact potential of ML-driven techniques for biologics.
Language Models for Programmable Protein Therapeutics
Pranam Chatterjee shared his perspective and insights as an ardent supporter of language models as an optimal substrate for powering the future of generative protein design. Pranam’s vision is to create tools that make it simple to manipulate proteins—the same way that tools like CRISPR have made it simple to edit the human genome.
If the opening talk served as a litmus test for the current state of protein engineering, Pranam’s presentation was a tour through what might be possible going forward with computational approaches.
Pranam’s lab has been prolific, having produced an array of tools that build on top of ESM2, a pre-trained large language model (LLM) that creates useful embeddings of protein sequences to power downstream tasks.
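For readers who haven’t played with these models, here is a minimal sketch of pulling per-sequence embeddings out of ESM2 via the fair-esm package. The checkpoint choice, toy sequence, and mean-pooling step are my own illustrative assumptions, not anything from the talk.

```python
import torch
import esm  # pip install fair-esm

# Load a pre-trained ESM2 checkpoint (650M-parameter model shown as an example)
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

# A toy protein sequence; downstream design tools condition on embeddings like these
data = [("example_protein", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ")]
_, _, batch_tokens = batch_converter(data)

with torch.no_grad():
    out = model(batch_tokens, repr_layers=[33])  # final-layer residue embeddings

# Mean-pool over residues (skipping BOS/EOS tokens) to get one vector per sequence
residue_reps = out["representations"][33]
seq_embedding = residue_reps[0, 1 : len(data[0][1]) + 1].mean(dim=0)
print(seq_embedding.shape)  # torch.Size([1280]) for this checkpoint
```

Embeddings like this one are the raw material that the lab’s downstream tools build on.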
One application area involved his lab’s work using protein language models to design peptides that can bind target proteins intracellularly. Specifically, Pranam spoke about using this approach to rapidly generate ubiquibodies (uAbs)—peptides that bind target proteins to enable ubiquitination and subsequent degradation via the proteasome.
We’ve written previously about targeted protein degradation (TPD) and have made an investment in a company leveraging small molecule molecular glues—Monte Rosa Therapeutics (GLUE). Hence, we are enthusiastic about TPD and care deeply about new approaches to this space.
Expanding the scope of protein targets amenable to TPD is the field’s central charge. One major drawback with some degraders (e.g., PROTACs) is that the target needs to have a druggable binding pocket to design a warhead against. Proteins that are smooth, intrinsically disordered, or are otherwise lacking a canonical binding site require special treatment.
Pranam ran through a host of his lab’s methods that are relevant for building motif-specific uAbs including PepPrCLIP, PepMLM, MoPPIt, and SaLT&PepPr. Using a combination of these tools, Chatterjee gave the case for uAbs and their closely related cousin, duAbs, which stabilize target proteins.
The hallmark argument is that peptide-based degraders and stabilizers promise a more modular solution than small molecules. Moreover, he argues that uAbs and duAbs can be generated from sequence data alone, ostensibly obviating the need to acquire a co-crystal structure or design against a specific, ligandable binding pocket. Buttressing this claim, Chatterjee walked through some experimental data wherein his lab showed in vitro uAb-mediated degradation of historically recalcitrant protein targets such as beta-catenin.
I’m very excited by this approach and I was glad to see Pranam highlight several areas where these tools have yet to improve.
Specifically, Chatterjee spoke to a model called PTM-Mamba, which injects signal from post-translational modifications (PTMs) into the design process. Forming a ternary complex with the body’s degradation machinery is a sensitive process. PTMs can influence complexation and may be relevant to designing better uAbs or duAbs. While it’s still early, this seems like a practical direction to go.
Peptides typically have very poor in vivo PK, which is why they’re difficult to turn into drugs. We’ve written about this as a motivator behind the rise of non-canonical amino acids (ncAAs), which can increase the plasma stability of peptides by protecting against proteolysis. ncAAs also afford medicinal chemists an exquisite level of conformational control over peptides, so it would be interesting to see ncAAs more involved in computational workflows.
Lead optimization for degraders of all types—including uAbs—can be fickle. The formation of a ternary complex is a necessary, but not sufficient condition for an optimal degrader molecule. Ideally, the drug is able to stabilize a conformation that exposes a lysine residue to the degradation machinery. The closer the lysine, the faster the ubiquitination and the more potent the degrader. Whether or not proximal lysine exposure can be learned through data scaling is a question I don’t know the answer to yet, but would be interested to find out.
Pranam’s talk was a tour de force through the many different tools that can be built atop powerful, pre-trained protein sequence models like ESM2. I’m eager to see how much better these frameworks can get with improved underlying embeddings, as contained in newer protein language models (pLMs).
Panel Discussion: Sequence & Structure
The day’s lone panel centered on the perennial debate between protein sequence versus protein structure as the optimal representation for generative models. While intellectually stimulating, I’m not in the business of taking a side here. Industry will use whatever works. In the grand scheme, there’s not a lot of bio-data to go around, so I’m sure folks will continue to use all the data they can get their hands on—regardless of what format it’s in.
I will try to present both sides of this argument based on some of the points the speakers made during the panel.
Structure Generative Modeling
Structure generative models are trained on 3D structures from experimental tools like x-ray crystallography. These models embed structural representations of molecules using geometric coordinate systems. They reason in and generate 3D structures.
Though structural data is information-rich, there’s a very limited amount of it in the public domain. The largest open database of 3D structures, the Protein Data Bank (PDB), contains on the order of 10^5 structures. Structural data is also more expensive and challenging to generate.
These models often seek to map directly from 3D structure (which involves continuous embeddings) to a biological or chemical function of interest, yielding a relatively smooth optimization surface. That is, small changes in the input space correspond to small changes in the output (function) space.
An example of a ubiquitous structure generative model is AlphaFold3, which I’ve written extensively about here.
Sequence Generative Modeling
Sequence generative models are trained on 1D sequences of DNA and/or proteins obtained via next-generation sequencing (NGS). These sequences are often annotated with functional data, enabling a direct link between sequence and function.
These models are exposed to structure implicitly through sequence data alone. Trained on enormous amounts of protein sequences, they learn to capture structural relationships by detecting co-evolutionary patterns, positional correlations, and conserved motifs that reflect physical constraints in folded proteins.
Sequence space can sometimes prove challenging to optimize in. This is because it’s discrete—small changes in sequence can cause massive changes in structure, and thus, function.
Offsetting this is the fact that sequence data is both relatively abundant and cheap to generate. UniProt contains on the order of 10^8 sequences, for example. Modern protein expression systems, single-cell partitioning, and NGS have made it facile to collect more annotated sequence data.
My favorite bipartisan takeaway involved a recasting of this debate from sequence versus structure to structure implicit versus explicit. The idea is that all models are exposed to structure. Structure explicit models are directly trained on structure. They reason in and generate structures. Meanwhile, structure implicit models are trained on sequence, though structure is implicitly captured in the embeddings, whether that’s through learned co-evolutionary statistics or otherwise.
One interesting observation revolved around biases in structural data acquisition methods. James Fraser and Mark Murcko have a beautiful paper about this entitled Structure is Beauty, But Not Always Truth. The essence of this argument is that structural representations are biased towards low-free-energy conformations and that these snapshots are captured in non-physiologic environments (read: not in a cell). Sequence data, by contrast, makes no such assumptions. It may excel in areas where traditional structure-based paradigms struggle, such as targeting intrinsically disordered proteins.
Another salient point focused on the therapeutic relevance of de novo protein-protein interactions (PPIs)—that is, PPIs that aren’t found in evolutionary databases. For example, therapeutic antibody-antigen interactions are new to Nature. While we can learn general features about antibody-antigen interactions from databases like OAS, we can’t rely on co-evolutionary couplings in the same way we can for intra-protein interactions or for commonly occurring complexes. The debate over whether co-evolutionary information constitutes a learned form of biophysics rages on. Explicit structures were discussed as a more viable way of capturing biophysical phenomena and assisting with the optimization of de novo binders.
Like I said at the outset—I’m still learning to appreciate both sides of this debate. I think it’s incredibly thought-provoking to wonder where each path ends. I’m paying special attention to co-generation models that make use of both forms of representation. There are successful businesses that’ve built their foundations on either approach. If you have a strong opinion—feel free to get in touch with me.
Implicitly Guided Design with PropEn: Match Your Data to Follow the Gradient
On behalf of Prescient Design and her collaborators at NYU, Nataša Tagasovska gave an excellent talk on a new ML framework—PROPerty ENhancer (PropEn). This method seems especially useful for optimizing desirable properties in scenarios lacking extensive training data, such as the life sciences. Following in silico experiments, the team validated PropEn with wet lab experiments of optimized antibody binders.
Generative algorithms seek to compress the iterative design-make-test-analyze (DMTA) cycles that underlie most optimization processes. Many generative models are composed of both a generative component and a discriminative component. The former proposes new designs and the latter evaluates those designs.
An accurate discriminator is like a compass. Instead of maximizing the binding affinity of an antibody, let’s say your goal is to find the northernmost point on a tract of land. Each step is a generative process. The compass steers that generative process along an efficient path towards your goal. Without a discriminator, or compass, you’d be left to wander.
Discriminators require sufficient training data to accurately predict properties. With enough labeled datapoints, discriminators can approximate even complex mappings between the input (design) space and the output (property) space. Unfortunately, labeled data is scarce and expensive in the life sciences.
Let’s imagine that I’m training a generative antibody design model using structural representations of proteins. My input space might include 3D structures of antibody-antigen complexes obtained via x-ray crystallography. Perhaps the main property I am optimizing is binding affinity. Mapping my input space to my output space requires me to pair each crystal structure with an experimentally determined binding affinity measurement obtained via surface plasmon resonance (SPR), for example. This full process can cost thousands of dollars per (x,y) pair—roughly five orders of magnitude more expensive per datapoint compared to Internet images.
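Before getting to PropEn’s twist, here is a toy schematic (entirely my own, not from the talk) of what explicit, discriminator-guided optimization looks like. The functions propose_variants and discriminator are hypothetical stand-ins for a generative model and a trained property predictor.

```python
import numpy as np

def explicit_guided_search(x0, propose_variants, discriminator, n_rounds=10):
    """Toy explicit guidance: generate candidates, score them with a trained
    discriminator, and keep the best-scoring design each round."""
    x = x0
    for _ in range(n_rounds):
        candidates = propose_variants(x)                 # generative step
        scores = [discriminator(c) for c in candidates]  # predicted property (e.g., affinity)
        x = candidates[int(np.argmax(scores))]           # follow the "compass"
    return x
```

The catch, of course, is that the discriminator in this loop only works if you can afford enough labeled data to train it.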
PropEn proposes the provocative question: What if there were a way to guide a generative process without training a discriminator? What new advantages and disadvantages would such a method have in low-data regimes?
The authors refer to their approach as implicit guidance since it doesn’t make use of an explicit discriminator to guide property optimization. Instead, PropEn cleverly groups datapoints together in a way that (a) grows the training dataset and (b) “inherently [embeds] the direction of property enhancement”. This may not be entirely intuitive at first glance, so I’ll break it down step-by-step.
Let’s start with the datapoint grouping procedure. The aim of this step is to find datapoints that are close together in input space (x). We first define a parameter (Δx) that quantifies how close together points must be to be grouped. We next define a second parameter (Δy) that quantifies how large a property improvement is required between points x1 and x2 for them to be matched.
Grouping datapoints is important because it expands the training set quadratically. Consider that every pairwise relationship between the original datapoints is now a new datapoint. Let’s imagine I have four original datapoints:
(x1, 1.0), (x2, 1.5), (x3, 2.0), (x4, 2.5)
Assuming my threshold values allow for all possible matches, I can create six (6) pairwise relationships between these points:
(x1,x2), (x1,x3), (x1,x4), (x2,x3), (x2,x4), (x3,x4)
The maximum number of pairwise relationships I can create from n points can be represented as n(n-1)/2.
For example, with n=4, this is: 4(4-1)/2 = 4(3)/2 = 12/2 = 6
This new collection of datapoints (pairwise relationships) scales quadratically with n.
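Here is a rough sketch of that matching step in code. It is my own paraphrase of the idea, with made-up threshold values, and it uses simple numeric vectors rather than real antibody representations.

```python
import numpy as np

def build_matched_dataset(X, y, dx_threshold=1.0, dy_threshold=0.0):
    """Pair up datapoints that are close in input space (distance <= Δx)
    where the second point has a better property value (y_j - y_i >= Δy).
    Each (x_worse -> x_better) pair becomes a training example whose target
    is the improved design itself, not a scalar property label."""
    pairs = []
    for i in range(len(X)):
        for j in range(len(X)):
            if i == j:
                continue
            close_enough = np.linalg.norm(X[i] - X[j]) <= dx_threshold
            improved = (y[j] - y[i]) >= dy_threshold
            if close_enough and improved:
                pairs.append((X[i], X[j]))
    return pairs

# Toy example: four one-dimensional designs with increasing property values
X = np.array([[0.0], [0.2], [0.4], [0.6]])
y = np.array([1.0, 1.5, 2.0, 2.5])
matched = build_matched_dataset(X, y, dx_threshold=1.0, dy_threshold=0.0)
print(len(matched))  # up to n(n-1)/2 = 6 "worse -> better" pairs
```

The matched pairs, rather than scalar labels, become the training examples for the next step.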
Why does having this larger dataset matter? How can we use it to implicitly guide an optimization process? Let’s imagine a new scenario where the datapoints are portraits, some of which are clearly better than others. After grouping, we have:
Portrait 1 and Portrait 2 are similar (matched pair), but 2 has much better lighting (better y-value).
By studying these pairs, we can teach the model that when it sees lighting like Portrait 1’s, it should adjust the lighting to look more like Portrait 2.
In this way, the improvement direction is implicitly captured in the fact that Portrait 2 is similar (but better than) Portrait 1.
There’s no need for an external discriminator (e.g., an art critic) to tell you if the change you made is good.
Once the new matched dataset has been built, the authors train a deep learning model to be able to make improvement suggestions given a point in input space. They find that this approach learns a rough approximation of the property gradient.
Normally, one would need to train an explicit discriminative model capable of predicting the property value (y) given an input (x). The gradient of this function is a vector (direction) that shows how to alter x to improve y the fastest. Using matched pairs, implicit guidance can improve properties without the need to calculate the gradient.
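As an illustration of what learning from matched pairs might look like (a heavily simplified stand-in for PropEn’s actual architecture, with placeholder data), one could train a small network to map each design onto its matched, improved counterpart and then apply it iteratively at design time.

```python
import torch
import torch.nn as nn

# Simplified "implicit guidance" model: learn x_worse -> x_better directly
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder matched pairs; in practice these would come from the matching step above
x_worse = torch.randn(128, 16)
x_better = x_worse + 0.1 * torch.randn(128, 16)

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x_worse), x_better)  # reconstruct the improved design
    loss.backward()
    optimizer.step()

# At design time, applying the model repeatedly walks a design "uphill"
x = torch.randn(1, 16)
for _ in range(5):
    x = model(x).detach()  # each step approximates a move along the property gradient
```

Each forward pass nudges a design in the direction of improvement that the matched pairs encode, without ever computing an explicit gradient of a property predictor.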
Let’s revisit our compass example from earlier, but now in three dimensions. This time, there’s a hill you’re trying to summit. You’ve been blessed with a magical compass that points towards the top of the hill. Using an explicit approach, you’d follow your magical compass (gradient) up to the peak.
The implicit approach is different. You’re less resourced and don’t have a magic compass. Instead, you’re given a collection of photos from previous, successful climbers. Using pairs of photos, you think to yourself, “From this spot, they moved to that spot and gained elevation.” You can repeat this process using chains of photo-pairs until you reach the summit—this is implicit guidance.
I really like PropEn because it’s so relevant to the issue of data paucity in the life sciences. Moreover, the few-shot methodology is very practical—one where you can use wet lab-generated data to quickly converge on an optimal design. My understanding is that PropEn works with both continuous and discrete data, making it compatible with both structure and sequence generative models. I’m eager to see this method continue to improve, including becoming viable in a multi-objective optimization setting.
Thanks for reading some of our thoughts on the presentations at MoML. We hope to meet you next year! Anyone working in this space is encouraged to reach out to me as we’d love to chat.
All of us are currently at NeurIPS taking notes—so expect more from us soon!
Great summary! Apart from James and Mark's caveats about experimental structures being biased toward low-energy conformations, there's also the basic fact that every "experimental" structure is also a model and that even the PDB contains models with error - ill-defined or missing densities, extra (or unnecessary) water molecules, ligands with strain, etc. Also, out of the 200K PDB structures, only about 20K have co-crystallized ligands. That's not a very small number, but I think it's a major reason why AF, while trained on the PDB and otherwise working well, works less well on predicting small molecule interactions, which are still the bottleneck for structure-based design.