Discussion about this post

Corin Wagen:

love the overview. a few misc. thoughts about the uAA and peptide tx space:

(1) lots of innovation happening in the chemical synthesis of long peptide sequences: Brad Pentelute et al. have done amazing work here, see https://www.science.org/doi/10.1126/science.abb2491 for a sample. there's a lot of tough engineering they've done to make this practical. chemical synthesis is ultimately the most flexible route here, i think, because you sidestep all the issues w/ artificial tRNAs, amber suppression, etc. particularly for peptides (and not full proteins), chemical approaches feel like the future here.

(2) particularly for macrocycles, uAAs give exquisite control over conformation, which is incredibly powerful. macrocycles are a really interesting class of peptide therapeutics, in my opinion (and lots of other people's too) - enough TPSA to hit PPIs or other unconventional binding sites, but still the option of cell permeability (see the descriptor sketch after this list). i like Andrei Yudin's work in this area: https://www.nature.com/articles/s41557-020-00620-y, https://onlinelibrary.wiley.com/doi/abs/10.1002/anie.202206866. there's also a lot written about the "chameleonicity" of conformation-changing macrocyclic peptides, e.g. https://chemistry-europe.onlinelibrary.wiley.com/doi/full/10.1002/chem.201905599

(3) the notes about the difficulty of molecular representation are spot-on - macrocycles sit in a liminal zone where a lot of small molecule-focused computational techniques don't scale well at all, but where the assumptions of protein-specific tools are just wrong. this is true in both ML/representation contexts and simulation/modeling contexts... even just predicting what shape a macrocyclic peptide will adopt is brutally hard (see the conformer sketch below); Yu-Shan Lin's work is a good entry point here (https://pubs.acs.org/doi/full/10.1021/acs.jpcb.4c00157 and others).
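for point (2), here's a minimal sketch of the descriptors behind the TPSA/permeability tradeoff, using RDKit; cyclic tetraglycine stands in as a toy macrocycle (both the molecule choice and the RDKit workflow are illustrative additions of mine, not anything from the post):

```python
# Minimal sketch: descriptors behind the "enough TPSA, but still permeable"
# tradeoff, computed with RDKit. cyclo(Gly)4 is a toy stand-in macrocycle.
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles("O=C1CNC(=O)CNC(=O)CNC(=O)CN1")  # cyclic tetraglycine

print("TPSA (A^2)   :", Descriptors.TPSA(mol))       # topological polar surface area
print("MolWt        :", Descriptors.MolWt(mol))
print("H-bond donors:", Descriptors.NumHDonors(mol))  # N-methylation lowers this,
                                                      # a common permeability trick
```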
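and for point (3), a sketch of where the conformer problem shows up in practice: RDKit's ETKDGv3 added macrocycle-aware torsion preferences, but generating conformers is the easy half. again, the molecule and workflow are illustrative assumptions on my part, not the post's method:

```python
# Minimal sketch: macrocycle conformer generation with RDKit's ETKDGv3,
# which added torsion-angle preferences for macrocyclic rings.
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles("O=C1CNC(=O)CNC(=O)CNC(=O)CN1"))

params = AllChem.ETKDGv3()  # v3 includes macrocycle torsion terms
params.randomSeed = 42      # reproducible embedding
cids = AllChem.EmbedMultipleConfs(mol, numConfs=50, params=params)
print(f"embedded {len(cids)} conformers")
# Embedding is the easy half; ranking these against the ensemble a
# macrocyclic peptide actually populates in solution is the hard problem.
```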

Matt Gruner:

Fascinating science and excellent job on the post! I read that AF3 uses a mix of atomic- and molecular-scale tokenization, and that it combines transformer and diffusion models. My understanding is that transformers are limited by vocabulary size but diffusion isn't. In future models, will we need an extensive vocabulary entry for each uAA, and does that limit how far transformer architectures can scale for predicting uAA properties? Alternatively, there are mechanistic models we could use to discover general features shared across uAAs; I could imagine that helping define the essential vocabulary by identifying uAAs with redundant effects on the model.
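To make the vocabulary-size concern concrete, here's a back-of-the-envelope sketch (all numbers illustrative, not from AF3): a residue-level token table has to grow with every new uAA, while an atom-level vocabulary stays fixed and pays instead with longer token sequences per residue.

```python
# Illustrative sketch: residue-level vs. atom-level token vocabularies
# as unnatural amino acids (uAAs) are added. All numbers are toy values.
CANONICAL_AAS = 20                            # one residue token per natural AA
ATOM_TYPES = {"C", "N", "O", "S", "H", "Se"}  # shared atom-level tokens

def residue_vocab(num_uaas: int) -> int:
    # every new uAA needs its own residue-level token
    return CANONICAL_AAS + num_uaas

def atom_vocab(num_uaas: int) -> int:
    # uAAs built from familiar elements add no new tokens,
    # only longer per-residue token sequences
    return len(ATOM_TYPES)

for n in (0, 100, 10_000):
    print(f"{n:>6} uAAs -> residue vocab {residue_vocab(n):>6}, "
          f"atom vocab {atom_vocab(n)}")
```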
