3 Comments

love the overview. a few misc. thoughts about the uAA and peptide tx space:

(1) lots of innovation happening in the chemical synthesis of long peptide sequences: Brad Pentelute et al. have done amazing work here, see https://www.science.org/doi/10.1126/science.abb2491 for a sample. they've done a lot of tough engineering to make this more effective. chemical synthesis is ultimately the way that will be most flexible here, i think, because you don't have to worry about all the issues w/ artificial tRNAs, amber suppression, etc. particularly for peptides (and not full proteins), chemical approaches feel like the future here.

(2) particularly for macrocycles, uAAs give exquisite control over conformation, which is incredibly powerful. macrocycles are a really interesting class of peptide therapeutics, in my opinion (and lots of other people's opinions too) - enough TPSA to hit PPIs or other unconventional binding sites, but the option to still have cell permeability. i like Andre Yudin's work in this area: https://www.nature.com/articles/s41557-020-00620-y, https://onlinelibrary.wiley.com/doi/abs/10.1002/anie.202206866. also lots of work written about the "chameleonicity" of conformation-changing macrocyclic peptides, e.g. https://chemistry-europe.onlinelibrary.wiley.com/doi/full/10.1002/chem.201905599

(3) the notes about the difficulty of molecular representation are spot-on - macrocycles exist in some liminal zone where a lot of small molecule-focused computational techniques don't scale well at all, but where the assumptions of protein-specific tools are just wrong. this is true both in ML/representation contexts and simulation/modeling contexts... even just predicting what shape a macrocyclic peptide will have is brutally hard, see Yu-Shan Lin's work here (https://pubs.acs.org/doi/full/10.1021/acs.jpcb.4c00157 and others).


Fascinating science and excellent job on the post! I read that AF3 uses a mix of atomic- and molecular-scale tokenization. It is also a mix of transformer and diffusion models. My understanding is that transformers are limited by vocab size but diffusion isn't. In future models, will we need an extensive vocabulary entry for each uAA, and does that limit the scale at which we can use transformer architectures to predict uAA properties? Alternatively, there are mechanistic models we could use to discover the general features across uAAs. I could imagine this helping define the essential vocab by identifying uAAs with redundant effects on the model.
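
A rough sketch of the vocabulary question above, assuming (as I understand the AF3 description) that standard residues get one token each while modified residues are split into one token per heavy atom. The function name, the token tuples, and the norleucine (`Nle`) atom list are illustrative assumptions, not AF3's actual code or data format.

```python
# Hypothetical mixed tokenizer: one token per canonical residue,
# one token per heavy atom for anything outside the canonical 20.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")

def tokenize(residues, atoms_by_residue):
    """residues: list of residue names (one-letter codes for canonical ones).
    atoms_by_residue: maps a noncanonical name to its heavy-atom elements."""
    tokens = []
    for name in residues:
        if name in CANONICAL:
            tokens.append(("RES", name))
        else:
            # Fall back to atom-level tokens; no new vocab entry is needed.
            tokens.extend(("ATOM", element) for element in atoms_by_residue[name])
    return tokens

# Norleucine as a residue: backbone N, CA, C, O plus a four-carbon side chain.
NLE_ATOMS = ["N", "C", "C", "O", "C", "C", "C", "C"]

tokens = tokenize(["A", "Nle", "G"], {"Nle": NLE_ATOMS})
vocab = set(tokens)
print(len(tokens), sorted(vocab))
```

Under these assumptions the uAA costs sequence length (eight tokens instead of one) rather than vocabulary: the distinct token types stay limited to the canonical residues plus a handful of elements, however many uAAs are added.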


This is a pretty comprehensive overview of unnatural amino acids (uAAs), kudos to you for writing it. I feel that being able to synthesize proteins with specific uAAs incorporated could be valuable for other enzyme engineering applications outside of therapeutics, including improving DNA editor enzymes (which we aim to do at BioCompute in the long run), creating enzymes that can effectively break down non-biodegradable waste and so on.

Would love to know your thoughts on where the protein synthesis ecosystem is headed as well, because the time it takes to express a protein from a plasmid in a cellular system and then purify it is a roadblock to iterating fast in enzyme-related experiments.
