Sam Gelman

Hi, I'm Sam! I research deep learning methods for protein design, drawing on a background in computer science, machine learning, and biology. I am interested in using deep learning to better understand protein structure and function, and I develop algorithms and models toward that goal. Let's connect!

Featured talk

[Embedded video]

Featured preprint

Biophysics-based protein language models for protein engineering

Sam Gelman, Bryce Johnson, Chase Freschlin, Sameer D’Costa, Anthony Gitter, Philip A. Romero. bioRxiv (2024).

Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure, and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose Mutational Effect Transfer Learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure, and energetics. We finetune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity, and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL’s ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.
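The core recipe described above, pretraining a transformer on simulated biophysical data and then finetuning it on a small experimental sequence-function dataset, can be sketched in a few lines of PyTorch. This is only an illustration, not the actual METL code: the model sizes, token encoding, and the data loaders (simulation_loader, experimental_loader) are placeholder assumptions.

```python
# Illustrative sketch (not the actual METL implementation) of the
# pretrain-then-finetune pattern: a small transformer encoder is first trained
# to regress biophysical simulation scores, then finetuned on a small
# experimental dataset. All names, sizes, and loaders are placeholders.
import torch
import torch.nn as nn

VOCAB_SIZE = 21  # 20 amino acids plus a padding/unknown token


class SequenceRegressor(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2, n_targets=1):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_targets)

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))   # (batch, length, d_model)
        return self.head(h.mean(dim=1))        # pool over sequence positions


def train(model, loader, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for tokens, targets in loader:
            opt.zero_grad()
            loss = loss_fn(model(tokens), targets)
            loss.backward()
            opt.step()


# 1) Pretrain on large-scale simulated data (e.g. molecular-modeling energy terms).
model = SequenceRegressor(n_targets=1)
# train(model, simulation_loader, epochs=10, lr=1e-3)

# 2) Replace the output head and finetune on a small experimental dataset
#    (e.g. 64 labeled variants), reusing the pretrained encoder weights.
model.head = nn.Linear(model.head.in_features, 1)
# train(model, experimental_loader, epochs=50, lr=1e-4)
```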

Featured publication

Neural networks to learn protein sequence–function relationships from deep mutational scanning data

Sam Gelman, Sarah A. Fahlberg, Pete Heinzelman, Philip A. Romero, Anthony Gitter. Proceedings of the National Academy of Sciences (2021).

Understanding the relationship between protein sequence and function is necessary to design new and useful proteins with applications in bioenergy, medicine, and agriculture. The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We show that neural networks can learn the sequence–function mapping from large protein datasets. Neural networks are appealing for this task because they can learn complicated relationships from data, make few assumptions about the nature of the sequence–function relationship, and can learn general rules that apply across the length of the protein sequence. We demonstrate that learned models can be applied to design new proteins with properties that exceed natural sequences.
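As a rough illustration of the kind of supervised sequence–function model described above, here is a minimal PyTorch sketch: variant sequences from a deep mutational scan are one-hot encoded and a small fully connected network is fit to their measured functional scores. This is not the paper's code; the network size, encoding, and training settings are placeholder assumptions.

```python
# Illustrative sketch (not the paper's code) of learning a sequence-function
# mapping from deep mutational scanning data: one-hot encode each variant and
# train a small neural network to predict its measured functional score.
import numpy as np
import torch
import torch.nn as nn

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot(seq):
    """Encode a protein sequence as a flat (length * 20) one-hot vector."""
    x = np.zeros((len(seq), len(ALPHABET)), dtype=np.float32)
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.reshape(-1)


def make_model(seq_len, hidden=100):
    """A small fully connected network mapping sequence -> functional score."""
    return nn.Sequential(
        nn.Linear(seq_len * len(ALPHABET), hidden),
        nn.ReLU(),
        nn.Linear(hidden, 1),
    )


def fit(model, sequences, scores, epochs=100, lr=1e-3):
    X = torch.tensor(np.stack([one_hot(s) for s in sequences]))
    y = torch.tensor(np.asarray(scores, dtype=np.float32)).unsqueeze(1)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return model


# Example usage with hypothetical data (a deep mutational scan would supply
# thousands of variants and their measured scores):
# variants = ["MKTAY...", "MRTAY...", ...]
# scores = [1.2, -0.3, ...]
# model = fit(make_model(seq_len=len(variants[0])), variants, scores)
```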