5 challenges we could solve by designing new proteins | David Baker

Recently, I collaborated on an experimental design project with a wet-lab group. My role was to use a diffusion-model-style method to engineer proteins and assess which modifications might yield higher-performing variants. I then predicted 10–20 sequences to hand over to the wet-lab team for expression and validation. I've always found this approach a bit dubious, though: predictions are just that, predictions. Sometimes what works perfectly in silico fails once it's actually expressed.

Later, my advisor suggested continuing with the approach, and if that didn’t work, we’d pivot to a different direction.
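For what it's worth, the compute-side loop I described (generate candidates, score them in silico, shortlist 10–20 for the wet lab) looks roughly like this. Everything below is a toy stand-in, not the actual pipeline: the "generator" is just random sampling rather than a diffusion model, and `in_silico_score` is a made-up heuristic where a real workflow would use a structure predictor's confidence.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def generate_candidates(n, length, seed=0):
    """Stand-in for a diffusion-style generator: here we just sample
    random sequences; a real pipeline would sample from a trained model."""
    rng = random.Random(seed)
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
            for _ in range(n)]

def in_silico_score(seq):
    """Hypothetical confidence score. A real pipeline would use a
    predictor's confidence metric; this toy version just measures
    the hydrophobic-residue fraction."""
    return sum(1 for aa in seq if aa in "AILMFWV") / len(seq)

def select_for_wet_lab(candidates, k=10):
    """Rank candidates by predicted score and hand the top k to the wet lab."""
    return sorted(candidates, key=in_silico_score, reverse=True)[:k]

candidates = generate_candidates(n=200, length=60)
shortlist = select_for_wet_lab(candidates, k=10)
print(len(shortlist))  # 10 sequences for expression and validation
```

The gap the post complains about lives entirely in `in_silico_score`: however good the ranking looks on screen, only expression tells you whether it transfers.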

https://www.youtube.com/watch?v=PJLT0cAPNfs

4 Likes

Bio + AI is already quite common these days.

To take recent examples: a senior student in my advisor's group is using generative models for tasks like cell-evolution modeling and gene-expression prediction. On the chemistry side (a high school classmate of mine is at PKU's Chemistry Institute), they're also using generative models for small- and large-molecule generation and property prediction.

Honestly, I was skeptical at first. I discussed this with that classmate, and he pointed out that many mechanisms in biology and chemistry are still not fully understood; maybe AI, empowered by large datasets, can crack some of them. Plus, AI4S is one of the more publication-friendly AI subfields: hotter areas like multimodal and video generation demand far more data, compute, and effort.

My personal take: Generative models in biology/chemistry are still in their early stages. Perhaps in 5–10 years, we’ll see a true technological breakthrough—similar to how LLMs emerged just five years after the Transformer’s introduction.

The 21st century will be the century of biology! :smiling_face_with_sunglasses:

2 Likes

Currently, I have an interesting project involving the use of a Transformer-based model to predict downstream pathways from single-cell data, with some custom modifications to the layers.

Transformers are indeed well-suited to biology, where long-range context matters; the latest AlphaFold3 also appears to be an adapted Transformer architecture (attention mechanisms were already central in AlphaFold2, e.g., in the Evoformer).
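Since attention keeps coming up: the core operation is just scaled dot-product attention, softmax(QK^T/√d)·V, which is what lets every position in a sequence attend to every other position regardless of distance. A minimal pure-Python sketch (the tiny Q/K/V matrices are toy values, nothing to do with AlphaFold's actual modules):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Each output row is a convex combination of the rows of V."""
    d = len(Q[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, V)

# Toy example: 3 "tokens" of dimension 2, self-attention (Q = K = V)
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(Q, Q, Q)
print(out)
```

That all-pairs mixing is exactly the "long-range context" property, and also why the quadratic cost bites on genome-scale inputs.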

However, I've always felt that biology and computer science remain somewhat disconnected: biologists often adopt computational techniques only after computer scientists have long moved past them.

2 Likes

Hmm, the paper from the group of the poster above was about single-cell methods, and it seems a lot of work in that area uses flow matching.
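For anyone wondering what flow matching actually trains on: under the common linear interpolation path, you draw a noise sample x₀ and a data sample x₁, form x_t = (1−t)x₀ + t·x₁, and regress a velocity model against the conditional target x₁ − x₀. A minimal sketch (the 4-dimensional Gaussian "profiles" are toy stand-ins for, say, single-cell expression vectors; a real method would fit a neural velocity field v_θ(x_t, t)):

```python
import random

def sample_pair(rng):
    """Hypothetical setup: x0 from a noise prior, x1 a 'data' point."""
    x0 = [rng.gauss(0, 1) for _ in range(4)]
    x1 = [rng.gauss(5, 1) for _ in range(4)]
    return x0, x1

def interpolate(x0, x1, t):
    # Linear probability path: x_t = (1 - t) * x0 + t * x1
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    # For the linear path, the conditional velocity is constant: x1 - x0
    return [b - a for a, b in zip(x0, x1)]

def cfm_loss(pred_v, x0, x1):
    """Mean squared error between a predicted velocity and the
    conditional target; a neural net would supply pred_v in practice."""
    u = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred_v, u)) / len(u)

rng = random.Random(0)
x0, x1 = sample_pair(rng)
t = rng.random()
x_t = interpolate(x0, x1, t)
print(cfm_loss(target_velocity(x0, x1), x0, x1))  # 0.0 for the oracle velocity
```

At sampling time you integrate the learned velocity field from noise to data, which is why it feels like a simpler, simulation-free cousin of diffusion models.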

2 Likes

Setting aside academic discussions, I think if we could develop a very powerful and commercially viable NSFW large model, it could generate a lot of revenue. We could even reverse-engineer the admin’s review model.

3 Likes

My non-CS high school classmate just deployed Stable Diffusion locally to generate art :rofl:
"Lewd art is the primary productive force!" :rofl:

3 Likes

My first encounter with Stable Diffusion was also for generating lewd art, back around 2023.

Back then, I couldn't get my hands on a proper graphics card, so I bought a datacenter compute card (a Tesla P4) just for image generation.

I experimented with LoRA and fine-tuning techniques, but the results weren’t great at the time.
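The LoRA idea itself is tiny, which is why it runs even on a card like a P4: freeze the pretrained weight matrix W and learn only a low-rank update, W' = W + (α/r)·BA, so the trainable parameter count drops from d·k to r·(d+k). A minimal sketch with toy matrices (all dimensions and values here are arbitrary illustrations):

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_update(W, A, B, alpha, r):
    """LoRA: instead of fine-tuning all of W (d x k), learn a low-rank
    delta W' = W + (alpha / r) * B @ A with B (d x r) and A (r x k)."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

d, k, r = 4, 4, 2
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]
B = [[0.0] * r for _ in range(d)]  # B starts at zero, so W' == W initially
A = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(r)]
W_prime = lora_update(W, A, B, alpha=8, r=r)
print(W_prime == W)  # True: zero-initialized B means no change before training
```

The zero-init of B is the standard trick: fine-tuning starts exactly at the pretrained model and only drifts as B is trained. Poor results usually come down to rank, learning rate, or (in my case) the training data rather than the mechanism.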

2 Likes

Brain-computer interfaces (BCIs) are also advancing steadily. Two years ago, papers on Transformer-based brain-signal decoding were already considered cutting-edge; now there's talk of building large foundation models for brain signals.

1 Like

Just the other day, I was chatting with my senior about this:
what if everyone gets a brain-computer interface, and they install an anti-fraud center in your head?
Then "Big Brother is watching you" would be real.

3 Likes

Invasive methods can't even reliably read data yet, let alone write it :joy:

1 Like