Recently I collaborated with a wet-lab group on an experimental design project. My role was to use a diffusion-model-like method to engineer proteins, assessing which modifications might yield higher-performance variants, and then to hand 10–20 predicted sequences to the wet-lab team for expression and validation. I've always found this approach somewhat dubious: predictions are just predictions, and sequences that look perfect in silico sometimes fail once expressed in reality.
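For concreteness, here's a minimal sketch of that generate-and-rank loop. Everything in it is hypothetical: `sample_sequences` stands in for the diffusion-style sampler and `in_silico_score` for whatever property predictor you actually trust; a real project would plug in trained models for both.

```python
# Minimal sketch of a generate-and-rank protein design loop.
# Both functions below are hypothetical stand-ins, not real project code.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sample_sequences(n: int, length: int) -> list[str]:
    """Stand-in for a diffusion-model sampler (hypothetical).
    A real model would iteratively denoise toward plausible sequences;
    here we draw random residues so the sketch runs end to end."""
    return ["".join(random.choices(AMINO_ACIDS, k=length)) for _ in range(n)]

def in_silico_score(seq: str) -> float:
    """Stand-in for a learned property predictor (hypothetical).
    Toy proxy: fraction of hydrophobic residues."""
    return sum(seq.count(aa) for aa in "AILMFWV") / len(seq)

# Generate a large candidate pool, then keep the top 20 for the wet lab.
candidates = sample_sequences(n=1000, length=120)
ranked = sorted(candidates, key=in_silico_score, reverse=True)
shortlist = ranked[:20]
print(shortlist[0], in_silico_score(shortlist[0]))
```

The point of the sketch is just the shape of the pipeline: sample many candidates in silico, rank by a predictor, and only a small shortlist ever reaches the bench, which is exactly why a weak predictor makes the whole thing shaky.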
Later, my advisor suggested continuing with the approach, and if that didn't work, we'd pivot in a different direction.
To take some recent examples: in my advisor's group, a senior student is using generative models for tasks like cell evolution and gene-expression prediction. On the chemistry side (a high-school classmate of mine is at PKU's Chemistry Institute), generative models are likewise being used for small- and large-molecule generation and property prediction.
Honestly, I was skeptical at first. When I discussed this with my high-school classmate, he pointed out that many mechanisms in biology and chemistry are still not fully understood by humans; maybe AI, empowered by large datasets, can crack them. Plus, AI4S is one of the more publication-friendly AI subfields, compared with hot areas like multimodal generation or video generation, which demand significant resources, compute, and effort.
My personal take: Generative models in biology/chemistry are still in their early stages. Perhaps in 5–10 years, we’ll see a true technological breakthrough—similar to how LLMs emerged just five years after the Transformer’s introduction.
Currently, I'm working on an interesting project that uses a Transformer-based model, with some custom modifications to the layers, to predict downstream pathways from single-cell data.
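A rough sketch of the general shape, in PyTorch. The names (`n_genes`, `n_pathways`) and the embedding scheme are my assumptions, not the actual project code; the "custom layer modifications" would replace the stock `nn.TransformerEncoderLayer` below.

```python
# Sketch: single-cell expression tokens -> Transformer -> pathway logits.
import torch
import torch.nn as nn

class CellPathwayTransformer(nn.Module):
    def __init__(self, n_genes: int, n_pathways: int, d_model: int = 128):
        super().__init__()
        self.gene_emb = nn.Embedding(n_genes, d_model)   # token = gene identity
        self.expr_proj = nn.Linear(1, d_model)           # inject expression level
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_pathways)       # pathway logits

    def forward(self, gene_ids, expr):
        # gene_ids: (batch, n_tokens) long; expr: (batch, n_tokens) float
        x = self.gene_emb(gene_ids) + self.expr_proj(expr.unsqueeze(-1))
        x = self.encoder(x)
        return self.head(x.mean(dim=1))  # pool over genes, predict pathways

model = CellPathwayTransformer(n_genes=2000, n_pathways=50)
logits = model(torch.randint(0, 2000, (4, 256)), torch.rand(4, 256))
print(logits.shape)  # torch.Size([4, 50])
```

Treating each gene as a token and adding its expression level into the embedding is one common scheme; the interesting (and project-specific) work all lives inside the modified encoder layers.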
Transformers do seem well-suited to biology, where long-range context matters, and the latest AlphaFold3 appears to be an adapted version of the Transformer architecture (attention mechanisms were already present in AlphaFold2, such as in the Evoformer module).
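To make the "long-range context" point concrete: scaled dot-product attention, the core operation these architectures build on, lets every position attend to every other position in a single step, regardless of distance along the sequence. A bare-bones version with NumPy (my own toy illustration, not anything from AlphaFold):

```python
# Scaled dot-product self-attention, stripped to the essentials.
import numpy as np

def attention(Q, K, V):
    """Q, K, V: (seq_len, d). Each output row is a weighted mix of ALL
    value rows, so residue 5 can use information from residue 500 directly."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V

x = np.random.randn(512, 64)   # e.g., 512 residues, 64-dim features
out = attention(x, x, x)       # self-attention: context from the whole chain
print(out.shape)               # (512, 64)
```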
However, I've always felt that biology and computer science remain disconnected: biologists often end up adopting computational techniques only after computer scientists have long moved past them.
Setting aside academic discussions, I think if we could develop a very powerful and commercially viable NSFW large model, it could generate a lot of revenue. We could even reverse-engineer the admin’s review model.
Brain-computer interfaces (BCIs) are also advancing steadily. Two years ago, papers on Transformer-based brain-signal processing were already considered quite cutting-edge; now there are discussions about developing large-scale brain-computer models.
"Just the other day, I was chatting with my senior about what if everyone gets brain-computer interfaces—and they install an anti-fraud center in your head? Then it’d be like“Big Brother is watching you”for real."