About Proto

A design framework for generative biology

An infrastructure layer to compose AI models, describe biological design goals, and efficiently search the space of DNA, RNA, and protein sequences that best meet custom criteria.

The era of generative biology

With advances in AI, biology is entering an era where we no longer have to resort to random guessing or exhaustive search to create new functional biological systems. We can now computationally generate candidates and score them with models that predict which sequences are most likely to possess desired properties, such as binding affinity, structural stability, or any property that can be measured through a functional assay. A growing body of methods and examples shows that combining the generative and predictive power of these tools makes it easier to design complex biological systems.

Unfortunately, composing these tools still requires substantial computational expertise and considerable development time. The current ecosystem for biological AI is siloed: models are developed in isolated environments, often with strict dependency requirements that make them incompatible with one another, so it is difficult to run them from the same codebase or environment. Composing them in iterative design cycles often requires significant infrastructure investment to share compute resources and run everything efficiently.

What Proto is

This is the motivation behind Proto. Proto is a design framework and infrastructure layer for generative biology. It provides a unified framework to compose AI models, describe biological design goals, and efficiently search the design space for DNA, RNA, or protein sequences that best meet custom scoring criteria.

The core of Proto is the ability to efficiently compose the wide variety of models and tools available for biological analysis and design. To make this broadly accessible, we built proto-tools, an open-source Python package that provides extensive infrastructure for deploying models on local compute. It abstracts away the computational mechanics (device management, dependency management, and parallelization), letting researchers focus on the semantics of their inputs and outputs.
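To illustrate what a shared tool layer buys you, here is a minimal sketch of the general pattern: every tool exposes the same call shape, so a single helper can batch and parallelize any of them. This is not the proto-tools API. The names `Tool`, `GCContent`, `Length`, and `run_all` are hypothetical, and the two toy scorers stand in for real models.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Protocol


class Tool(Protocol):
    """Hypothetical common interface: sequences in, one score per sequence out."""

    def run(self, sequences: list[str]) -> list[float]: ...


class GCContent:
    """Toy stand-in for a model: fraction of G/C bases in each sequence."""

    def run(self, sequences: list[str]) -> list[float]:
        return [sum(b in "GC" for b in s) / len(s) if s else 0.0 for s in sequences]


class Length:
    """Toy stand-in for a second model: sequence length."""

    def run(self, sequences: list[str]) -> list[float]:
        return [float(len(s)) for s in sequences]


def run_all(
    tools: dict[str, Tool], sequences: list[str], batch_size: int = 2
) -> dict[str, list[float]]:
    """Run every tool over the same inputs, batching and parallelizing uniformly."""
    batches = [sequences[i : i + batch_size] for i in range(0, len(sequences), batch_size)]
    results: dict[str, list[float]] = {}
    with ThreadPoolExecutor() as pool:
        for name, tool in tools.items():
            # Executor.map yields results in input order, so scores stay aligned
            # with the original sequence list.
            scored = pool.map(tool.run, batches)
            results[name] = [score for batch in scored for score in batch]
    return results


scores = run_all({"gc": GCContent(), "length": Length()}, ["ACGT", "GGCC", "AATT"])
print(scores)
```

Because the caller only ever sees `run`, swapping a toy scorer for a real model would not change the calling code. Device placement and dependency isolation, which the paragraph above also mentions, are outside the scope of this sketch.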

On top of the tool layer, Proto provides proto-language, a high-level design specification framework that composes the capabilities of biological AI models into complex, iterative design campaigns.

Four primitives

proto-language organizes biological design around four fundamental primitives. Rather than manually stitching disparate tools together, you write a Proto program that moves through each one.

01 Sequences: Define the target biological system and its component sequences.

02 Generators: Specify how new candidate sequences should be generated.

03 Constraints: Apply constraints on the properties that candidates must satisfy.

04 Optimizers: Execute an optimization process to search the solution space.
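To make the four primitives concrete, here is a toy sketch in plain Python of how they could fit together. It does not use proto-language syntax. Every name in it (`Sequence`, `mutate`, `gc_penalty`, `optimize`) is hypothetical, the constraint is a simple GC-content target standing in for a predictive model, and the optimizer is a basic greedy search chosen only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable

DNA = "ACGT"


@dataclass(frozen=True)
class Sequence:
    """Primitive 1 (hypothetical): a named sequence in the system being designed."""

    name: str
    residues: str


def mutate(seq: Sequence, rng: random.Random) -> Sequence:
    """Primitive 2 (hypothetical generator): propose a single-base substitution."""
    pos = rng.randrange(len(seq.residues))
    new_base = rng.choice([b for b in DNA if b != seq.residues[pos]])
    return Sequence(seq.name, seq.residues[:pos] + new_base + seq.residues[pos + 1 :])


def gc_penalty(seq: Sequence, target: float = 0.5) -> float:
    """Primitive 3 (hypothetical constraint): distance from a target GC fraction."""
    gc = sum(b in "GC" for b in seq.residues) / len(seq.residues)
    return abs(gc - target)


def optimize(
    start: Sequence,
    generator: Callable[[Sequence, random.Random], Sequence],
    constraint: Callable[[Sequence], float],
    steps: int,
    seed: int,
) -> tuple[Sequence, float]:
    """Primitive 4 (hypothetical optimizer): greedy search, keep non-worse candidates."""
    rng = random.Random(seed)
    best, best_score = start, constraint(start)
    for _ in range(steps):
        candidate = generator(best, rng)
        score = constraint(candidate)
        if score <= best_score:  # lower penalty is better
            best, best_score = candidate, score
    return best, best_score


start = Sequence("toy_promoter", "A" * 20)
best, penalty = optimize(start, mutate, gc_penalty, steps=500, seed=0)
print(best.residues, penalty)
```

The generator, constraint, and optimizer are passed in as independent pieces, so any one of them can be replaced without touching the others. That modularity is the point the four primitives are making, whatever the real syntax looks like.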

From programs to functional sequences

Our team used the proto-language framework to design two fully synthetic systems: introns with cell-line-selective splicing behavior and matched promoter–repressor pairs. In the splicing program, Proto generated candidate intron sequences and optimized them against predictive models of splice-site usage to produce distinct splicing outcomes across human cell lines. In the promoter–repressor program, Proto combined DNA sequence generation with structural and binding constraints to nominate pairs expected to be active, specific, and orthogonal.

Together, these examples show how complex design goals can be expressed as modular programs that generate functional sequences.

Grounded in the lab

Like all generative biology workflows, Proto is most powerful when paired with experimental testing and expert interpretation. Computational models are still imperfect, and their strengths and limitations vary across design problems. We see Proto as a way to make those models more useful in practice: a common framework for composing existing tools, learning from experimental outcomes, and improving design programs as better models become available.

In this way, Proto offers a starting point for designing around more abstract biological functions while keeping those designs grounded in concrete sequences that can be built, tested, and refined. Its aim is to make biological design more expressive, modular, and reusable as the field continues to improve.
