Proto
Proto is a high-level programming language for designing DNA, RNA, and protein sequences. The required properties of a sequence are expressed as constraints; generators propose candidate sequences and optimizers search for candidates that satisfy them.
Overview
Biological sequence design is typically multi-objective: a single sequence must meet several requirements at once. A designed protein may need to fold to a target structure, bind a target, express in a host organism, and remain soluble. A coding sequence may need a controlled GC content, codon usage suited to its host, no long homopolymer runs, and the absence of specified restriction sites. Proto represents each requirement as a separate constraint and optimizes against the full set rather than a single objective.
A design is specified declaratively. The sequence regions to be designed are defined as segments; a generator is assigned to each region to propose candidates; constraints score how well each candidate meets a requirement; and one or more optimizers search sequence space to minimize the combined score.
How It Works
Segments and Constructs
Segments are contiguous sequence regions to be designed; they are grouped into Constructs. A segment is initialized either from a target length or from an existing sequence.
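
The two initialization modes can be illustrated with a minimal sketch in plain Python. This is not Proto's API: the `Segment` and `Construct` classes below, and the `from_length` and `from_sequence` helpers, are hypothetical stand-ins that only mirror the concepts described above.

```python
import random
from dataclasses import dataclass

DNA = "ACGT"


@dataclass
class Segment:
    """Hypothetical stand-in for a designable region (not Proto's class)."""
    name: str
    sequence: str

    @classmethod
    def from_length(cls, name, length, alphabet=DNA, seed=None):
        # Initialize from a target length with a random starting sequence.
        rng = random.Random(seed)
        return cls(name, "".join(rng.choice(alphabet) for _ in range(length)))

    @classmethod
    def from_sequence(cls, name, sequence):
        # Initialize from an existing sequence, normalized to upper case.
        return cls(name, sequence.upper())


@dataclass
class Construct:
    """Hypothetical ordered container of segments."""
    segments: list

    def full_sequence(self):
        # Concatenate the segments in order.
        return "".join(s.sequence for s in self.segments)


promoter = Segment.from_length("promoter", 20, seed=0)
cds = Segment.from_sequence("cds", "atggctaaataa")
construct = Construct([promoter, cds])
print(construct.full_sequence())
```

Keeping segments as separate objects lets each region carry its own generator and constraints while the construct supplies the assembled sequence.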
Generators
A generator is assigned to a segment and proposes new sequences on each iteration. Generators range from random mutation to protein language models such as ESM2, ESM3, and ProteinMPNN.
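
As an illustration of the simplest case, random mutation, the sketch below proposes variants of a parent sequence by point substitution. It is plain Python written for this page, not Proto's generator interface; the `mutate` function and its parameters are assumptions made for the example.

```python
import random

DNA = "ACGT"


def mutate(sequence, n_mutations=1, alphabet=DNA, rng=None):
    """Return a copy of `sequence` with `n_mutations` point substitutions.

    Hypothetical helper for illustration; not part of Proto.
    """
    rng = rng or random.Random()
    chars = list(sequence)
    # Choose distinct positions so the number of changes is exact.
    positions = rng.sample(range(len(chars)), k=min(n_mutations, len(chars)))
    for pos in positions:
        # Pick a different letter so every chosen position really changes.
        chars[pos] = rng.choice([a for a in alphabet if a != chars[pos]])
    return "".join(chars)


rng = random.Random(42)
parent = "ACGTACGTAC"
proposals = [mutate(parent, n_mutations=2, rng=rng) for _ in range(3)]
for p in proposals:
    print(p)
```

A model-based generator would replace the uniform choice of position and letter with samples from a learned distribution, but the contract is the same: take the current sequence, return a candidate.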
Constraints
A constraint scores how well each proposal meets a requirement, from 0.0 (perfect) to 1.0 (worst). Each constraint is configured with either a weight, for soft scoring, or a threshold, for hard pass/fail filtering.
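
The difference between the two modes can be sketched in plain Python. Everything here is an assumption made for illustration, not Proto's API: the `Constraint` dataclass, the `evaluate` function, the two scoring functions, and in particular the semantics chosen (a weighted constraint adds `weight * score` to a total, and a thresholded constraint rejects the proposal when its score exceeds the threshold).

```python
from dataclasses import dataclass
from typing import Callable, Optional


def gc_score(seq, target=0.5):
    # 0.0 when GC fraction equals the target, 1.0 at the farthest possible value.
    # Assumes a non-empty sequence.
    gc = sum(base in "GC" for base in seq) / len(seq)
    return abs(gc - target) / max(target, 1.0 - target)


def homopolymer_score(seq, max_run=4):
    # 0.0 while the longest run is within max_run, rising to 1.0 beyond it.
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return min(1.0, max(0, best - max_run) / max_run)


@dataclass
class Constraint:
    """Hypothetical constraint wrapper: set weight (soft) or threshold (hard)."""
    score_fn: Callable[[str], float]
    weight: Optional[float] = None
    threshold: Optional[float] = None


def evaluate(seq, constraints):
    """Return (passed, total). A failed hard constraint gives (False, inf)."""
    total = 0.0
    for c in constraints:
        score = c.score_fn(seq)
        if c.threshold is not None and score > c.threshold:
            return False, float("inf")
        if c.weight is not None:
            total += c.weight * score
    return True, total


constraints = [
    Constraint(gc_score, weight=2.0),             # soft: contributes to the total
    Constraint(homopolymer_score, threshold=0.0), # hard: any excess run fails
]
for seq in ("ACGTACGT", "AAGT", "AAAAAAAACG"):
    print(seq, evaluate(seq, constraints))
```

Under these assumptions a soft constraint trades off against the others, while a hard constraint removes a proposal from consideration regardless of how well it scores elsewhere.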
Architecture
The framework has the following components, which form an optimization loop:

| Component | Purpose | Examples |
|---|---|---|
| Segments | Contiguous sequence regions to design | 200bp promoter, 100aa protein domain, variable CDR loop |
| Constructs | Multi-segment containers | Promoter + CDS + terminator, multi-chain protein complex |
| Generators | Propose candidate sequences each iteration | Random mutation, protein language models, inverse folding, autoregressive DNA/protein models |
| Constraints | Score how well sequences meet requirements (0 = perfect, 1 = worst) | Sequence composition, protein structure, RNA splicing, functional annotation, and more |
| Optimizers | Search algorithms that minimize the total constraint score | MCMC, Rejection Sampling, Beam Search, Gradient Descent, Cycling |
| Programs | Multi-stage optimizer pipelines | Rejection Sampling exploration then MCMC fine-tuning |
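
The loop formed by these components can be sketched as a two-stage pipeline in plain Python: rejection sampling for broad exploration, then Metropolis-style MCMC for refinement. This is a toy written for this page, not Proto's implementation. All function names, the equal 0.5 weights, the `GAATTC` site, and the temperature are assumptions chosen for the example.

```python
import math
import random

DNA = "ACGT"


def gc_score(seq, target=0.5):
    # 0.0 at the target GC fraction, 1.0 at the farthest possible value.
    gc = sum(b in "GC" for b in seq) / len(seq)
    return abs(gc - target) / max(target, 1.0 - target)


def site_score(seq, site="GAATTC"):
    # 1.0 if the unwanted restriction site is present, else 0.0.
    return 1.0 if site in seq else 0.0


def total_score(seq):
    # Combined score: weighted sum of the individual constraint scores.
    return 0.5 * gc_score(seq) + 0.5 * site_score(seq)


def mutate(seq, rng):
    # Generator: substitute one random position with a different base.
    pos = rng.randrange(len(seq))
    new = rng.choice([b for b in DNA if b != seq[pos]])
    return seq[:pos] + new + seq[pos + 1:]


def rejection_sampling(length, n_samples, rng):
    """Stage 1: draw random sequences and keep the best-scoring one."""
    samples = (
        "".join(rng.choice(DNA) for _ in range(length)) for _ in range(n_samples)
    )
    return min(samples, key=total_score)


def mcmc(seq, n_steps, temperature, rng):
    """Stage 2: Metropolis search that tracks the best sequence seen."""
    current, current_score = seq, total_score(seq)
    best, best_score = current, current_score
    for _ in range(n_steps):
        proposal = mutate(current, rng)
        score = total_score(proposal)
        # Always accept improvements; accept worse moves with
        # probability exp(-(increase in score) / temperature).
        accept = score <= current_score or rng.random() < math.exp(
            (current_score - score) / temperature
        )
        if accept:
            current, current_score = proposal, score
            if score < best_score:
                best, best_score = proposal, score
    return best, best_score


rng = random.Random(0)
start = rejection_sampling(length=60, n_samples=50, rng=rng)
best, best_score = mcmc(start, n_steps=500, temperature=0.02, rng=rng)
print(f"start score: {total_score(start):.3f}")
print(f"final score: {best_score:.3f}")
```

Because the MCMC stage keeps the best sequence it has seen, the final score can never be worse than the score of the stage 1 result it started from.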
Applications
- Protein Design
- DNA Optimization
- RNA Engineering
Proteins can be designed for predicted structural properties. ESM2 or ProteinMPNN generate proposals, which are scored by ESMFold or Boltz2 for folding confidence, by TM-score for structural similarity, and by additional quality metrics.
Key Features
Declarative Design
Sequences are specified by the properties they must satisfy rather than by a search procedure. Constraints define the requirements; the optimizer performs the search.
Composable Components
Generators, constraints, and optimizers combine freely. Multi-stage pipelines chain broad exploration with targeted refinement.
Integrated ML Models
Built-in support for protein language models, structure predictors, inverse-folding models, and genomic deep-learning models.
Bioinformatics Tools
Tools for structure prediction, sequence search, motif analysis, splicing prediction, and annotation are callable as constraints.
Multi-Objective Optimization
Competing requirements are balanced through weighted scoring and hard threshold filters across any number of constraints.
CPU and GPU
Lightweight generators and constraints run on CPU; structure prediction, language models, and genomic deep learning run on GPU when available.
Get Started
Installation
Install Proto on CPU or GPU, using pip or conda.
Quickstart
A step-by-step, runnable tutorial for a first design.
Core Concepts
Reference on segments, constructs, generators, constraints, optimizers, and programs.