License: ESM C (Cambrian) is released under a custom license (the Cambrian Open License Agreement), which restricts commercial use and may require explicit attribution. Please refer to the license for full terms.
Proto is not affiliated with Biohub. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
evolutionaryscale/esm
Background
ESM C (EvolutionaryScale, 2024) is a protein language model trained with the masked language modeling objective: during training, residues are hidden at random and the model learns to predict the original amino acid from the surrounding residues on both sides. For each residue it produces a contextual numerical representation (an embedding), along with per-position scores (logits) over the 20 standard amino acids. ESM C is distributed in the same esm software package as ESM3, but does not include ESM3’s structure track or sequence-generation capability; it provides only embeddings and per-position scores. Two openly licensed model sizes are wrapped here: esmc_300m (embedding size 960, Cambrian Open License, commercial use permitted) and esmc_600m (embedding size 1152, Cambrian Non-Commercial License, research and internal use only). A larger 6B-parameter ESM C model is available only through EvolutionaryScale’s hosted Forge service and is not exposed by this wrapper.
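The masking objective and the per-position scores described above can be illustrated with a small, standard-library-only sketch. It does not load or call ESM C: the names `mask_residues` and `softmax`, the `<mask>` placeholder, the 15% masking fraction, and the toy logits are all assumptions chosen for illustration, not values taken from the model or its training setup.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
MASK = "<mask>"  # illustrative placeholder token, not necessarily the model's own


def mask_residues(sequence, mask_fraction=0.15, seed=0):
    """Hide a random subset of residues.

    Returns (tokens, targets), where targets maps each masked position
    to the original amino acid the model would be asked to recover.
    The 0.15 default is an assumed, conventional value.
    """
    rng = random.Random(seed)
    n_masked = min(len(sequence), max(1, round(len(sequence) * mask_fraction)))
    positions = sorted(rng.sample(range(len(sequence)), n_masked))
    tokens = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        tokens[pos] = MASK
    return tokens, targets


def softmax(logits):
    """Turn one position's raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


sequence = "MKTAYIAKQRQISFVKSHFSRQ"
tokens, targets = mask_residues(sequence)

# Made-up logits for a single position, favouring lysine (K).
toy_logits = [0.0] * len(AMINO_ACIDS)
toy_logits[AMINO_ACIDS.index("K")] = 3.0
probs = softmax(toy_logits)
best = AMINO_ACIDS[probs.index(max(probs))]
```

Here `targets` plays the role of the training labels, and `probs` shows how one row of logits could be read as a distribution over the 20 amino acids.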
Tools
Toolkit Notes
These apply to every ESM C tool in this toolkit (esmc-embedding).
- ESM C shares the Biohub esm environment with ESM3. Both are distributed in the same esm package and use a single shared on-disk environment (biohub_esm); installing either tool installs the environment for both.
- The license depends on the model size. esmc_300m (the default) is under the Cambrian Open License, with commercial use permitted subject to the naming and attribution requirement; esmc_600m is under the Cambrian Non-Commercial License and must not be used commercially. The 6B model is available only through EvolutionaryScale’s hosted Forge service and is not wrapped here.
- batch_size controls memory usage. Lower it if you run out of GPU memory; raise it to process short sequences faster. For repeated single-batch calls, use ToolInstance.persist_tool("esmc") to keep the model loaded in memory between calls; for multi-GPU or large-batch runs, prefer ToolPool.
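To see why a batch size setting bounds memory, it can help to look at how inputs are typically split into batches. The sketch below is a generic illustration, not this toolkit's internals: the helper name `batched` and the choice to group sequences by length (so that short sequences share a batch) are assumptions.

```python
def batched(sequences, batch_size):
    """Yield (indices, batch) pairs holding at most batch_size sequences.

    Sequences are ordered by length first, so each batch contains inputs
    of similar size; the indices let the caller restore the input order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield idx, [sequences[i] for i in idx]


seqs = ["MKT", "MKTAYIAKQRQISFVKSHFSRQ", "MK", "MKTAYIAK", "M"]
batches = list(batched(seqs, batch_size=2))
```

With a smaller `batch_size`, fewer sequences are held in memory at once at the cost of more passes; a larger value does the opposite.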
Infrastructure Guides
The following guides cover how to run tools efficiently and at scale.
Tool Persistence
Keep a tool’s model warm across calls instead of reloading it every invocation.
Device Management
How GPUs are allocated to tools and how to target specific devices.
Parallel Execution
Fan a batch of inputs out across multiple GPUs.
