Quick example
Methods
.embed()
Generate embeddings for one or more protein sequences.
A single amino acid sequence string or a list of sequences. Use ":" to separate chains in a multimer (e.g., "CHAIN_A:CHAIN_B").
Per-call configuration overrides (options). Only dynamic config keys can be set here; static keys raise ValueError. See Configuration.
Returns: ESM2Output (see Output below)
Output
The ESM2Output dataclass returned by .embed().
Always included
Prediction metadata with timing information. See PredictionMetadata.
Per-residue embeddings from the final transformer layer. Shape: (batch, seq_len, embedding_dim).
Per-residue chain assignment. Shape: (batch, seq_len).
Per-residue numbering. Shape: (batch, seq_len).
Optional
Embeddings from all intermediate transformer layers. Shape: (batch, num_layers, seq_len, embedding_dim). Only included when include_fields contains "hidden_states" or "*".
Model sizes
ESM-2 is available in 6 sizes. Set the model via config={"model_name": "..."}.
| Model name | Parameters | Layers | Hidden dim | Attention heads |
|---|---|---|---|---|
| esm2_t6_8M_UR50D | 8M | 6 | 320 | 20 |
| esm2_t12_35M_UR50D | 35M | 12 | 480 | 20 |
| esm2_t30_150M_UR50D | 150M | 30 | 640 | 20 |
| esm2_t33_650M_UR50D | 650M | 33 | 1280 | 20 |
| esm2_t36_3B_UR50D | 3B | 36 | 2560 | 40 |
| esm2_t48_15B_UR50D | 15B | 48 | 5120 | 40 |
The default model is esm2_t33_650M_UR50D (650M parameters).
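As a rough aid for choosing a size, the sketch below derives the layer count from a model name and the expected per-residue embedding shape from the hidden dimensions listed in the table. The helper names `num_layers` and `embedding_shape` are hypothetical and not part of the library; the snippet only restates the table in code.

```python
import re

# Hidden dimensions copied from the model table above.
ESM2_HIDDEN_DIM = {
    "esm2_t6_8M_UR50D": 320,
    "esm2_t12_35M_UR50D": 480,
    "esm2_t30_150M_UR50D": 640,
    "esm2_t33_650M_UR50D": 1280,
    "esm2_t36_3B_UR50D": 2560,
    "esm2_t48_15B_UR50D": 5120,
}


def num_layers(model_name: str) -> int:
    """Read the layer count from the 't<N>' part of the model name (hypothetical helper)."""
    match = re.fullmatch(r"esm2_t(\d+)_\w+_UR50D", model_name)
    if match is None:
        raise ValueError(f"not an ESM-2 model name: {model_name!r}")
    return int(match.group(1))


def embedding_shape(model_name: str, batch: int, seq_len: int) -> tuple:
    """Expected (batch, seq_len, embedding_dim) shape for the final-layer embeddings."""
    return (batch, seq_len, ESM2_HIDDEN_DIM[model_name])


# The config dict that would select the smallest model at initialization.
config = {"model_name": "esm2_t6_8M_UR50D"}
print(num_layers(config["model_name"]), embedding_shape(config["model_name"], 2, 10))
```

Smaller models produce narrower embeddings, so downstream code that assumes a fixed embedding_dim needs to be adjusted when model_name changes.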
Configuration
These keys can be set via config={} at initialization or options={} per call (unless marked static).
| Key | Type | Default | Static | Description |
|---|---|---|---|---|
| device | str | "cuda:0" | Yes | GPU device identifier |
| model_name | str | "esm2_t33_650M_UR50D" | Yes | ESM-2 model variant to use (see table above) |
| glycine_linker | str | "" | No | Linker string inserted between chains for multimer tokenization |
| position_ids_skip | int | 512 | No | Position ID offset between chains in multimer mode |
| include_fields | list[str] \| None | None | No | Which output fields to return. Use ["hidden_states"] to include all intermediate layer representations. Use ["*"] for everything |
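The static/dynamic split can be illustrated with a small stand-alone sketch. It is not the library's implementation: `merge_options` is a hypothetical helper that only mirrors the documented behavior, where per-call options may override dynamic keys and static keys raise ValueError.

```python
from typing import Optional

# Defaults and static flags copied from the configuration table above.
DEFAULTS = {
    "device": "cuda:0",
    "model_name": "esm2_t33_650M_UR50D",
    "glycine_linker": "",
    "position_ids_skip": 512,
    "include_fields": None,
}
STATIC_KEYS = {"device", "model_name"}


def merge_options(config: dict, options: Optional[dict] = None) -> dict:
    """Combine defaults, init-time config, and per-call options (hypothetical helper)."""
    options = options or {}
    static = STATIC_KEYS.intersection(options)
    if static:
        # Static keys are fixed at initialization and cannot change per call.
        raise ValueError(f"static keys cannot be set per call: {sorted(static)}")
    # Later dicts win: options override config, which overrides defaults.
    return {**DEFAULTS, **config, **options}


merged = merge_options(
    {"model_name": "esm2_t6_8M_UR50D"},  # init-time config
    {"position_ids_skip": 256},          # per-call override of a dynamic key
)
print(merged["model_name"], merged["position_ids_skip"])
```

Passing a static key such as "device" in the per-call dict raises ValueError in this sketch, matching the behavior described for .embed().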
Multimer embedding
Separate chains with ":" in the sequence string:
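The sketch below shows one plausible reading of what the ":" separator and position_ids_skip imply for a two-chain input. It is an assumption, not the library's tokenizer: `layout_multimer` is a hypothetical helper, it ignores glycine_linker (empty by default), and the exact way the library applies the position offset may differ.

```python
def layout_multimer(sequence: str, position_ids_skip: int = 512):
    """Split a ':'-separated multimer and assign chain and position IDs.

    Hypothetical helper: assumes positions continue across chains with an
    extra gap of `position_ids_skip` inserted at each chain boundary.
    """
    chains = sequence.split(":")
    chain_index = []
    position_ids = []
    next_position = 0
    for chain_id, chain in enumerate(chains):
        for _ in chain:
            chain_index.append(chain_id)
            position_ids.append(next_position)
            next_position += 1
        # Jump ahead so the next chain looks far away in position space.
        next_position += position_ids_skip
    return "".join(chains), chain_index, position_ids


residues, chain_index, position_ids = layout_multimer("MKT:GA")
print(residues)      # MKTGA
print(chain_index)   # [0, 0, 0, 1, 1]
print(position_ids)  # [0, 1, 2, 515, 516]
```

The large gap in position IDs is what lets a single-sequence model treat the chains as separate segments rather than one continuous polypeptide.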
Hidden states
To retrieve embeddings from all intermediate transformer layers, include "hidden_states" in include_fields:
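The selection rule for include_fields can be sketched as follows. This is only an illustration: `select_fields` is a hypothetical helper, and the names used for the always-included fields are placeholders, since only "hidden_states" and "*" are named in this reference.

```python
# Placeholder names for the always-included fields (assumed, not confirmed).
ALWAYS_INCLUDED = ("metadata", "embeddings", "chain_index", "residue_index")
# The only optional field named in this reference.
OPTIONAL_FIELDS = ("hidden_states",)


def select_fields(include_fields=None):
    """Return the output fields implied by an include_fields value (hypothetical helper)."""
    requested = set(include_fields or [])
    if "*" in requested:
        optional = list(OPTIONAL_FIELDS)  # "*" asks for everything
    else:
        optional = [name for name in OPTIONAL_FIELDS if name in requested]
    return list(ALWAYS_INCLUDED) + optional


# Per-call options dict that requests the intermediate layers.
options = {"include_fields": ["hidden_states"]}
print(select_fields(options["include_fields"]))
print(select_fields(None))
```

Because include_fields is a dynamic key, it can be passed per call, so hidden states need only be requested for the calls that use them.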
