Boltz-2 is a diffusion-based protein structure prediction model with built-in MSA (Multiple Sequence Alignment) support. It produces high-quality 3D structures with comprehensive confidence metrics and supports MSA caching for faster repeated predictions.

Quick example

from boileroom import Boltz2

model = Boltz2(backend="modal")
result = model.fold("MKTVRQERLKSIVRI")

print(result.atom_array)  # list of Biotite AtomArray objects
print(result.plddt)       # per-residue confidence scores
print(result.confidence)  # comprehensive confidence metrics

Methods

.fold()

Predict the 3D structure of one or more protein sequences.
result = model.fold(sequences, options=None)
sequences
str | Sequence[str]
required
A single amino acid sequence string or a list of sequences. Use ":" to separate chains in a multimer (e.g., "CHAIN_A:CHAIN_B").
options
dict | None
default: None
Per-call configuration overrides. Only dynamic config keys can be set here; passing a static key raises ValueError. See Configuration.
Returns: Boltz2Output (see Output below)
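
Because seed is a dynamic key, it can be varied per call through options. The sketch below folds one sequence under several seeds. fold_with_seeds is a hypothetical helper, not part of boileroom, and it assumes only the model.fold(sequences, options=...) signature shown above.

```python
from typing import Any, Dict, Iterable


def fold_with_seeds(model: Any, sequence: str, seeds: Iterable[int]) -> Dict[int, Any]:
    """Fold `sequence` once per seed, passing the seed as a per-call option.

    `model` is any object exposing fold(sequences, options=...), for example
    a Boltz2 instance. This helper is illustrative, not part of boileroom.
    """
    results = {}
    for seed in seeds:
        # "seed" is listed as a dynamic key, so it may be set in options.
        results[seed] = model.fold(sequence, options={"seed": seed})
    return results
```

With the model from the quick example, fold_with_seeds(model, "MKTVRQERLKSIVRI", [0, 1, 2]) would return one result per seed, keyed by the seed value.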

Output

The Boltz2Output dataclass returned by .fold().

Always included

metadata
PredictionMetadata
Prediction metadata with timing information. See PredictionMetadata.
atom_array
list[AtomArray] | None
List of Biotite AtomArray objects (one per diffusion sample). Always generated.

Confidence metrics

confidence
list[dict] | None
Comprehensive confidence metrics (one dict per sample). Each dict may contain keys like confidence_score, ptm, iptm, ligand_iptm, protein_iptm, complex_plddt, complex_iplddt, complex_pde, complex_ipde, pair_chains_iptm, and chains_ptm.
plddt
list[np.ndarray] | None
Per-residue pLDDT scores (one 1D array per sample).
pae
list[np.ndarray] | None
Predicted aligned error matrices (one 2D square matrix per sample).
pde
list[np.ndarray] | None
Predicted distance error matrices (one 2D square matrix per sample).
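
When several diffusion samples are generated, the per-sample lists can be used to compare them. The following is a minimal sketch, assuming that a higher confidence_score indicates a more confident prediction and that each pLDDT array is non-empty. rank_samples and mean_plddt are hypothetical helpers, not boileroom functions.

```python
from typing import List, Optional, Sequence


def rank_samples(confidence: Optional[Sequence[dict]],
                 key: str = "confidence_score") -> List[int]:
    """Return sample indices ordered from highest to lowest `key`.

    Samples whose dict lacks `key` are placed last, in their original order.
    """
    if not confidence:
        return []
    scored = [i for i, c in enumerate(confidence) if key in c]
    unscored = [i for i, c in enumerate(confidence) if key not in c]
    scored.sort(key=lambda i: float(confidence[i][key]), reverse=True)
    return scored + unscored


def mean_plddt(plddt: Optional[Sequence[Sequence[float]]]) -> List[float]:
    """Average the per-residue pLDDT values of each sample."""
    if plddt is None:
        return []
    return [sum(float(v) for v in sample) / len(sample) for sample in plddt]
```

For example, rank_samples(result.confidence)[0] would give the index of the top-ranked sample, which can then be used to select the matching entry in result.atom_array.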

Structure representations

pdb
list[str] | None
PDB-formatted structure strings. Only generated when include_fields contains "pdb" or "*".
cif
list[str] | None
mmCIF-formatted structure strings. Only generated when include_fields contains "cif" or "*".
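
Since pdb and cif are lists of strings (or None when not requested), saving them to disk needs only the standard library. A minimal sketch follows; write_structures and its file naming scheme are illustrative assumptions, not part of boileroom.

```python
from pathlib import Path
from typing import List, Optional, Sequence


def write_structures(structures: Optional[Sequence[str]], out_dir: str,
                     stem: str = "sample", suffix: str = ".pdb") -> List[Path]:
    """Write one file per structure string and return the written paths."""
    if structures is None:
        # The field was not requested via include_fields.
        return []
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(structures):
        path = target / f"{stem}_{index}{suffix}"
        path.write_text(text)
        paths.append(path)
    return paths
```

After folding with options={"include_fields": ["pdb"]}, calling write_structures(result.pdb, "predictions") would write one file per sample. Pass suffix=".cif" when saving result.cif.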

Configuration

These keys can be set via config={} at initialization. Keys not marked static can also be overridden per call via options={}; static keys can only be set at initialization.

Core settings

| Key | Type | Default | Static | Description |
| --- | --- | --- | --- | --- |
| device | str | "cuda:0" | Yes | GPU device identifier |
| cache_dir | str \| None | None | Yes | Directory for model weights and resources. Defaults to {MODEL_DIR}/boltz |
| no_kernels | bool | False | Yes | Disable custom CUDA kernels (use for CPU or debugging) |
| seed | int \| None | None | No | Random seed for reproducibility |
| override | bool | False | No | Override existing output files |
| num_workers | int | 2 | No | Number of data loading workers |
| include_fields | list[str] \| None | None | No | Which output fields to return. Use ["pdb"] or ["cif"] to include structure strings. Use ["*"] for everything |
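
Static keys are rejected when passed through options, so a client-side check can surface the problem before a call is dispatched. The sketch below assumes the static set is exactly the three keys marked Yes in this section's tables (device, cache_dir, no_kernels). check_dynamic_options is a hypothetical helper and does not reproduce boileroom's own validation.

```python
from typing import Dict, Optional

# Keys marked "Static: Yes" in the configuration tables (assumed complete).
STATIC_KEYS = frozenset({"device", "cache_dir", "no_kernels"})


def check_dynamic_options(options: Optional[Dict[str, object]]) -> Dict[str, object]:
    """Return a copy of `options`, raising ValueError if it contains static keys."""
    if options is None:
        return {}
    offending = sorted(STATIC_KEYS.intersection(options))
    if offending:
        raise ValueError(
            f"static keys cannot be set per call: {', '.join(offending)}"
        )
    return dict(options)
```

A call such as model.fold(seq, options=check_dynamic_options({"seed": 7})) passes through unchanged, while {"device": "cuda:1"} fails early with a message naming the offending key.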

Diffusion settings

| Key | Type | Default | Static | Description |
| --- | --- | --- | --- | --- |
| recycling_steps | int | 3 | No | Number of recycling iterations |
| sampling_steps | int | 200 | No | Number of diffusion sampling steps |
| diffusion_samples | int | 1 | No | Number of structure samples to generate |
| max_parallel_samples | int | 5 | No | Maximum number of samples to run in parallel |
| step_scale | float | 1.5 | No | Step scale factor for diffusion |

MSA settings

| Key | Type | Default | Static | Description |
| --- | --- | --- | --- | --- |
| use_msa_server | bool | True | No | Whether to query the MSA server. Set to False for single-sequence mode |
| msa_server_url | str | "https://api.colabfold.com" | No | URL of the MSA server |
| msa_pairing_strategy | str | "greedy" | No | Strategy for pairing MSA sequences |
| max_msa_seqs | int | 8192 | No | Maximum number of MSA sequences to use |
| subsample_msa | bool | True | No | Whether to subsample MSAs |
| num_subsampled_msa | int | 1024 | No | Number of subsampled MSA sequences |

Output settings

| Key | Type | Default | Static | Description |
| --- | --- | --- | --- | --- |
| write_full_pae | bool | True | No | Compute and include full PAE matrix |
| write_full_pde | bool | True | No | Compute and include full PDE matrix |
| write_embeddings | bool | False | No | Include model embeddings in output |

MSA caching

| Key | Type | Default | Static | Description |
| --- | --- | --- | --- | --- |
| msa_cache_enabled | bool | True | No | Enable or disable MSA caching |

MSA caching

Boltz-2 automatically caches MSA results to avoid redundant server queries. When you fold a sequence for the first time, the MSA is fetched from the server and stored locally. Subsequent predictions for the same sequence reuse the cached MSA. Cache files are stored in a hash-prefixed directory structure under {cache_dir}/boltz/msa_cache/. An index file (msa_index.json) tracks cached entries with metadata including creation time, last access time, and file size. To disable caching, set config={"msa_cache_enabled": False}.
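
To make the cache layout concrete, here is a rough sketch of how a hash-prefixed path and an index entry could be derived. Everything specific in it is an assumption made for illustration: the SHA-256 hash, the two-character prefix directory, the .a3m extension, and the index field names. The actual on-disk format used by the library may differ.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Optional


def msa_cache_path(cache_dir: str, sequence: str) -> Path:
    """Map a sequence to a hash-prefixed cache file path (assumed layout)."""
    digest = hashlib.sha256(sequence.encode("utf-8")).hexdigest()
    # Assumption: the first two hex characters form the prefix directory.
    return Path(cache_dir) / "boltz" / "msa_cache" / digest[:2] / f"{digest}.a3m"


def record_entry(index_file: Path, msa_file: Path, size: int,
                 now: Optional[float] = None) -> dict:
    """Add or refresh an index entry; the field names are hypothetical."""
    now = time.time() if now is None else now
    index = json.loads(index_file.read_text()) if index_file.exists() else {}
    # Keep the original creation time when the entry already exists.
    entry = index.get(msa_file.name, {"created": now})
    entry.update({"last_access": now, "size_bytes": size})
    index[msa_file.name] = entry
    index_file.parent.mkdir(parents=True, exist_ok=True)
    index_file.write_text(json.dumps(index, indent=2))
    return entry
```

The point of hashing is that the same sequence always maps to the same path, so a repeated prediction can find its MSA without querying the server again.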

Multimer prediction

Separate chains with ":" in the sequence string:
result = model.fold("MKTVRQERLKSIVRI:LERSKEPVSGAQLAEE")

print(result.atom_array)   # structure with both chains
print(result.confidence)   # includes inter-chain metrics
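
When chains are held as separate strings, the multimer input can be assembled programmatically. The sketch below is a hypothetical helper, not part of boileroom. Restricting input to the 20 standard one-letter residue codes is an assumption made here for validation, not a documented requirement of the model.

```python
from typing import Iterable

# The 20 standard amino acid one-letter codes.
STANDARD_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")


def join_chains(chains: Iterable[str]) -> str:
    """Join chain sequences with ":" after basic validation."""
    cleaned = []
    for number, chain in enumerate(chains, start=1):
        chain = chain.strip().upper()
        if not chain:
            raise ValueError(f"chain {number} is empty")
        unknown = sorted(set(chain) - STANDARD_RESIDUES)
        if unknown:
            raise ValueError(
                f"chain {number} has non-standard residues: {''.join(unknown)}"
            )
        cleaned.append(chain)
    if not cleaned:
        raise ValueError("at least one chain is required")
    return ":".join(cleaned)
```

join_chains(["MKTVRQERLKSIVRI", "LERSKEPVSGAQLAEE"]) produces the same string passed to model.fold() above. A chain that already contains ":" is rejected, which guards against accidentally splitting one chain into two.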

Single-sequence mode

To run without MSA (faster but potentially lower quality):
model = Boltz2(backend="modal", config={"use_msa_server": False})
result = model.fold("MKTVRQERLKSIVRI")