
Learn how to fold large proteins and multi-chain proteins, and how to run predictions at scale, via REST API.

Fast & Accurate PDB Prediction with ESMFold¶

Having access to AlphaFold2, ESM, and other recent structure-prediction neural networks is great, but what if you don't want to leave Python, don't want to spin up a GPU, want to avoid containerization, or need to massively scale out your PDB file prediction?

You can predict a PDB file for proteins up to 1024+ residues in length using the highly accurate ESMFold, which is scaled out and pre-loaded into memory on BioLM.ai. The API docs show an example protein and the PDB string returned in the response.

Set Your API Token¶

To use the BioLM API, you need a token. You can get one from the User API Tokens page.

Paste the API token you generated in the cell below, as the value of the variable BIOLMAI_TOKEN.

In [1]:

BIOLMAI_TOKEN = ' '  # !!! YOUR API TOKEN HERE !!!
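If you would rather not paste the token into the notebook, one option is to read it from an environment variable. The sketch below is only an illustration: the helper `load_token` and the environment variable name `BIOLMAI_TOKEN` are assumptions, not part of the BioLM API.

```python
import os


def load_token(env_var="BIOLMAI_TOKEN"):
    """Read the API token from an environment variable (hypothetical name)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail loudly rather than sending a request with an empty token
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token


# Usage, once the variable is set in your shell:
# BIOLMAI_TOKEN = load_token()
```

This keeps the token out of saved notebooks and version control.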

In [2]:

SEQ = "MAETAVINHKKRNSPRIVQSNDLEAAYSLSRDQKRMLYLFVDQIRKSDGTLQEHDGICEIHVAKYAEIFGLTSAEASKDIRQALKSFAGKEVVFYRPEEDAGDEKGYESFPWFIKRAHSPSRGLYSVHINPYLIPFFIGLQNRFTQFRLSETKEITNPYAMRLYESLCQYRKPDGSGIVSLKIDWIIERYQLPQSYQRMPDFRRRFLQVCVNEINSRTPMRLSYIEKKKGRQTTHIVFSFRDITSMTTG"

print("Sequence length: {}".format(len(SEQ)))

Sequence length: 249

In [3]:

SLUG = 'esmfold-multichain'  # Model on BioLM.ai to use
ACTION = 'predict'  # How to use model: 'generate', 'predict', 'encode', etc

# JSON payload to send to model endpoint
data = {
  "items": [{
    "sequence": SEQ
  }]
}
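Since `items` is a list, the same payload shape could plausibly carry more than one sequence. The helper below is a minimal sketch that builds such a payload and rejects obviously malformed input before anything is sent. `build_payload` is a hypothetical name. Whether the endpoint accepts several items per request, and which characters it allows beyond the 20 standard amino acids, are assumptions to check against the API docs. In particular, any chain-separator character the multi-chain endpoint expects would need to be added to the allowed set.

```python
# The 20 standard amino acid one-letter codes
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def build_payload(sequences):
    """Build a {"items": [...]} payload from an iterable of sequences."""
    items = []
    for seq in sequences:
        seq = seq.strip().upper()
        unknown = set(seq) - VALID_RESIDUES
        if not seq or unknown:
            raise ValueError(f"Invalid sequence {seq!r}: unexpected {sorted(unknown)}")
        items.append({"sequence": seq})
    return {"items": items}


# Usage:
# data = build_payload([SEQ])
```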

Make API Request¶

There is already a server on BioLM with ESMFold loaded into memory, so predictions should be fast. Let's import the requests library.

In [4]:

from IPython.display import JSON  # Helpful UI for JSON display

try:
    # Install packages to make API requests in JLite
    import micropip
    await micropip.install('requests')
    await micropip.install('pyodide-http')
    # Patch requests for in-browser support
    import pyodide_http
    pyodide_http.patch_all()
except ModuleNotFoundError:
    pass  # Won't be using micropip outside of JLite

import requests  # Will use to make calls to BioLM.ai

In [5]:

url = f"https://biolm.ai/api/v2/{SLUG}/{ACTION}/"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Token {BIOLMAI_TOKEN.strip()}",
}

In [6]:

# Make the request - let's time it!
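import time

start = time.perf_counter()  # Monotonic clock, better suited to timing than time.time()
response = requests.post(
    url=url,
    headers=headers,
    json=data,
)
duration = time.perf_counter() - start

print(f'Response time: {duration:.3f}s')

response.raise_for_status()  # Fail early on an HTTP error instead of parsing an error body
result = response.json()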

# If you wish to view the full result, you can expand the tree in the cell below
JSON(result)

Response time: 0.254s

Out[6]:

<IPython.core.display.JSON object>
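Later cells read the prediction from `result['results'][0]`. Before doing so, it can help to confirm that the response actually contains results. The sketch below assumes only what this notebook shows, namely that a successful response is a dictionary with a `results` list. The helper name `extract_results` is hypothetical, and the shape of an error response is not known here, so the function simply reports whatever it received.

```python
def extract_results(result):
    """Return the list under 'results', or raise with the raw response."""
    results = result.get("results") if isinstance(result, dict) else None
    if not results:
        raise ValueError(f"No 'results' in response: {result!r}")
    return list(results)


# Usage:
# pdb_pred = extract_results(result)[0]
```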

If the model were starting cold, there would be an initial wait of several minutes while this large model loads into memory, after which subsequent API requests would respond without delay. This wait is known as the model's cold-start time. It is generally not noticeable, but ESMFold is an exception because, at the time of writing, it is one of the largest protein models available.
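One way to ride out a cold start is to retry the request with exponential backoff. The sketch below is an assumption about how you might do that on the client side. How the API actually behaves while the model loads (a long-running request, a timeout, or a 5xx status) is not documented here. The helper `post_with_retry` and its parameters are hypothetical. It takes a zero-argument callable so that it does not depend on any particular HTTP library.

```python
import time


def post_with_retry(send, attempts=4, base_delay=5.0, sleep=time.sleep):
    """Call send() until it returns a response with a status code below 500.

    Waits base_delay, then 2x, then 4x, ... seconds between attempts.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            response = send()
            if response.status_code < 500:
                return response
            last_error = RuntimeError(f"Server returned {response.status_code}")
        except OSError as exc:  # requests' timeout and connection errors subclass OSError
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise last_error


# Usage with the variables defined earlier in this notebook:
# response = post_with_retry(
#     lambda: requests.post(url, headers=headers, json=data, timeout=600)
# )
```

Client errors such as a bad token (4xx) are returned immediately rather than retried, since waiting will not fix them.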

Visualize Structure in 3D¶

We have the PDB file contents as a string. We can use it directly to visualize the structure.

In [7]:

# View the file contents first
import json

pdb_pred = result['results'][0]  # Extract the contents of the PDB file

json.dumps(pdb_pred)[:1000]  # Look at the first 1000 characters, since PDBs are long...

Out[7]:

'"PARENT N/A\\nATOM      1  N   MET A   1     -24.145  39.783   4.774  1.00 95.18           N  \\nATOM      2  CA  MET A   1     -23.319  39.113   3.772  1.00 96.49           C  \\nATOM      3  C   MET A   1     -22.177  38.348   4.432  1.00 95.32           C  \\nATOM      4  CB  MET A   1     -22.762  40.125   2.770  1.00 94.51           C  \\nATOM      5  O   MET A   1     -21.218  38.952   4.916  1.00 87.32           O  \\nATOM      6  CG  MET A   1     -23.819  40.742   1.869  1.00 90.02           C  \\nATOM      7  SD  MET A   1     -23.109  41.928   0.662  1.00 92.33           S  \\nATOM      8  CE  MET A   1     -23.368  43.487   1.554  1.00 90.34           C  \\nATOM      9  N   ALA A   2     -22.303  37.084   4.659  1.00 94.98           N  \\nATOM     10  CA  ALA A   2     -21.251  36.257   5.245  1.00 95.54           C  \\nATOM     11  C   ALA A   2     -20.153  35.963   4.227  1.00 92.67           C  \\nATOM     12  CB  ALA A   2     -21.836  34.954   5.783  1.00 93.41           C  \\nATO'
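The second-to-last numeric column in each `ATOM` record (95.18, 96.49, ...) is the PDB B-factor field. ESMFold outputs commonly store the per-residue confidence score (pLDDT) there, so it is assumed here that the same holds for this response. Under that assumption, the sketch below averages the value over alpha-carbon atoms using the fixed column positions of the PDB format. `mean_plddt` is a hypothetical helper name.

```python
def mean_plddt(pdb_text, atom_name="CA"):
    """Average the B-factor column (assumed to hold pLDDT) over chosen atoms."""
    scores = []
    for line in pdb_text.splitlines():
        # PDB is fixed-width: atom name in columns 13-16, B-factor in 61-66
        if line.startswith("ATOM") and line[12:16].strip() == atom_name:
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError(f"No {atom_name} atoms found")
    return sum(scores) / len(scores)


# Usage:
# print(f"Mean pLDDT: {mean_plddt(pdb_pred):.1f}")
```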

Let's use the py3Dmol Python package to visualize the PDB here, in-browser.

In [8]:

try:
    # Install packages for JLite
    import micropip
    await micropip.install('py3Dmol')
except ModuleNotFoundError:
    pass  # Won't be using micropip outside of JLite

import py3Dmol  # Install with `pip install py3Dmol` if running notebook elsewhere

In [9]:

view = py3Dmol.view(js='https://3Dmol.org/build/3Dmol-min.js', width=800, height=400)
view.addModel(pdb_pred, 'pdb')
view.setStyle({'model': -1}, {"cartoon": {'color': 'spectrum'}})
view.zoomTo()


Out[9]:
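If the in-browser viewer is not available, you can also write the PDB string to disk and open it in a desktop viewer. This is a minimal sketch. The helper `save_pdb` and the default file name are hypothetical.

```python
from pathlib import Path


def save_pdb(pdb_text, path="esmfold_prediction.pdb"):
    """Write the PDB string to a file and return its path."""
    out = Path(path)
    # Ensure the file ends with a newline, as text files conventionally do
    out.write_text(pdb_text if pdb_text.endswith("\n") else pdb_text + "\n")
    return out


# Usage:
# save_pdb(pdb_pred)
```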