23andMe API Authentication

23andMe provides an API that allows customers to access their genome data programmatically, but access to the API is safeguarded using the OAuth 2.0 framework. Although the intended audience for 23andMe’s developer API seems to be mobile app developers, it’s still possible to access raw genome data for more academic purposes. On this page, I’ll describe a minimal set of steps for accessing a customer’s raw genome data through 23andMe’s API.

Please refer to the 23andMe API Terms of Service to determine if your use case is appropriate.

Introduction

Several steps and technologies are involved in accessing raw genome data through 23andMe’s API. First, we’ll create an account and register an “app” with 23andMe. Then, to work through their chosen authentication scheme, OAuth 2.0, we’ll create a couple of PHP scripts that run on a web server. Finally, we’ll create two little Python scripts that list the account’s profiles and download some raw genome data.

The majority of this work will involve working with the OAuth 2.0 framework, which is defined in RFC 6749. From the RFC:

The OAuth 2.0 authorization framework enables a third-party application to obtain limited access to an HTTP service, either on behalf of a resource owner by orchestrating an approval interaction between the resource owner and the HTTP service, or by allowing the third-party application to obtain access on its own behalf.

Basically, OAuth 2.0 is the means to access restricted data from 23andMe, and it allows 23andMe to maintain the responsibility of storing and validating customer authentication credentials.

The following flowchart depicts the authentication process we will follow to obtain an access token, which will then be used to access the raw genome data through 23andMe’s API. Details about the individual steps will be described in later sections.

[Figure: 23andMe OAuth2 flowchart]

Creating a Developer Account

23andMe automatically creates accounts for its customers so they can browse the 23andMe website and view information about their genome or genomes of profiles associated with their account. To access this same information programmatically, the customer (or app developer) must create a developer’s account at api.23andme.com and register an app.

Although we won’t be creating an app per se, we still need to register an app at 23andMe so we can obtain a client_id and client_secret that will be used during the authentication process. As of June 2014, this information can be found on 23andMe’s API Dashboard. These two credential codes are displayed on the right side of the page; each code is a unique 32-character hexadecimal string.

[Figure: 23andMe Dashboard]

Another important field that we must provide on this dashboard is the Redirect URI. During the authentication process, our browser will be temporarily redirected to 23andMe’s authorization page. When that stage completes, 23andMe will then redirect our browser to this URI, which will in turn perform the final steps of the authentication process. The implementation of this page will be described later, but for now, let’s assume this URI is http://example.com/redirect/.

Initiating the Login

Authentication begins when a browser issues a request to 23andMe. To avoid entering long values into the address bar, we’ll create a simple web page in PHP that redirects the browser to 23andMe’s authorization page along with our app’s client_id and Redirect URI. Let’s assume this file is stored on our web server at http://example.com/login/.

<?php

// Script #1: Redirect to 23andMe with client_id.

$client_id    = '00000000000000000000000000000001';
$redirect_uri = 'http://example.com/redirect/';

// URL-encode the Redirect URI so it survives as a query value.
header("Location: https://api.23andme.com/authorize/"
     . "?redirect_uri=" . urlencode($redirect_uri)
     . "&response_type=code"
     . "&client_id=$client_id"
     . "&scope=basic+genomes");
exit;
?>

In this PHP script, the $client_id and $redirect_uri variables match the corresponding values from the 23andMe API Dashboard. I won’t go into details regarding the scope parameter except to say it’s a collection of tokens, separated by spaces (which appear as ‘+’ symbols once encoded in a URL), that indicates what kind of information we plan to access from the customer’s account. Further details about this field can be found in 23andMe’s authentication documentation.
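
The same authorization URL can also be assembled in Python, the language used later on this page. The following is a minimal sketch rather than part of the original scripts: the function name build_authorize_url is hypothetical, and it assumes only the endpoint and parameters shown in Script #1. It relies on urllib.parse.urlencode, which percent-encodes the Redirect URI and writes the space between scope tokens as ‘+’.

```python
from urllib.parse import urlencode

AUTHORIZE_URL = 'https://api.23andme.com/authorize/'


def build_authorize_url(client_id, redirect_uri, scopes):
    """Return the URL that Script #1 redirects the browser to.

    build_authorize_url is a hypothetical helper for illustration.
    """
    params = [
        ('redirect_uri', redirect_uri),
        ('response_type', 'code'),
        ('client_id', client_id),
        # Scope tokens are joined with spaces; urlencode emits '+'.
        ('scope', ' '.join(scopes)),
    ]
    return AUTHORIZE_URL + '?' + urlencode(params)


print(build_authorize_url(
    '00000000000000000000000000000001',
    'http://example.com/redirect/',
    ['basic', 'genomes']))
```

Building the query from a list of pairs keeps the parameters in the same order as the PHP script, which makes the two versions easy to compare.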

When we visit this page with our web browser, we’re automatically redirected to https://api.23andme.com/authorize/. It’s here that we enter the customer’s login information, which is maintained and validated by 23andMe. After we provide valid credentials, 23andMe displays a warning page, customized for our app and the scope requested by our first script. Once we press the green “Yes, grant access.” button, 23andMe redirects the browser to our Redirect URI.

[Figure: 23andMe Login]

Responding to the Redirect

After we log in and grant access, our browser is redirected to our Redirect URI, as specified on the 23andMe API Dashboard. Inserted into the query string of this URI is a temporary code that we can use to obtain an access token. The key is code, and the value is another unique 32-character hexadecimal string. For example, if our Redirect URI is http://example.com/redirect/, the browser may be redirected to something like http://example.com/redirect/?code=290feb71e36dda54ee199d8caffa4f1b.
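
To make the redirect concrete, here is a small Python sketch (not part of the original scripts) that pulls the code out of a redirected URL. The helper name extract_code is hypothetical. The check for an error parameter follows RFC 6749, which has the authorization server report failures that way; whether 23andMe does so in every case is an assumption.

```python
from urllib.parse import urlparse, parse_qs


def extract_code(redirected_url):
    """Return the temporary authorization code from the query string.

    extract_code is a hypothetical helper for illustration.
    """
    query = parse_qs(urlparse(redirected_url).query)
    # RFC 6749 reports a denied or failed request via an 'error' key.
    if 'error' in query:
        raise ValueError('authorization failed: ' + query['error'][0])
    if 'code' not in query:
        raise ValueError('no code in redirect URL')
    return query['code'][0]


print(extract_code(
    'http://example.com/redirect/?code=290feb71e36dda54ee199d8caffa4f1b'))
```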

Our second PHP script must then request an access token from 23andMe using this code along with the client_id and client_secret. If everything is successful, 23andMe will reply with JSON that contains the access token. The URL providing the access token is https://api.23andme.com/token/, and the various fields are sent as form-encoded values in the body of an HTTP POST request. For example, here’s a little PHP script we could use at http://example.com/redirect/ to simply display the access token.

<?php

// Script #2: Request access token with code,
// client_id, and client_secret.

// The code will be something like:
//   290feb71e36dda54ee199d8caffa4f1b
$code = $_GET['code'];

// We will post these fields to 23andMe.
$post_field_array = array(
  'client_id'     => '00000000000000000000000000000001',
  'client_secret' => '00000000000000000000000000000002',
  'grant_type'    => 'authorization_code',
  'code'          => $code,
  'redirect_uri'  => 'http://example.com/redirect/',
  'scope'         => 'basic genomes');

// Encode the field values for HTTP.
$post_fields = '';
foreach ($post_field_array as $key => $value)
  $post_fields .= "$key=" . urlencode($value) . '&';
$post_fields = rtrim($post_fields, '&');

// Use cURL to get the JSON response from 23andMe.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://api.23andme.com/token/');
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$encoded_json = curl_exec($ch);

// The access token is returned via the 'access_token' key.
$response = json_decode($encoded_json, true);
$access_token = $response['access_token'];

// The access_token will be something like:
//   0d4d50f83f6c3538e3df74d593e74b96
print $access_token;
?>
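
For comparison, the same exchange can be sketched in Python. This is an illustrative translation of Script #2, not code from 23andMe: the helper names are hypothetical, and the sample reply is made up and reduced to the single access_token key that the PHP script reads. The sketch builds the form-encoded POST body and parses a reply without touching the network; sending the body to https://api.23andme.com/token/ is left to whichever HTTP client you prefer.

```python
import json
from urllib.parse import urlencode


def build_token_request_body(client_id, client_secret, code, redirect_uri):
    """Form-encode the fields that Script #2 posts to /token/."""
    fields = [
        ('client_id', client_id),
        ('client_secret', client_secret),
        ('grant_type', 'authorization_code'),
        ('code', code),
        ('redirect_uri', redirect_uri),
        ('scope', 'basic genomes'),
    ]
    return urlencode(fields)


def parse_access_token(encoded_json):
    """Return the access token, or raise if the reply lacks one."""
    response = json.loads(encoded_json)
    if 'access_token' not in response:
        raise ValueError('no access_token in response: %r' % (response,))
    return response['access_token']


body = build_token_request_body(
    '00000000000000000000000000000001',
    '00000000000000000000000000000002',
    '290feb71e36dda54ee199d8caffa4f1b',
    'http://example.com/redirect/')
print(body)

# A made-up reply, reduced to the one key this page relies on.
sample_reply = '{"access_token": "0d4d50f83f6c3538e3df74d593e74b96"}'
print(parse_access_token(sample_reply))
```

Checking for the access_token key before using it gives a clearer failure than a missing-key error when the code has expired or the credentials are wrong.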

At this point, we’ve completed the steps outlined in the flowchart of our introduction. We’ve obtained an access token through the OAuth 2.0 framework implemented by 23andMe. And the authentication framework has accomplished its goals: At no time were the customer’s credentials intercepted by the app developer, and at no time was the app developer’s client_secret intercepted by the customer.

The access token itself is yet another 32-character hexadecimal string. Access tokens expire after 24 hours, although it is possible to request new tokens without asking customers to reenter their login details. Refreshing tokens isn’t particularly important in our case, however, because we aren’t implementing an app, and downloading the genome data takes far less than a day.
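
Should a longer-running job ever need it, refreshing could look roughly like the sketch below. It follows the generic refresh grant from RFC 6749 (grant_type=refresh_token). That 23andMe’s token reply includes a refresh_token, and that its /token/ endpoint accepts the grant in exactly this form, are assumptions to check against 23andMe’s documentation. The helper names are hypothetical.

```python
import time
from urllib.parse import urlencode


def build_refresh_request_body(client_id, client_secret, refresh_token):
    """Form-encode a refresh request as laid out in RFC 6749.

    Hypothetical helper; the exact field set 23andMe expects is an
    assumption.
    """
    return urlencode([
        ('client_id', client_id),
        ('client_secret', client_secret),
        ('grant_type', 'refresh_token'),
        ('refresh_token', refresh_token),
    ])


def is_expired(issued_at, expires_in=24 * 60 * 60, now=None):
    """True once a token issued at issued_at (epoch seconds) has lapsed.

    The default lifetime is the 24 hours mentioned above.
    """
    if now is None:
        now = time.time()
    return now >= issued_at + expires_in
```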

Calling the API

The GET /1/user/ Endpoint

With the access token, it’s possible to interact with the REST API provided by 23andMe. Their API is simply a set of web pages, called endpoints, that return various data in JSON format. The access token must be passed to these endpoints in the HTTP Authorization header, in the form Authorization: Bearer ACCESS_TOKEN. For example, with an access token of 0d4d50f83f6c3538e3df74d593e74b96, the request for the GET /1/user/ endpoint would be (over HTTPS):

GET /1/user/ HTTP/1.1
Host: api.23andme.com
Authorization: Bearer 0d4d50f83f6c3538e3df74d593e74b96

Since one customer account can have access to multiple genotyped people (profiles), we’ll first query the available profiles for the account. This is done using 23andMe’s GET /1/user/ endpoint demonstrated above; i.e. we’ll download content from https://api.23andme.com/1/user/ using an HTTP GET request, passing the access token in the request header. The response from 23andMe will look something like the following JSON:

{
  "id": "a42e94634e3f7683",
  "profiles": [
    {
      "genotyped": true,
      "id": "c4480ba411939067"
    },
    :
  ]
}

It probably isn’t necessary to query the list of profiles many times, assuming no additional profiles are being added to the account. But in my experience, the profiles sometimes become available long before the genotypes become available. In the following script, therefore, I’ll filter out the profiles that have not been genotyped.

In the examples that follow, I’ll be using Python with the requests module, which I have found to be a convenient module to work with 23andMe’s API.

#!/usr/bin/env python

import json
import requests
import sys

# Grab the access token from the command line.
access_token = sys.argv[1]

# We download the list of profiles from this URL
# (endpoint), passing the access token through
# the HTTP headers.
url = 'https://api.23andme.com/1/user/'
authorization = 'Bearer {0}'.format(access_token)
headers = {'Authorization': authorization}

# Now we download and print the list of profiles,
# which arrives in JSON format; we filter out the
# profiles that are not genotyped.
data = requests.get(url, headers=headers).json()
for item in data['profiles']:
  if item['genotyped']:
    print(item['id'])

If we were to call this script user.py, executing it might produce results that look like the following. Note that for clarity, I’ll export the access token as $ACCESS_TOKEN and pass the variable as the first command line argument.

$ export ACCESS_TOKEN=0d4d50f83f6c3538e3df74d593e74b96

$ ./user.py $ACCESS_TOKEN
fb351a4850a44295
03192675babb5cb3
356e806fa8d05210
:

In some cases, we need to know additional detail about the profile IDs, and for that 23andMe provides other API endpoints. For our purposes, though, we’ll simply download the full genome for the first profile ID, fb351a4850a44295.

The GET /1/genomes/ Endpoint

We will use 23andMe’s GET /1/genomes/ endpoint to download and print the raw genome for a profile, again specified on the command line. The response from 23andMe will look something like the following JSON:

{
    "id": "fb351a4850a44295",
    "genome": "ACTAGTAG__TTGADDAAIICCTT..."
}

The genome value will be long, exactly 2,266,424 nucleotide symbols (1,133,212 base pairs). The order of the base pairs is provided by snps.data from 23andMe. The first few lines of this file are as follows:

# Updated: 2013-11-11
# index is a key for the /genomes/ endpoint (2 base pairs per index).
# strand is always +1.
index  snp         chromosome  chromosome_position
0      rs41362547  MT          10044
1      rs28358280  MT          10550
2      rs3915952   MT          11251
3      rs2853493   MT          11467
4      rs3088053   MT          11812
5      rs2853498   MT          12308
:

This means the first two nucleotide symbols of the genome string correspond to SNP rs41362547 of the MT chromosome, the next two nucleotide symbols correspond to SNP rs28358280 of the MT chromosome, and so on. There are 1,133,212 records in this file, just as there are 1,133,212 base pairs encoded in the genome string.

[Figure: 23andMe SNP order]

As with the previous script, we’ll pass the access token on the command line. We’ll also provide the profile ID.

#!/usr/bin/env python

import json
import requests
import sys

# Grab the access token and profile id from the command line.
access_token = sys.argv[1]
profile_id = sys.argv[2]

# We download the entire genome for the profile from this URL
# (endpoint), passing the access token through the HTTP headers.
url = 'https://api.23andme.com/1/genomes/{0}/'.format(profile_id)
authorization = 'Bearer {0}'.format(access_token)
headers = {'Authorization': authorization}

# Now we download and print the full genome, which arrives in
# JSON format.
data = requests.get(url, headers=headers).json()
print(data['genome'])

If we were to call this script genomes.py, executing it might produce results that look like the following.

$ ./genomes.py $ACCESS_TOKEN fb351a4850a44295
ACTAGTAG__TTGADDAAIICCTT...
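
To tie the genome string back to snps.data, the sketch below looks up the two symbols stored for a given SNP. It is an illustration under assumptions: the helper names are hypothetical, snps.data is assumed to be whitespace-delimited with the four columns shown earlier, and the sample inputs are just the truncated fragments quoted on this page, paired up for demonstration.

```python
def parse_snp_index(lines):
    """Map SNP name -> index from snps.data-style lines.

    Assumes whitespace-separated columns: index, snp, chromosome,
    chromosome_position. Comment lines, the header row, and any other
    line that does not start with a numeric index are skipped.
    """
    index_by_snp = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        index_by_snp[fields[1]] = int(fields[0])
    return index_by_snp


def genotype_at(genome, index):
    """Return the two symbols stored for the SNP at this index."""
    return genome[2 * index:2 * index + 2]


sample_snps = [
    '# strand is always +1.',
    'index  snp         chromosome  chromosome_position',
    '0      rs41362547  MT          10044',
    '1      rs28358280  MT          10550',
    '2      rs3915952   MT          11251',
]
sample_genome = 'ACTAGTAG__TTGADDAAIICCTT'

index_by_snp = parse_snp_index(sample_snps)
for snp in ('rs41362547', 'rs28358280', 'rs3915952'):
    print(snp, genotype_at(sample_genome, index_by_snp[snp]))
```

With these sample fragments, the three lookups print AC, TA, and GT. A quick sanity check on real data would be to confirm that the genome string is exactly twice as long as the number of records in snps.data.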

Resources