Improvements to the Transport For London Unified API

Recently, I stumbled upon an exciting API: the Transport for London (TfL) Unified API.

This API exposes data on many aspects of transportation in London.

However, as I began to use the API, I discovered that some of the responses were in bytes, others in strings, and none in JSON (the standard API response type). This became a challenge when parsing responses from the API.

So I decided to build on top of this API to make it more developer-friendly.

In this article, I will talk about how I built an API on top of an existing API and used Linode to deploy it.

Let's get right into it.

Building the API

In this section, I explain how I built the main parts of the API. I will not cover every part of the project, only a few important ones.

Here are the tools I used for this project:

Django REST framework
MongoDB
Postman
Linode Linux server

Project Setup

First, I set up my local development environment by creating a virtual environment in my working directory:

mkdir api-dir
cd api-dir 
virtualenv env

Next, I activated it and installed the necessary dependencies, like so:

source env/bin/activate
pip install django djangorestframework djangorestframework_simplejwt pymongo

I used pymongo to connect to my MongoDB cluster on Linode. All I needed to do at this point was to create my Django project and app, and I was all set.

django-admin startproject main .
python manage.py startapp api

The first command creates a new Django project called main, and the second creates a new Django app called api.

Database Setup

I created my database cluster using Linode and provisioned only one node.

You can create your own database cluster with any database engine. Head on to Linode's cloud manager to get started.

In Linode's cloud management console, you will see a side panel on the left which you can hover over to access different resources.

Linode Cloud Management Console Image

Near the bottom of the panel, you will see the Databases option. Click on it. This takes you to the page for creating a database cluster with the database engine of your choice; I used MongoDB.

I gave my database a reasonable name (best practice)
Selected MongoDB as my database engine
Chose the nearest availability zone for my database cluster
Chose the smallest size for my database cluster to reduce billing costs :)
Chose only one database node for the cluster, because that's all I needed
Added the IP range 0.0.0.0/0, so my database cluster accepts connections from any address (don't do this in production)

After creating my MongoDB cluster, I needed to connect to it so I could populate it with the necessary information. This is where pymongo comes in.

With pymongo, you can connect to your MongoDB cluster and perform CRUD operations.

To get started, I needed my connection URL. The connection URL has a format that goes like this:

mongodb://username:password@host/?authMechanism=DEFAULT&tls=true&tlsCAFile=path-to-file

You can get the information needed from the summary of your cluster. You will see this immediately after you click on your database cluster when it is done provisioning.

I then connected to my MongoDB cluster in the cloud and performed a few operations to make sure there were no errors.
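The connection step can be sketched roughly as follows. This is a minimal sketch, not my exact code: the helper names build_mongo_uri and connect, the database name, and all credential values are placeholders for illustration. The username and password are percent-encoded because characters such as @ or / would otherwise break the URL.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host, ca_file):
    """Build a connection URL in the format shown above (hypothetical helper)."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return (
        f"mongodb://{user}:{pwd}@{host}/"
        f"?authMechanism=DEFAULT&tls=true&tlsCAFile={ca_file}"
    )


def connect(uri, db_name):
    """Open a connection and return a database handle (requires pymongo)."""
    from pymongo import MongoClient  # imported here so build_mongo_uri has no dependencies

    client = MongoClient(uri)
    client.admin.command("ping")  # raises if the cluster is unreachable
    return client[db_name]


# Placeholder values only; use the details from your cluster summary
uri = build_mongo_uri("app-user", "p@ss/word", "example-host:27017", "ca-certificate.crt")
print(uri)
```

Calling connect(uri, "some-db-name") would then return a handle you can use for the CRUD operations mentioned above.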

Creating API Endpoints

With my development environment and database set up, the next thing on my agenda was to create the API endpoints. I will go over how I created only three of the endpoints here.

The First Endpoint

The first endpoint I worked on was the AccidentStats endpoint.

This endpoint returns the details for accidents that occurred within a specific year.

I encountered some issues when working on this endpoint. First, the return type was a string, and depending on the year, the response was huge, ranging from 20 MB to 40 MB.

I immediately converted the string response into a list using the .json() method like so:

import requests

url = "https://api.tfl.gov.uk/AccidentStats/2005"
r = requests.get(url)
main_list = r.json()  # parse the response body into a Python list

I later populated my database with the accident statistics data. I made each year with accident statistics (not all years have accident statistics) a collection and the response data the documents for the respective years, so when you hit an endpoint like api/accident-stat/2005, it returns the documents in the collection 2005.
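A rough sketch of that layout: one collection per year, with each accident record stored as a document. The name store_accident_stats is a hypothetical helper, and db is assumed to be a pymongo database handle. The sample below uses a tiny in-memory stand-in and made-up records so the sketch runs without a database.

```python
def store_accident_stats(db, year, records):
    """Insert one year's accident records into a collection named after the year."""
    if not records:
        return 0  # pymongo's insert_many rejects an empty list, so skip empty years
    collection = db[str(year)]  # collection names must be strings
    collection.insert_many(records)
    return len(records)


# In-memory stand-ins for illustration only; with pymongo, pass a real database handle
class FakeCollection:
    def __init__(self):
        self.docs = []

    def insert_many(self, docs):
        self.docs.extend(docs)


class FakeDatabase(dict):
    def __missing__(self, name):
        # Create the collection on first access, as MongoDB does lazily
        self[name] = FakeCollection()
        return self[name]


fake_db = FakeDatabase()
sample = [{"id": 1, "severity": "Slight"}, {"id": 2, "severity": "Serious"}]  # made-up records
stored = store_accident_stats(fake_db, 2005, sample)
```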

Here is the first API endpoint. It handles getting accident statistics for each year.

from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response

@api_view(['GET'])  # non-GET requests are rejected by api_view with a 405
def get_accidents_stats(request, year):
    connect_to_mongo()
    # Collection names are strings, so compare against str(year)
    year = str(year)
    if year not in db.list_collection_names():
        return Response({"Message": f"There is no accident stat in the year {year}"}, status=status.HTTP_404_NOT_FOUND)
    # The projection drops MongoDB's _id, which is not JSON serializable
    main_list = list(db[year].find({}, {"_id": 0}))
    return Response(main_list)

The Second Endpoint

The second endpoint I worked on was the BikePoint endpoint.

This endpoint returns all available BikePoints in London.

I sent a request to this endpoint and got a strange return type: bytes. I thought this was going to be a hard one, converting from bytes to a string and then to a list or a dictionary, but it turned out to be fairly easy. I decoded the returned bytes into a string, and then converted the string to a list of dictionaries with json.loads(string).

Here is the code snippet:

import json
import requests

url = 'https://api.tfl.gov.uk/BikePoint/'
r = requests.get(url)
# Decode the whole body at once so a multi-byte character is never split across chunks
main_string = r.content.decode("utf-8")

new_list = json.loads(main_string)
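To illustrate the bytes-to-list conversion in isolation, here is a small sketch that does not touch the network. The helper name decode_json_body and the sample payload are made up for illustration. It joins the raw chunks before decoding, because a multi-byte UTF-8 character can be split across two chunks, and decoding each chunk separately would then fail.

```python
import json


def decode_json_body(chunks):
    """Join raw byte chunks, decode them as UTF-8, and parse the JSON text."""
    body = b"".join(chunks)
    return json.loads(body.decode("utf-8"))


# Made-up payload: the two bytes of "é" (0xC3 0xA9) land in different chunks
chunks = [b'[{"id": "BikePoints_1", "commonName": "Caf\xc3', b'\xa9 Street"}]']
bike_points = decode_json_body(chunks)
print(bike_points[0]["commonName"])
```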

With this, I could now work with the result as a list. I inserted the list into the bike-point collection I had created in the database like so:

collection = db['bike-point']
main_list = []
for ele in new_list:
    del ele['$type']
    main_list.append(ele)

collection.insert_many(main_list)
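If nested objects in the response also carry $type keys, the top-level del above will not reach them. The following is a sketch of a recursive clean-up under that assumption. The name strip_type_keys is a hypothetical helper and the sample record is made up.

```python
def strip_type_keys(value, key="$type"):
    """Return a copy of value with every `key` entry removed, at any depth."""
    if isinstance(value, dict):
        return {k: strip_type_keys(v, key) for k, v in value.items() if k != key}
    if isinstance(value, list):
        return [strip_type_keys(item, key) for item in value]
    return value  # strings, numbers, and None are returned unchanged


# Made-up record shaped like a nested API response
record = {
    "$type": "Example.Place",
    "id": "BikePoints_1",
    "additionalProperties": [{"$type": "Example.Property", "key": "NbBikes", "value": "7"}],
}
cleaned = strip_type_keys(record)
```

Because the helper builds a new structure instead of deleting keys in place, the original record stays untouched, which is handy if you still need the raw response for debugging.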

Now I can work with the bike-point collection from the database and return all the available BikePoints in London.

Here is the second API endpoint:

@api_view(['GET'])  # non-GET requests are rejected by api_view with a 405
def get_bike_points(request):
    connect_to_mongo()
    # The projection drops MongoDB's _id, which is not JSON serializable
    main_list = list(db['bike-point'].find({}, {"_id": 0}))
    return Response(main_list, status=status.HTTP_200_OK)

The Third Endpoint

The third endpoint I worked on was the BikePoint/id endpoint.

This endpoint returns information on a specific BikePoint given its id.

Since I already had all the bike points in my database, all I had to do was a database lookup, using the id passed in through the request URL to find the matching BikePoint.

Here is the third API endpoint:

@api_view(['GET'])  # non-GET requests are rejected by api_view with a 405
def get_bike_point_id(request, bike_point_id):
    connect_to_mongo()
    # Look up the BikePoint by its id, leaving out MongoDB's _id field
    bike_point = db['bike-point'].find_one({"id": bike_point_id}, {"_id": 0})
    if bike_point is None:
        return Response({"Message": f"Could not get BikePoint with id {bike_point_id}"}, status=status.HTTP_404_NOT_FOUND)
    return Response(bike_point, status=status.HTTP_200_OK)

The code for the remaining endpoints can be found on my GitHub.

Deployment Setup

At this point, I was done with building out most of the API endpoints, and it was time to deploy my API. I chose to use Linode to deploy my project. I provisioned my Linode server and deployed my project.

If you decide to use Linode servers to deploy your projects, here is a quick guide on how to get started.

Head over to the Linode management console, where you will see a button labelled Create Linode. Click on it.

On the next page, you will be greeted with different options for how to define your server.

Here you can choose the Linux distribution your server should run, the region where it should be launched, and more.

When you scroll down, you will see the options to choose the amount of computing power you want on your server.

It's best to choose the least expensive compute power (Shared CPU) for your server if you are still using your Linode credits.

Next, give your server a name through the Linode Label input field, plus you have the option of adding tags and ssh keys.

There are additional configurations for your Linode server, such as adding a VLAN, enabling backups, and assigning a private IP address.

Back to my story :).

I deployed my API on a Linode server. I cannot go into the deployment process here because it is too long, but you can find a detailed explanation of how to deploy your Django project to a virtual machine with nginx and gunicorn in this DigitalOcean article.

I followed the steps in the article and deployed my API successfully.

Caching API Responses

The API response time was still very slow because the responses are large, and this was a concern. I decided to implement a caching system with Redis.

I found amazing resources online on how to go about this. Shoutout to Sagar Yadav for his article on how to implement caching in Django with Redis, which I found really helpful.

However, there was still an issue: the article only explained how to implement this cache with localhost. I needed to set up a remote Redis server and cache the API responses on the remote server.

Luckily for me, I already had a server (where I deployed my API), so I set up my Redis server on my Linux server.

First, I had to install Redis on my server like so:

sudo apt install -y redis

Next, I had to bind my host IP to the Redis server by editing the redis.conf file like so:

nano /etc/redis/redis.conf

On the line with bind 127.0.0.1 ::1, I added my host IP next to it and saved the file. Then I restarted my Redis server like so:

systemctl restart redis

I tested my connection to my remote Redis server with this command:

redis-cli -h <host-ip>

Thankfully, I had no errors. What was left for me now was to set up Redis with Django REST framework.

First, I had to install django-redis:

pip install django-redis

Next, I needed to set up my cache connection between Django and Redis in my settings.py file:

CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://<host-ip>",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        }
    }
}

Followed by specifying my Session Engine like so:

SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_CACHE_ALIAS = "default"

Then I added a TTL for the response caches:

CACHE_TTL = 60 * 1

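Finally, I imported the cache_page decorator and added it to my API endpoints like so:

from django.views.decorators.cache import cache_page

@cache_page(60 * 15)  # cache each response for 15 minutes
@api_view(['GET'])
def function_to_handle_endpoint(request):
    ...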
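Conceptually, a response cache with a TTL stores each result together with an expiry time and recomputes it only after that time has passed. The sketch below shows the idea in plain Python with an in-memory dictionary. It is not how cache_page or django-redis are implemented, and ttl_cache and load_year are hypothetical names used only for illustration.

```python
import time
from functools import wraps


def ttl_cache(seconds, clock=time.monotonic):
    """Cache a function's results per argument tuple for `seconds` seconds."""
    def decorator(func):
        store = {}  # maps args -> (expiry_time, result)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip the expensive call
            result = func(*args)
            store[args] = (now + seconds, result)
            return result

        return wrapper
    return decorator


calls = []


@ttl_cache(seconds=60 * 15)
def load_year(year):
    calls.append(year)  # record each real (uncached) call
    return {"year": year}


load_year(2005)
load_year(2005)  # served from the cache; the function body does not run again
```

With Redis as the backend, the stored entries live on the Redis server instead of in the process, so every worker serving the API can share them.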

Conclusion

This project is open source and open to contributions. I have not replicated all the API endpoints from the Transport for London Unified API. With the help of other developers, I know we can build something much better and more efficient.

For more information on this project, check out the links below: