Testing: Mock Requests for API Wrapper

Do you want to become a machine learning engineer? Improve your programming projects with tests that mock HTTP requests.

Yves Boutellier
Towards Data Science

TL;DR

  • Build your own ML project
  • A good choice is to work with the Twitter API, since it gives you access to very current data and can be extended to large scale.
  • When building a machine learning project that needs to be maintained over an extended period of time, write tests.
  • Tests for API wrappers are special. You cannot rely on real requests in your tests, since the data you receive changes over time. That’s why you need to mock requests.
  • A Python library for mocking requests is called responses. I show you a few examples to get started.

Intro

I assume you know that building projects is a good way to demonstrate to future employers that you are a suitable pick if you don’t have job experience in the industry.

That’s why I started to build a wrapper for the Twitter API on my own. Soon after starting, I reminded myself that I wanted to make it a decent project, one that emulates how machine learning engineers work in the industry. I started to use git for version control, but that did not feel like enough. I wanted to get as far away as possible from tinkering and to work on code more seriously and in a more organised way than ever before.

With that in mind, I was convinced that I wanted to test my code with one or more test suites. I knew how to write simple tests, such as instantiating a class and checking that its fields are set correctly, or passing an input to a function and comparing its output with the desired output. But writing tests for an API wrapper, or for a program that interacts with an external database, is slightly more complicated.

But you’ve got this. Take action and learn about mock requests to accelerate your learning on your machine learning project.

Test Requirements

Focused

We want our test to focus on specific code, that is to say a single method or class. If our test fails, we know which area of code to check for the defect. Thus a test should cover as few lines as possible. This makes finding the defect much easier.

Predictable

This point is very important and may seem obvious, but never neglect the predictability of your tests. Otherwise your testing experience will be unpleasant, inefficient and confusing. If you test your function foo on data x, you want to receive the same answer every time; it should not depend on the position of the moon or anything else.

Fast

Time is money. We have all heard this saying. The longer your tests take, the longer your improvement cycles become and the later your code ships.

Testing with an external data source

Challenges of working with an external data source (Restrictions)

  • Since data from your external data source might change, your tests can be unpredictable. You are at the mercy of your data source.
  • Latency from connecting and receiving data slows down your tests.
  • Limits and caps: If you receive live data from an API, your requests are rate-limited, and you may have a cap on how many requests you are allowed to make per month. If your tests request real data to check the functionality of the code, you use up your monthly requests.

Our requirements can be fulfilled and our restrictions can be overcome if we mock our requests.

Mocking Requests (The Solution)

If we mock our requests, they return data that does not change over time. We don’t have to connect to a database and wait for the data to arrive. And we don’t use up our valuable requests, which are presumably capped.

The tests are

  • fast (no connection and transfer of data)
  • predictable (we have control over the data that is tested)
  • focused (we don’t need to test external dependencies)

What are Mock requests?

With mock requests we simulate the interaction between our code and the external data source, so that we have full control over this interaction. We simulate the request with a so-called mock request. As the previous section showed, we use this procedure when we write a test for code that has external dependencies. By mocking we can isolate and focus on the code being tested rather than on the behaviour of, or interaction with, the external dependency. But remember that this is only possible if our code is based on interfaces, so that it doesn’t matter what exactly is passed into our system as long as it implements our interface.

Programming to an interface is why mocking works. We can pass in mocked or faked objects that simulate the actual run-time objects that would come from our external data source.

Be aware that mocking frameworks complement unit testing frameworks; they don’t replace them. Mocking frameworks isolate dependencies and therefore help you write more concise unit tests.
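To make the idea of programming to an interface concrete, here is a minimal sketch that uses only the standard library. The names FakeUserClient, fetch_user() and display_name() are hypothetical and not taken from my project; the point is only that the code under test accepts any object offering the expected method, so a fake can replace the real client.

```python
import unittest


class FakeUserClient:
    """Hypothetical stand-in for a client that would call the real API."""

    def fetch_user(self, username):
        # Return fixed data instead of contacting an external service.
        return {"id": "42", "username": username}


def display_name(client, username):
    """Code under test: it relies only on the fetch_user() interface."""
    user = client.fetch_user(username)
    return f"@{user['username']} (id {user['id']})"


class DisplayNameTest(unittest.TestCase):
    def test_display_name(self):
        # The fake is passed in where the real client would normally go.
        self.assertEqual(
            display_name(FakeUserClient(), "example_user"),
            "@example_user (id 42)",
        )
```

Because display_name() never checks which class it receives, the test stays focused on the formatting logic and never touches the network.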

How to Mock requests?

I will showcase a few examples from my own project and explain how they work, so that you can get started more easily.

First of all, this short tutorial uses Python with unittest. To follow along, I suggest you import, besides unittest, the library called responses. responses also provides constants called GET and POST. They are used to mock requests.get() and requests.post().

Let’s look at our first example. I show you a test that checks whether my function getUser() correctly retrieves and stores data and returns a User object. I will show two ways to test the function getUser().

Mock with responses.RequestsMock()

I want you to focus on several lines:

3: We create a RequestsMock instance.

4: This instance is then used to activate the mocking; otherwise requests.get() would not trigger the responses library to mock the response.

6–8: This is very important. Since you don’t receive data from a database or API, you need to store the data yourself. This makes the tests predictable and fast.

10: responses.add() registers a response that requests.get() should receive. If your function requests two different URLs, you need to call responses.add() twice as well. The arguments you need every time are GET or POST, the URL, which has to match the URL passed into requests.get(), and the data that requests.get() returns instead of the real data it would return outside a test environment.

13: In this line requests.get() is invoked but does not connect to the URL; instead it receives the mock response you registered with responses.add().

15–17: These are the assertions you would write in any other test as well.

20,21 (optional): These two lines are only necessary if registered responses are not cleaned up anyway. In unittest you usually have two functions, setUp() and tearDown(). If you create an object from a request in setUp() so that every test case can use it, you need to invoke the cleanup, since leftover registered responses could interfere with the subsequent tests you want to run. Alternatively, put those two lines in the function tearDown(). The next approach gets by without lines 3, 4, 20 and 21, but is error-prone if you need chained events like the ones just mentioned.

Mock with @responses.activate

This version works with a decorator that has the same effect as responses.start(). It activates the mocking; otherwise requests.get() would not trigger the responses library to mock the response. Apart from that it is nearly the same, but remember the caveat I mentioned before!

Possible Errors and Solutions

I must admit I am also fairly new to this topic, and it can be a little daunting to think through the errors you receive. I decided to list a few, together with the reasoning that got them out of my way.

URLs

One thing you need to pass to responses.add() is the URL. Sometimes you get an error like:

requests.exceptions.ConnectionError: Connection refused by Responses - the call doesn't match any registered mock.

Please make sure the URL matches the one passed to requests.get(). Alternatively, a neat trick many people use is to provide a regular expression that matches the different URLs. For example, when working with the Twitter API the URL always starts with something like https://api.twitter.com/2, so you could provide a regular expression that matches that prefix.

Data Type

I once received this error message:

TypeError: a bytes-like object is required, not 'dict'

I realised that I had mistakenly loaded the data with json.load(f). json.load(f) creates a Python object, a dictionary. But responses and requests expect the response body as text or a bytes-like object, not a dictionary.
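The difference is easy to see with the standard library alone. In this sketch an in-memory io.StringIO object stands in for the opened file f, and the payload is made up.

```python
import io
import json

# Stand-in for the content of a stored API answer (made-up payload).
stored = '{"data": {"id": "42", "username": "example_user"}}'

# json.load() parses the file content into a Python dict.
parsed = json.load(io.StringIO(stored))

# read() returns the raw text, which can be used as a response body.
raw_text = io.StringIO(stored).read()

# Encoding the text yields a bytes-like object.
raw_bytes = raw_text.encode("utf-8")

# If only the dict is at hand, json.dumps() turns it back into text.
round_trip = json.dumps(parsed)

print(type(parsed).__name__, type(raw_text).__name__, type(raw_bytes).__name__)
```

So reading the file with f.read() avoids the error. responses.add() also has a json keyword argument that accepts a dictionary directly, which is another way around it.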

Thank you for reading until the end. As a thank-you gift I summarised the most important aspects covered in this article for you.

Summary

I walked you through a short introduction that hopefully convinced you that mocks are needed to achieve higher test coverage in your ML project. To fulfil test requirements like focus, speed and predictability for functions that interact with external databases, use mocks. I also showed you how to use the library called responses to mock your requests in Python.
