Working with JSON data in Python

Yves Boutellier
Towards Data Science
5 min read · Aug 11, 2021

Photo by Ferenc Almasi on Unsplash

In this article I want to focus on a data format called JSON. I am pretty sure you have heard of JSON before, but if not, let me briefly describe it.

JSON stands for JavaScript Object Notation and was inspired by the subset of the JavaScript programming language that deals with object literal syntax. By now, however, it is language agnostic, so it doesn’t matter if you don’t write JavaScript: you can work with JSON-formatted data in practically any programming language.

But why should JSON matter to you if you care about data science? Well, I stumbled across JSON while building my own Python project in an attempt to land a data science / machine learning engineer job (more on that in the section “Real world?”). So if you are on the same journey as me and want to learn another useful tool, this article is for you. Even if you are at a different stage in your career, I am convinced that, since the title caught your attention, you either want to refresh your knowledge of the JSON format to solve a current problem or you are looking for a new skill for your data science Python arsenal.

What does JSON look like

Okay, what does JSON look like and what does it support?

This is the file example.json

As you can see, it supports primitive types like strings and integers as well as lists and nested objects. And it looks a lot like Python. But be careful: there are differences between JSON and Python, so further down I provide a write-conversion table and a read-conversion table that show where they differ.
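As a rough illustration, here is a small JSON document with these kinds of values, parsed in Python. The field names and values are made up for this sketch and are not the contents of the original example.json.

```python
import json

# Hypothetical JSON document: keys and values are invented for illustration.
json_text = """
{
    "name": "Ada",
    "age": 36,
    "languages": ["Python", "JavaScript"],
    "address": {"city": "Zurich", "zip": "8000"}
}
"""

# Parse the JSON text into native Python objects.
person = json.loads(json_text)

print(type(person).__name__)       # dict
print(person["languages"][0])      # Python
print(person["address"]["city"])   # Zurich
```

The nested object becomes a nested dictionary and the array becomes a list, which is why the JSON text and the Python result look so similar.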

Why store data in Python using the JSON module?

  1. The JSON format enables programmers to dump simple data structures into a file and load them back into programs when needed.
  2. Data can be shared between programs written in other languages using JSON-formatted files.

Since the advantages of the JSON format come from storing and retrieving data, this article shares the functions and examples you need, so that the next time you want to handle data, your JSON skills are at your disposal.

If you want to read further articles on this topic, it helps to know some vocabulary. Encoding data into JSON is called serialization (the data is stored as a series of bytes). The reverse process, obtaining data from a JSON-formatted file, is called deserialization. You don’t need to know these terms for basic tasks, though. So now let’s get started and gain some hands-on experience.

Data to json — Serialization

Imagine you build a program that creates data, and you also want to share that data and the underlying information with other users or programs. This is why you want to take the data and store it in a JSON-formatted file. But how is the data translated? Here is a conversion table for Python.

table by author
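The write conversions can also be checked directly. The sketch below runs json.dumps() (one of the two functions introduced in the next paragraphs) on a few sample values chosen only for illustration and prints how each one is encoded.

```python
import json

# Sample Python values (chosen for illustration) and the JSON type they map to.
samples = [
    {"a": 1},   # dict  -> object
    [1, 2],     # list  -> array
    (1, 2),     # tuple -> array
    "text",     # str   -> string
    3,          # int   -> number
    2.5,        # float -> number
    True,       # True  -> true
    False,      # False -> false
    None,       # None  -> null
]

# Print each Python value next to its JSON encoding.
for value in samples:
    print(f"{value!r:>12}  ->  {json.dumps(value)}")
```

Note that a list and a tuple produce the same JSON array, so the distinction between them is lost once the data is written.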

Okay, now that we know what to expect from the conversion when storing data, we can look at the functions that come with the built-in json package.

import json

The json library has two functions, json.dump() and json.dumps(), that can be used for serialization.

Let’s look at them one at a time.

json.dump() stores a data object as a JSON string in a .json file

The json.dump(data, file_object) function takes two arguments:

  1. The data that needs to be written to a JSON file.
  2. An open, writable file object to which the data is saved.
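A minimal sketch of such a call is shown below. The data and the temporary file location are assumptions made for this example; in your own project you would open whatever path you want to write to.

```python
import json
import os
import tempfile

# Hypothetical data; any structure built from dicts, lists, strings,
# numbers, booleans and None can be written this way.
data = {"name": "Ada", "scores": [90, 85], "active": True}

# Write to a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "example.json")

# json.dump() expects an open, writable file object, not a file name.
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, indent=4)

# Read the raw text back to see what ended up in the file.
with open(path, encoding="utf-8") as f:
    written_text = f.read()

print(written_text)
```

The optional indent argument only affects readability of the file; leaving it out writes everything on a single line.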

example.json looks like the following

json.dumps() converts a Python object into a JSON string

It is pretty straightforward.
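As a small sketch with made-up data, the call returns the JSON text as an ordinary Python string instead of writing it to a file:

```python
import json

# Hypothetical data used only for this example.
data = {"name": "Ada", "scores": [90, 85], "active": True}

# json.dumps() returns the JSON text as a Python str.
compact = json.dumps(data)
print(compact)  # {"name": "Ada", "scores": [90, 85], "active": true}

# Optional keyword arguments control the layout of the output.
pretty = json.dumps(data, indent=2, sort_keys=True)
print(pretty)
```

This is handy when the JSON has to go somewhere other than a file, for example into a log message or the body of a request.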

JSON to Data — Deserialization

Likewise, deserialization is the process of converting JSON data into native data types. Here, we convert JSON data back into a dictionary in Python. But once again we have to consult the conversion table for Python to be aware of mistakes that could happen.

table by author

Notice that if we encode a tuple, it becomes a JSON array, and a JSON array is read back as a list, so a tuple does not survive the round trip. Please remember this.
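The round trip can be seen in a few lines. The dictionary below is invented for this sketch:

```python
import json

# Hypothetical data containing a tuple and a None value.
original = {"point": (1, 2), "label": "origin", "visible": None}

encoded = json.dumps(original)   # the tuple is written as a JSON array
decoded = json.loads(encoded)    # a JSON array is read back as a list

print(encoded)   # {"point": [1, 2], "label": "origin", "visible": null}
print(decoded)   # {'point': [1, 2], 'label': 'origin', 'visible': None}
print(decoded == original)   # False, because [1, 2] != (1, 2)
```

If your code relies on tuples, you have to convert the lists back yourself after loading.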

Next, we want to check out both functions that are used for deserialization.

json.load() reads JSON files into Python types

json.load(file_object)

Note that json.load() expects an open file object rather than a file name. The data comes from the example.json file and is stored in a dictionary called data.
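A minimal sketch of reading a file follows. To keep it self-contained it first writes a small stand-in file to a temporary directory; the file contents are assumptions and not the original example.json.

```python
import json
import os
import tempfile

# Create a small stand-in file first so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"name": "Ada", "scores": [90, 85]}')

# json.load() reads from an open file object and returns Python objects.
with open(path, encoding="utf-8") as f:
    data = json.load(f)

print(data["scores"])  # [90, 85]
```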

json.loads() converts a JSON string into a Python type
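Here is a short sketch with an invented JSON string. It also shows what happens when the input is not valid JSON, which is a common stumbling block when a Python dictionary literal is mistaken for JSON.

```python
import json

# A JSON document held in a Python string (contents invented for illustration).
json_string = '{"id": 7, "tags": ["a", "b"], "price": 9.5, "in_stock": false}'

item = json.loads(json_string)

print(item["tags"])       # ['a', 'b']
print(item["in_stock"])   # False

# Malformed input raises json.JSONDecodeError.
try:
    json.loads("{'id': 7}")   # single quotes are not valid JSON
except json.JSONDecodeError as err:
    print("invalid JSON:", err)
```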

Real world?

I already mentioned in the introduction that I stumbled across JSON while building my own Python project in an attempt to land a data science / machine learning engineer job. My goal was to store the data I gathered from an API in a file so that I could use it later for testing. It’s pretty common to receive data from an API as JSON strings. With the requests library you can get data from an API. You then need to convert the data into a Python object, which is easily done by calling response.json() without any arguments.

Now imagine you do some wrangling, decide to take a subset of the data, and want to store it for later use. You have learned that you can pass the data and an open file object to json.dump(data, file_object) and have the result available later. Yeah!
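The sketch below walks through that idea without a network call. The response text, the field names and the file name subset.json are all assumptions for illustration; with requests, the parsed data would come from response.json() instead of json.loads().

```python
import json
import os
import tempfile

# Stand-in for an API response body (invented data). With requests, the
# parsed result would come from response.json() instead of json.loads().
response_text = (
    '[{"city": "Bern", "temp": 21.5, "raw": "x"},'
    ' {"city": "Basel", "temp": 24.0, "raw": "y"}]'
)
records = json.loads(response_text)

# Keep only the fields needed later (hypothetical field names).
subset = [{"city": r["city"], "temp": r["temp"]} for r in records]

# Store the subset; json.dump() needs an open file object.
path = os.path.join(tempfile.mkdtemp(), "subset.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(subset, f, indent=2)

# Later, for example in a test, load it back.
with open(path, encoding="utf-8") as f:
    restored = json.load(f)

print(restored == subset)  # True
```

Saving a fixed sample like this means tests can run against known data instead of calling the API every time.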

Conclusion

Glad you learned how to store data in a way that can be handled pretty universally by many other languages. You also know where to look up the conversion tables, and you saw some examples that you can adapt to your own needs without much effort. I have summarized the key workflow steps in the following four points.

  1. Import the json package
  2. Read the data with json.load(file_object) or json.loads(JSON_string)
  3. Process the data
  4. Write the altered data with json.dump(data, file_object) or json.dumps(data)

Thank you for sticking with me to the end. I welcome you to read one of my other articles.
