Create a PostgreSQL database in a Docker container

Posted on Tue 01 August 2023 in Databases

This post will show you how user Docker to easily create a PostgreSQL database server loaded with the Chinook sample database that you can use to test your Python programs.

PostgreSQL is an open-source database that is very popular with Python developers. It is simple to configure and offers many advanced features.

The Chinook database is a sample database available for most mainstream database servers such as PostgreSQL, SQL Server, and MySQL. The database emulates the data used by an imaginary media store that sells music and video files over the Internet. It is easy to set up because it can be installed by running a single SQL script.

Docker provides a convenient way to create a PostgreSQL database server that you can use to test your Python programs locally. If you have not yet installed Docker, see my previous post about creating an MS SQL Server container to see the procedure I followed.

In this post, you will connect your Python program to the database using the psycopg2 Postgres database adapter library or the SQLAlchemy framework.

Get the PostgreSQL image

The official PostgreSQL Docker image is available on the Docker Hub container repository. Open a terminal on your Linux PC and run the Docker pull command to download the image to your local Docker repository:

$ docker pull postgres

Get the Chinook database script

Download the Chinook database installation script from the developer's GitHub account:

$ wget https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_PostgreSql.sql

Fix the file encoding issue

The text in the Chinook SQL script is encoded in LATIN1. The official Postgres image was built on a base that only supports UTF-8 text. The Chinook_PostgreSql.sql script will cause an error if you run it on a container created from the oficial PostgreSQL Docker image.

To solve this problem, convert the text in the SQL script file to text encoded as UTF-8. Use the iconv command, as shown below:

$ iconv \
    --from-code ISO-8859-1 \
    --to-code UTF-8 Chinook_PostgreSql.sql \
    --output Chinook_PostgreSql_utf8.sql

Other Chinook database issues in Postgres

Postgres is case-sensitive, where many other database engines are not. Postgres expects that database object names will be in lower case or "snake" case. However, the Chinook database object names are in "proper" case, and "pascal" case so they contain a mix of upper- and lower-case characters.

This is not a problem. It only means that, if you write SQL statements to query the Chinook database, you will have to use quotes around the database object names. You will see this in the examples below.

Build a Chinook database image

The easiest way to create the Chinook database image is to create a new image that adds a layer to the official Postgres image. Create a dockerfile that takes the official Postgress Docker image, sets the POSTGRES_DB environment variable to "chinook", and copies the Chinook install script to the the new image's docker-entrypoint-initdb.d directory. Then, run the Docker build command to create the new image using the dockerfile commands.

Edit a file named "Dockerfile" in your favorite text editor:

$ nano Dockerfile

Add the following information to the file:

FROM postgres 
ENV POSTGRES_DB chinook
COPY Chinook_PostgreSql_utf8.sql /docker-entrypoint-initdb.d/

Save the file.

Make sure that the Chinook_PostgreSql_utf8.sql file you previously created is in the same host directory as the dockerfile. Then, run the following build command. Choose your own password for the container's postgres user.

$ docker build \
    --tag postgres-chinook-image \
    --file Dockerfile .

Check that the image is in your Docker local repository:

$ docker images
REPOSITORY                       TAG       IMAGE ID       CREATED             SIZE
postgres-chinook-image           latest    503246416dbf   About an hour ago   414MB
postgres                         latest    43677b39c446   4 days ago          412MB
mcr.microsoft.com/mssql/server   latest    683d523cd395   3 weeks ago         2.9GB

Create a database container

Create a new database container called chinook1 from the new image you created. Run the following command:

$ docker run \
    --detach \
    --env POSTGRES_PASSWORD=abcd1234 \
    --network host \
    --name chinook1\
    postgres-chinook-image

Containers starting for the first time will, if a PostgreSQL database does not already exist, run all SQL scripts stored in the docker-entrypoint-initdb.d directory. So, the new container will run the Chinook_PostgreSql_utf8.sql script against the default database, now named chinook. The container will now contain the media store data in a database named chinook.

Verify the database

Use the psql utility to check that the database initialized as expected. Start it in interactive mode on the container. Use the default username, postgres, and connect it to the database, chinook:

$ docker exec -it chinook1 psql \
    --username postgres \
    --dbname chinook

chinook=#

Then list the tables in the database with the \d command:

chinook=# \d
             List of relations
 Schema |     Name      | Type  |  Owner   
--------+---------------+-------+----------
 public | Album         | table | postgres
 public | Artist        | table | postgres
 public | Customer      | table | postgres
 public | Employee      | table | postgres
 public | Genre         | table | postgres
 public | Invoice       | table | postgres
 public | InvoiceLine   | table | postgres
 public | MediaType     | table | postgres
 public | Playlist      | table | postgres
 public | PlaylistTrack | table | postgres
 public | Track         | table | postgres
(11 rows)

If you see a similar list of tables, you know that the database initialized. Test a table to see that there is data in it. For example, list some information from the Employee table 1:

chinook=# SELECT * FROM "Employee" LIMIT 2;

 EmployeeId | LastName | FirstName |      Title      | ReportsTo |      BirthDate      |      HireDate       |       Address       |   City   | State | Country | PostalCode |       Phone       |        Fax        |         Email          
------------+----------+-----------+-----------------+-----------+---------------------+---------------------+---------------------+----------+-------+---------+------------+-------------------+-------------------+------------------------
          1 | Adams    | Andrew    | General Manager |           | 1962-02-18 00:00:00 | 2002-08-14 00:00:00 | 11120 Jasper Ave NW | Edmonton | AB    | Canada  | T5K 2N1    | +1 (780) 428-9482 | +1 (780) 428-3457 | [email protected]
          2 | Edwards  | Nancy     | Sales Manager   |         1 | 1958-12-08 00:00:00 | 2002-05-01 00:00:00 | 825 8 Ave SW        | Calgary  | AB    | Canada  | T2P 2T3    | +1 (403) 262-3443 | +1 (403) 262-3322 | [email protected]
(2 rows)

Now you can be confident that the database is ready to use. Quit the psql utility:

chinook=# \q
$ 

Success

At this point, you are done. You have a sample database running on the container. Programs running on your computer can access the server at the host PC's localhost IP address (127.0.0.1) and TCP port 5432.

The rest of this post covers using Python to connect to the database running on the container and adds some notes about creating a Docker volume to persist data even if you delete the container.

Connect Python programs to the database

Connecting to a database running on a docker container is the same as connecting to a remote server. You need to install the database library and framework packages in your Python virtual environment so you can use them in your programs. You need to pass information about the network address, port, username, password, and database name to the database library or framework you use in your program.

Create the environment

First, install some Python development libraries that pip needs to build the psycopg2 package:

$ sudo apt update
$ sudo apt install libpq-dev python3-dev libpq-dev
$ sudo apt install build-essential

Then create and activate a new Python virtual environment:

$ python -m venv .venv
$ source .venv/bin/activate
(.venv) $

Install the Python packages you will use in your programs:

(.venv) $ pip install wheel
(.venv) $ pip install psycopg2
(.venv) $ pip install tabulate
(.venv) $ pip install sqlalchemy

Also, install Jupyter Notebooks. I use Jupyter like an "advanced REPL" because it makes it easy to follow along with blog tutorials like this one.

(.venv) $ pip install jupyterlab

Start Jupyter. If you don't want to use Jupyter, you can follow along with the normal Python REPL or use your favorite text editor.

(.venv) $ jupyter-lab

The psycopg2 library

Create a connection to the Postgres database running on the container using the psycopg2 library. Use the connectionstrings.com web site to see the information needed by Postgres.

import psycopg2

conn = psycopg2.connect(
    host="localhost",
    database="chinook",
    user="postgres",
    password="abcd1234")

First find the schemas available in the chinook database.

statement = """
SELECT DISTINCT
  TABLE_SCHEMA
FROM INFORMATION_SCHEMA.TABLES
ORDER BY TABLE_SCHEMA
"""

cursor = conn.cursor()
cursor.execute(statement)
print(cursor.fetchall())
cursor.close()

The output shows that, in addition to the system schemas, there is a schema called public.

[('information_schema',), ('pg_catalog',), ('public',)]

And, we know from the results of the psql command we ran earlier that the Chinook database tables are in the public schema. The following code will list all the tables:

from tabulate import tabulate

statement = """
SELECT 
  TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA != 'pg_catalog'
  AND TABLE_SCHEMA != 'information_schema'          
ORDER BY TABLE_NAME;
"""

with conn.cursor() as cursor:
    cursor.execute(statement)
    headers = [h[0] for h in cursor.description]
    tables = cursor.fetchall()

print(tabulate(tables, headers=headers))

We see that tables listed, below:

table_catalog    table_schema    table_name     table_type
---------------  --------------  -------------  ------------
chinook          public          Album          BASE TABLE
chinook          public          Artist         BASE TABLE
chinook          public          Customer       BASE TABLE
chinook          public          Employee       BASE TABLE
chinook          public          Genre          BASE TABLE
chinook          public          Invoice        BASE TABLE
chinook          public          InvoiceLine    BASE TABLE
chinook          public          MediaType      BASE TABLE
chinook          public          Playlist       BASE TABLE
chinook          public          PlaylistTrack  BASE TABLE
chinook          public          Track          BASE TABLE

Finally, try a query that joins data from multiple tables. You can see in this example that we had to use quotes around each name in the database because they are not all in lower case. This is not a big deal, but some people have opinions about it. As John Atten mentioned in his blog post, A More Useful Port of the Chinook Database to PostgreSQL:

"PostgreSQL has its roots in the Unix world. Database object names are case-sensitive and the convention is to use lower-case names and, where needed, separate with underscores. It is possible to use proper-cased object names in Postges by escaping them with double-quotes. However, this makes for some atrocious-looking SQL."

statement = """
SELECT "Album"."Title" AS "Album",
       "Artist"."Name" AS "Artist",
       "Track"."Name" AS "Track",
       "Track"."Composer", 
       "Track"."Milliseconds" AS "Length"
FROM "Album"
JOIN "Track" ON "Album"."AlbumId" = "Track"."AlbumId"
JOIN "Artist" ON "Album"."ArtistId" = "Artist"."ArtistId"
"""

with conn.cursor() as cursor:
    cursor.execute(statement)
    headers = [h[0] for h in cursor.description]
    rows = cursor.fetchmany(5)

print(tabulate(rows, headers))

The results create a table showing information about the media tracks in the database:

Album                                  Artist    Track                                    Composer                                                                  Length
-------------------------------------  --------  ---------------------------------------  ----------------------------------------------------------------------  --------
For Those About To Rock We Salute You  AC/DC     For Those About To Rock (We Salute You)  Angus Young, Malcolm Young, Brian Johnson                                 343719
Balls to the Wall                      Accept    Balls to the Wall                                                                                                  342562
Restless and Wild                      Accept    Fast As a Shark                          F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman                       230619
Restless and Wild                      Accept    Restless and Wild                        F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman    252051
Restless and Wild                      Accept    Princess of the Dawn                     Deaffy & R.A. Smith-Diesel                                                375418   

The SQLAlchemy framework

If you use SQLAlchemy, you abstract away the particularities of SQL statements and do not need to worry about whether you should use quotes or not. The following example connects to the chinook database running on the container and selects a sample of data.

from sqlalchemy.engine import URL
from sqlalchemy import select 
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
from sqlalchemy.ext.automap import automap_base
from tabulate import tabulate

url_object = URL.create(
    drivername='postgresql+psycopg2',
    username='postgres',
    password='abcd1234',
    host='localhost',
    port='5432',
    database='chinook'
)

engine = create_engine(url_object)
Base = automap_base()
Base.prepare(autoload_with=engine, schema='public')

Album = Base.classes.Album
Artist = Base.classes.Artist
Track = Base.classes.Track

statement = (select(Album.Title.label("Album"),
            Artist.Name.label("Artist"),
            Track.Name.label("Track"),
            Track.Composer, 
            Track.Milliseconds.label("Length"))
     .join(Track)
     .join(Artist)
     .limit(5)
    )

with Session(engine) as session:
    query = session.execute(statement)
    result = query.fetchall()
    print(tabulate(result, headers=q.keys()))

The program uses SQLAlchemy to generate the same table we previously created using the psycopg2 library.

Album                                  Artist    Track                                    Composer                                                                  Length
-------------------------------------  --------  ---------------------------------------  ----------------------------------------------------------------------  --------
For Those About To Rock We Salute You  AC/DC     For Those About To Rock (We Salute You)  Angus Young, Malcolm Young, Brian Johnson                                 343719
Balls to the Wall                      Accept    Balls to the Wall                                                                                                  342562
Restless and Wild                      Accept    Fast As a Shark                          F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman                       230619
Restless and Wild                      Accept    Restless and Wild                        F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman    252051
Restless and Wild                      Accept    Princess of the Dawn                     Deaffy & R.A. Smith-Diesel                                                375418

Conclusion

You created a Docker image that lets you create containers that run the sample Chinook database on the PostgreSQL database server. You saw how you could integrate the database into your Python programs.

Appendix A: Persistence

Any changes you make to the container's database will persist if you stop and then start the container. But, if you delete the container, its data is also deleted. You can connect specific directories in a container to Docker volumes to save data that can be re-used by a re-created container.

Create a new Docker volume that you will use for storing persistent data. Give it the same name you will use when you create the container, to make it easy to know which volume is used with which container.

$ docker volume create chinook2

List the available Docker volumes:

$ docker volume ls
DRIVER    VOLUME NAME
local     chinook2

Connect volume to container directory

Create a new container, named chinook2 that connects the SQL server's data directory, /var/lib/postgresql/data/, with the Docker volume named chinook2.

$ docker run \
  --detach \
  --name chinook2 \
  --env POSTGRES_PASSWORD=abcd1234 \
  --network host \
  --volume chinook2:/var/lib/postgresql/data \
  postgres-chinook-image

Test data persistence

Create a second Chinook database named chinook2. Use the Docker cp command to copy the original Chinook SQL script to the container. Just throw it in the container's /tmp directory.

$ docker cp Chinook_PostgreSql_utf8.sql chinook1:/tmp

Use the psql utility to create a new database and run the script on that database:

$ docker exec chinook2 createdb \
    --username postgres \
    chinook2
$ docker exec chinook1 psql \
    --username postgres \
    --dbname chinook2 \
    --single-transaction \
    --file /tmp/Chinook_PostgreSql_utf8.sql

List the available databases and see that the chinook2 database was added.

$ docker exec chinook4 psql \
    --username postgres \
    --list
                                                List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    | ICU Locale | Locale Provider |   Access privileges   
-----------+----------+----------+------------+------------+------------+-----------------+-----------------------
 chinook   | postgres | UTF8     | en_US.utf8 | en_US.utf8 |            | libc            | 
 chinook2  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |            | libc            | 
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |            | libc            | 
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 |            | libc            | =c/postgres          +           |          |          |            |            |            |                 | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 |            | libc            | =c/postgres          +           |          |          |            |            |            |                 | postgres=CTc/postgres
(5 rows)

You can add more databases and run additional scripts, as you wish.

Stop the container and delete it:

$ docker stop chinook2
$ docker rm chinook2

Create another new container from the original image but connect its data directory to the volume you previously created for the chinook2 container.

$ docker run \
  --detach \
  --name chinook_test \
  --env POSTGRES_PASSWORD=abcd1234 \
  --network host \
  --volume chinook2:/var/lib/postgresql/data \
  postgres-chinook-image

According to the Postgres container documentation, the install script we copied to the image's docker-entrypoint-initdb.d directory will only run if the data directory is empty. This container's data directory is mapped to the Docker volume that already contains several databases so the install script will not run. The container will use the databases in the attached volume.

List the available databases. You can see that the chinook2 database you previously created exists on the new container because it was saved in the volume that was attached to the new container.

$ docker exec chinook_test psql \
    --username postgres \
    --list

                                                List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    | ICU Locale | Locale Provider |   Access privileges   
-----------+----------+----------+------------+------------+------------+-----------------+-----------------------
 chinook   | postgres | UTF8     | en_US.utf8 | en_US.utf8 |            | libc            |

 chinook2  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |            | libc            |

 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |            | libc            |

 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 |            | libc            | =c/postgres          +           |          |          |            |            |            |                 | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 |            | libc            | =c/postgres          +           |          |          |            |            |            |                 | postgres=CTc/postgres
(5 rows)

Look at tables in the second database:

$ docker exec chinook_test psql \
    --username postgres \
    --dbname chinook2 \
    --command " \
        SELECT schemaname, tablename \
        FROM pg_catalog.pg_tables \
        WHERE schemaname != 'pg_catalog' 
          AND schemaname != 'information_schema';"

This command outputs the tables in the chinook2 database:

 schemaname |   tablename   
------------+---------------
 public     | Artist
 public     | Album
 public     | Employee
 public     | Customer
 public     | Invoice
 public     | InvoiceLine
 public     | Track
 public     | Playlist
 public     | PlaylistTrack
 public     | Genre
 public     | MediaType
(11 rows)

You can now be confident that the data was saved in a persistent volume.

Clean up

You may stop and delete the container and, if you wish, delete the volume.

$ docker stop chinook_test
$ docker container prune
$ docker volume prune

  1. You can also run this directly from the host using the command: docker exec chinook1 psql -U postgres -d chinook -c 'select * from "Employee" limit 2'