Managing Data in Containers

Arjun Vaidy · Published in Nerd For Tech · May 5, 2022

We have code and its environment (e.g. Node dependencies) on our local machine. We write a Dockerfile with instructions to build an image. Images are read-only and capture a snapshot of the code at the moment they are built, so we need containers to run them. Always remember that a container adds an extra layer on top of the image; it does not copy the code out of the image it was created from. (A short sketch follows the list below.)

  1. Once we create Images and Containers, they will be isolated from the local machine
  2. Images are read-only — we can’t rewrite the code in Images
  3. Containers are a read-write layer on top of images
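
To make this concrete, here is a minimal sketch, assuming a small Node app; the image tag feedback-node and the file server.js are only placeholder names, not anything prescribed by Docker:

# Dockerfile: a read-only snapshot of the code and its environment at build time
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]

# Build the image, then run a container that adds a writable layer on top of it
docker build -t feedback-node .
docker run -d --name feedback-app -p 3000:3000 feedback-node

If we now edit the source code on our machine, the running container won't see the change; the image was a snapshot taken at build time.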

But wait! Where does the data in this writable layer come from?

The answer: from the application code itself. Take a login and registration form, for example. When we register on the website with our details, they go to the database and are saved, and then a custom profile URL is created for us, much like a Facebook registration.

Here the problem arises with permanent data. So far we have only seen how to store temporary data, which comes by default with the writable layer created along with the container. For permanent data, we need another tool that fills the gap, so the data is not lost.

This tool has to connect to the local machine/host, because storage on the host is permanent. In the Docker world, these tools are called Volumes and Bind Mounts.

Volumes are folders on the host machine's hard drive that are mapped to folders inside Docker containers.

Since a volume is a folder on the host machine, it is permanent. The connection with the container enables communication between the two.

The problem we encountered is called data persistence: some data needs to remain available even after the container is removed. This is achieved by connecting the container to a folder on the local machine.

When you create an image, give the instruction VOLUME ["any folder inside the container"]. That container folder will be connected to a folder on the local machine that is known only to Docker. Volumes are therefore managed fully by Docker and isolated from all local machine processes.
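
For example, in the hypothetical Dockerfile sketched above, the instruction could look like this (the path /app/feedback is only an illustration):

# Everything the app writes to /app/feedback ends up in a Docker-managed folder on the host
VOLUME ["/app/feedback"]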

When you run a container based on that image (the one with the VOLUME instruction), a volume is created and its name is randomly generated by Docker itself. This is called an anonymous volume.

Since anonymous volumes are created automatically by Docker, each one is tied to a specific container and is removed automatically as soon as that container is removed (when the container was started with the --rm flag).
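
A quick sketch of that behaviour, reusing the hypothetical names from above:

# Start a container; the VOLUME instruction creates an anonymous volume
docker run -d --rm --name feedback-app -p 3000:3000 feedback-node

# The anonymous volume shows up with a random hash as its name
docker volume ls

# Because of --rm, removing the container also removes its anonymous volume
docker stop feedback-app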

As we are interested in data persistence across containers, we need a way to keep a volume even when the container that used it is removed. This is done with the help of named volumes: while creating the container, add a -v flag with a name for the volume and the container folder it should map to (see the sketch below). The named volume will survive even after the container is removed.
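
A minimal sketch, again with the hypothetical image from above and a volume simply named feedback-data:

# -v volume-name:container-path creates (or reuses) a named volume
docker run -d --name feedback-app -v feedback-data:/app/feedback feedback-node

# Remove the container; the named volume stays and can be attached to a new container
docker rm -f feedback-app
docker volume ls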

Note: Both named and anonymous volumes have their use cases.

Since volumes are wholly controlled by Docker itself, changes we make on the local machine are not reflected in the container's file system. In a way, that is exactly the isolation we wanted.

But during development, we can't build a new image every time we change the source code. This is solved with the help of bind mounts. Here, instead of a volume name, we give an absolute path in the instruction:

-v [absolute path to a local machine folder]:[container folder that needs to be connected to the local machine]

Bind mounts are not fully controlled by Docker itself, because we deliberately attach a local machine folder of our choosing to the container.
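
For example (the host path here is purely illustrative):

# An absolute host path instead of a volume name makes this a bind mount;
# code edits on the host are visible inside the container immediately
docker run -d --name feedback-app -v /home/arjun/projects/feedback:/app feedback-node

In a shell, "$(pwd)":/app can be used instead of typing out the absolute path by hand.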

Source: Docker site

Recap:

Containers are read-write; they add a thin read-write layer on top of the image.

Container data doesn't persist once the container is removed; volumes are the solution.

By default, a container can't interact with the host file system; bind mounts are the solution.

Originally published at https://www.pansofarjun.com on May 5, 2022.


Arjun Vaidy · Nerd For Tech
Founder of a startup. I explain things through first principles and intuitive mental models.