Containerizing our components#
Having developed all the services in our pipeline, including the gRPC servers and optional web applications, we are now prepared to containerize these components. Containerization involves creating Dockerfiles for each component to encapsulate and deploy them effectively.
Writing the Dockerfiles#
Dockerfiles for each component share a common structure. We will focus on the Dockerfile for the data component, which illustrates the general approach applicable to the other components. The Dockerfile should include the following elements:
Define the base image.
Set the working directory.
Update the package list and install required packages.
Copy the requirements.txt file into the container.
Install the dependencies.
Copy the application code into the container.
Expose the necessary ports.
Define the command for running the application.
Each Dockerfile needs to ensure that all necessary files and dependencies are present within the container. By organizing our code into separate folders for each component, we streamline this process by copying the entire folder contents into the container.
Define the base image
In a Dockerfile, the base image is the image from which your Docker image is built. This base image is specified using the FROM instruction at the beginning of the Dockerfile and serves as the starting point for building your custom image. The base image typically includes a minimal operating system and any necessary pre-installed software, libraries, or dependencies that your application needs to run. In our case we use:
FROM ubuntu:22.04
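As a side note, if you would rather not install Python yourself, a slimmer Python base image is a common alternative. The line below is a hypothetical variant, not the base image used in this chapter; it would make the apt-get step shown later unnecessary:
# Hypothetical alternative base image with Python 3 pre-installed
FROM python:3.10-slim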
Set the working directory
Setting the working directory in a Dockerfile using the WORKDIR instruction specifies the directory within the Docker container where commands will be executed. It essentially sets the context for any subsequent instructions in the Dockerfile, such as COPY, RUN, and CMD. By setting a working directory, you ensure that the application files and operations are organized within a specific path inside the container. This helps maintain a clean and predictable environment for running your application.
WORKDIR /app
Update the package list and install required packages
Before installing Python dependencies, it’s necessary to ensure that the container has access to the latest package lists and necessary tools. The RUN instruction does this by performing the following:
RUN apt-get update -y && \
    apt-get install -y python3-pip python3-dev && \
    pip3 install --upgrade pip
apt-get update -y: This command updates the package list from the repositories, ensuring that the container has access to the most recent versions of packages available in the Ubuntu repositories. The -y flag automatically answers ‘yes’ to any prompts, allowing the command to run non-interactively.
apt-get install -y python3-pip python3-dev: This command installs essential packages needed for the Python environment:
python3-pip: The package manager for Python, which allows for easy installation of Python packages.
python3-dev: Contains the header files and a static library for Python, which are often required to compile Python packages from source.
pip3 install --upgrade pip: This command ensures that pip is updated to the latest version, which helps avoid issues related to outdated package management.
This step is crucial for preparing the environment in the container before the Python dependencies are installed.
Copy the requirements.txt file into the container
Copying the requirements.txt file into the Docker image is essential for ensuring that all necessary Python dependencies are installed in the container. This practice enables the pip3 install -r requirements.txt command to install the specified libraries and packages, creating a consistent environment for your application. Copying requirements.txt on its own, before the rest of the code, also lets Docker cache the dependency-installation layer so it is not rebuilt every time the application code changes.
COPY ./requirements.txt .
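The exact contents of requirements.txt depend on the code from the previous chapters. As an illustration only, a file for a component with a gRPC server and a small web application might look like this (the package names and pinned versions here are assumptions, not the actual dependency list):
# Illustrative dependency list; replace with your component's real requirements
grpcio==1.60.0
grpcio-tools==1.60.0
protobuf==4.25.1
flask==3.0.0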
Install the dependencies
With requirements.txt in place, the dependencies listed in it can be installed inside the container:
RUN pip3 install -r requirements.txt
Copy the application code into the container
Copying the code into the container using the command COPY . . ensures that your application code is included in the Docker image, allowing the container to execute the application. This command copies all files from the current directory on the host machine to the working directory inside the container. The Dockerfile is placed inside the data folder, and therefore the first “.” corresponds to the data folder and all its contents. The second “.” refers to the working directory inside the container.
COPY . .
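Since COPY . . copies everything in the build context into the image, it can be useful to place a .dockerignore file next to the Dockerfile so that local artifacts stay out of the image. The entries below are typical examples, to be adapted to your project:
# Typical .dockerignore entries (illustrative)
__pycache__/
*.pyc
.venv/
.git/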
Expose necessary ports
The EXPOSE instruction informs Docker that the container listens on specific network ports at runtime. Note that EXPOSE serves as documentation for the image; it does not actually publish the ports, which is done with the -p flag when running the container. This command specifies that the container will listen on ports 8061 and 8062.
EXPOSE 8061 8062
Define the command for running the application
We need to specify the default command to run when the container starts. This command is essential for defining the container’s primary process, ensuring that when the container is initiated, it automatically runs your application.
CMD ["python", "data_service_server.py"]
Combining all the steps, the Dockerfile for the data component is:
FROM ubuntu:22.04
# Set the working directory inside the container
WORKDIR /app
# Update the package list and install required packages
RUN apt-get update -y && \
apt-get install -y python3-pip python3-dev && \
pip3 install --upgrade pip
# Copy requirements.txt first to leverage Docker cache
COPY ./requirements.txt .
# Install Python dependencies
RUN pip3 install -r requirements.txt
# Copy the rest of the application code
COPY . .
# Expose the necessary ports
EXPOSE 8061 8062
# Command to run the server
CMD ["python3", "data_service_server.py"]
For other components, the Dockerfiles will largely follow this structure. The primary modification will be in the command specified in the CMD instruction to match the entry point script for each respective component.
Testing the Container#
Once you have written your Dockerfile, it is crucial to test the container to ensure that it functions as expected. Testing the container involves running it, verifying that the server and web application start correctly, and ensuring that all necessary services are accessible. The following steps outline the process for testing your containerized application.
Before proceeding, ensure that the Docker daemon is running on your system. The Docker daemon must be active for Docker commands to function properly. Below are instructions for verifying and starting the Docker daemon on Windows, Linux, and macOS.
Ensuring Docker Daemon is Running#
Windows#
Verify Docker Daemon: Check if Docker is running by looking for the Docker icon in the system tray. If the icon is present, Docker is likely running.
Start Docker Daemon:
Open Docker Desktop from the Start Menu. If Docker Desktop is not running, click on its icon to start it. Docker Desktop will handle the activation of the Docker daemon.
Wait for Docker Desktop to indicate that it is running and ready.
Linux#
Verify Docker Daemon: Use the following command to check if the Docker daemon is running:
sudo systemctl status docker
If the daemon is active, you will see “active (running)” in the output.
Start Docker Daemon:
If Docker is not running, start it using:
sudo systemctl start docker
To enable Docker to start automatically at boot:
sudo systemctl enable docker
macOS#
Verify Docker Daemon: Check for the Docker icon in the menu bar at the top of your screen. If the icon is present, Docker is likely running.
Start Docker Daemon:
Open Docker Desktop from the Applications folder or by using Spotlight search.
If Docker Desktop is not running, click on its icon to start it. Docker Desktop will automatically manage the Docker daemon. Wait for Docker Desktop to indicate that it is operational.
Building and running your container#
Build the Docker Image
Before testing, you need to build the Docker image from the Dockerfile. Navigate to the directory containing your Dockerfile and execute the following command:
docker build -t <repository>/<image-name>:<tag> .
-t <repository>/<image-name>:<tag>: The -t flag tags the image with a name made up of the following parts:
<repository>: This is the name of your Docker repository. If you plan to push the image to a registry like Docker Hub or a private registry, this is where you would specify the repository name. If you’re building the image locally and don’t plan to push it, you can omit the repository part. Since the goal is to push the image to Docker Hub, it is a best practice to use your Docker Hub username as the <repository> name. This is because Docker Hub organizes images under usernames, and other users can pull images using the username/image-name:tag format.
<image-name>: This is the name you assign to your Docker image. Choose a meaningful name that reflects the purpose of the image.
<tag>: Tags are used to version your Docker image. Commonly, tags like latest, v1.0, or stable are used to indicate the version of the image. For example, you might tag the image as v1.0 if it is the first stable release. It is a good idea to define a specific tag number instead of using “latest”. Later, if you need to update the code and redeploy, specify a new tag to ensure the updated code gets pulled when deploying.
.: The dot at the end specifies the build context, which is typically the current directory. It tells Docker to look for the Dockerfile and the necessary application files in this directory.
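For example, assuming a Docker Hub username of myusername (a placeholder) and naming the image after the data component, the command could look like:
docker build -t myusername/data-service:v1.0 .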
After the build completes successfully, you can list your Docker images using the following command:
docker images
This will display a list of images, including the one you just built.
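If the list is long, you can narrow it down to the image you just built by passing the repository name (again using the placeholder from above):
docker images myusername/data-service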
Run the Docker Container
Once the image is built, you can run a container from this image using the following command:
docker run -d -p 8061:8061 -p 8062:8062 --name your-container-name <repository>/<image-name>:<tag>
-d: Runs the container in detached mode, meaning it will run in the background.
-p 8061:8061 -p 8062:8062: Maps the container ports 8061 and 8062 to the ports 8061 and 8062 of the host machine, allowing access to services running inside the container.
--name your-container-name: Assigns a name to the container for easier management. Replace your-container-name with a descriptive name for the container.
<repository>/<image-name>:<tag>: Refers to the image you just built. Ensure you use the correct repository, image name, and tag.
To check that both the server and the web application started successfully, you can inspect the logs.
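Continuing with the placeholder names from the build step, a complete run command might be:
docker run -d -p 8061:8061 -p 8062:8062 --name data-service myusername/data-service:v1.0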
Checking Logs
To check that the server and application started successfully, or to diagnose any potential issues, you can inspect the logs of the container:
docker logs your-container-name
This command displays the logs generated by the container, providing insight into the application’s and server’s runtime behavior.
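If you want to keep streaming new log lines while you interact with the services, you can follow the logs instead:
docker logs -f your-container-name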
Accessing the Services
You can use curl, a web browser, or a tool like Postman to interact with the web application running inside the container. For example, since the web application is served on port 8062, you can test it by visiting http://127.0.0.1:8062/
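The same check can be done from the command line with curl; the -i flag includes the HTTP status line and headers in the output, which confirms the web application is responding:
curl -i http://127.0.0.1:8062/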
To test the server, you can run the client defined in the previous chapter using the command:
python data_client.py
Stop and Remove the Container
Once you have completed testing, you should stop and remove the container to free up resources:
docker stop your-container-name
docker rm your-container-name
docker stop: Stops the running container.
docker rm: Removes the container from the list of containers.
If you no longer need the Docker image you built, you can also remove it:
docker rmi <repository>/<image-name>:<tag>
Pushing to Docker Hub#
With the data component successfully containerized and tested, it can be pushed to Docker Hub.
To push a Docker image to Docker Hub, begin by ensuring that you are logged into your Docker Hub account from the command line. If you do not have an account, make sure to create one first. Once you have an account, use the command docker login and enter your Docker Hub credentials when prompted. Once authenticated, use the docker push command to upload the image, as in docker push myusername/my_image:tag. This will push your Docker image to your specified Docker Hub repository, making it accessible for deployment or sharing with others. Ensure that you replace myusername and my_image with your Docker Hub username and the appropriate image name, respectively.
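Put together, and sticking with the placeholder names used in the build step, the sequence looks like this:
# Log in once; credentials are cached for subsequent pushes
docker login
# If the image was built without your username as the repository,
# retag it first so Docker Hub knows where it belongs
docker tag data-service:v1.0 myusername/data-service:v1.0
docker push myusername/data-service:v1.0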
Conclusion#
With the data component successfully containerized, tested, and pushed to Docker Hub, the next step is to repeat this process for the remaining components of the pipeline. You should containerize the other two components, ensuring that each is correctly configured, and then thoroughly test them in a similar manner. At this stage, with all necessary parts containerized, tested, and pushed, you are ready to move on to the final phase: deployment.