Configuring Components for gRPC Communication#
With a foundational understanding of gRPC and Protocol Buffers, we can now proceed to configure the previously defined services to leverage gRPC for inter-component communication. It is important to note that while the AI on Demand platform will manage the communication between components during deployment, our responsibility lies in creating gRPC servers for each component and ensuring that the necessary functions are correctly implemented and operational.
The following steps outline the process of preparing the services for gRPC communication:
Define the Service
Write the Protofile
Generate gRPC Code
Implement the Server Using the Generated gRPC Code
Create a Client Using the Generated gRPC Code to Test the Server
Since the services were defined in the previous chapter, we can proceed directly to writing the protofile. The service code is located in the 4. Communication using gRPC and Protocol Buffers\example directory of the GitHub repository. While this guide will demonstrate the process for the data component, the steps apply to all components once the underlying concepts are understood.
Creating the Protofile#
When defining the protofile, consider the data that needs to be exchanged between the components. For instance, in our pipeline, the first component must accept an empty message as input, a requirement when using the AI on Demand platform. Therefore, we need to define an empty message. Additionally, the component must return six distinct lists: the training and testing datasets for the variables x, y, and dates. Consequently, we need to define a message type that can encapsulate these lists. Below is an example of how these messages might be defined:
syntax = "proto3";
message Empty {}
message CleanedData {
  repeated double x_train = 1;
  repeated double y_train = 2;
  repeated double x_test = 3;
  repeated double y_test = 4;
  repeated string dates_train = 5;
  repeated string dates_test = 6;
}
In this example, the repeated keyword is used to denote that a field contains a list. Once the messages are defined, we can proceed to define the service provided by our gRPC server. This involves specifying which RPC methods will be available on the server. In this case, we have only one function, clean_data, so we define our data service with a single RPC method, CleanData, which takes an Empty message as input and returns a CleanedData message. The following example illustrates this:
service DataService {
  rpc CleanData (Empty) returns (CleanedData);
}
The protofile should be saved under the name model.proto, as required by the AI on Demand platform. It is also essential to avoid defining a package within the protofile, as doing so may lead to errors during deployment. With the protofile written and saved, we can now proceed to generate the gRPC code.
Generating the gRPC Code#
With the protofile prepared, the next step is to generate the gRPC code. Ensure that the necessary gRPC tools are installed:
pip install grpcio-tools
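If you want to verify that the tools are available before generating code, you can query the bundled Protocol Buffers compiler for its version (the exact output depends on the installed release):
python -m grpc_tools.protoc --version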
You can generate the gRPC code from the protofile using the following command, executed in the directory containing the protofile:
python -m grpc_tools.protoc -I./ --python_out=. --grpc_python_out=. model.proto
This command will generate two files: model_pb2.py and model_pb2_grpc.py. The first file contains the message classes defined in the protofile, such as Empty and CleanedData, accessible via model_pb2.Empty and model_pb2.CleanedData. The second file contains the necessary code for creating the client and server for the microservice, which we will address in the next step.
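As a quick sanity check, you can import the generated module and round-trip a message through the binary wire format; the sample values below are made up purely for illustration:
import model_pb2

# Build a CleanedData message with a few illustrative values
msg = model_pb2.CleanedData(x_train=[1.0, 2.0], y_train=[3.5, 4.5])
msg.dates_train.extend(['2024-01-01', '2024-01-02'])  # repeated fields behave like Python lists

payload = msg.SerializeToString()                 # compact binary Protocol Buffers encoding
restored = model_pb2.CleanedData.FromString(payload)
print(restored.x_train)                           # [1.0, 2.0]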
Creating the Server#
Next, we will create the server for the component, leveraging the code generated in the gRPC files. The model_pb2_grpc.py file includes class definitions for both a servicer and a service, which are essential for server creation.
The servicer is an abstract class that you subclass to handle the server-side logic of your gRPC service. This is where you define the behavior of each RPC method declared in your .proto file by implementing the corresponding business logic. To implement the server:
Implement the Servicer: Create a subclass of the generated DataServiceServicer and implement the methods.
Start the Server: Use the add_DataServiceServicer_to_server function to attach your servicer to the server and start it.
Since the clean_data function has already been implemented in the service file, your task is to import it and, if necessary, add error handling to implement the CleanData RPC function. For simplicity, we will assume that the CSV file is located in the server’s directory and will access it directly. Additionally, ensure that the return type of the RPC method matches the definition in the protofile. Specifically, the CleanedData message, which is returned by the CleanData RPC, must conform to the structure defined earlier. Below is an example of how to create the subclass for the servicer:
from concurrent import futures
import grpc
import model_pb2_grpc
import model_pb2
from data_service import clean_data
import os
import logging

logging.basicConfig(
    level=logging.INFO,
    format='[%(asctime)s] %(levelname)s: %(message)s',
    handlers=[
        logging.StreamHandler()  # Output to the console
    ]
)

class DataServiceServicer(model_pb2_grpc.DataServiceServicer):
    def __init__(self):
        self.dataset_filepath = 'uploaded_file.csv'

    def CleanData(self, request, context):
        logging.info("Cleaning data...")
        try:
            if not os.path.isfile(self.dataset_filepath):
                context.set_code(grpc.StatusCode.NOT_FOUND)
                context.set_details("Dataset file not found")
                return model_pb2.CleanedData()

            # Clean the data and unpack the six lists
            x_train, x_test, y_train, y_test, dates_train, dates_test = clean_data(self.dataset_filepath)
            if len(x_train) != len(y_train):
                raise ValueError("x_train and y_train have different lengths")

            # Package the cleaned data in the message type defined in the protofile
            response = model_pb2.CleanedData(
                x_train=x_train,
                y_train=y_train,
                x_test=x_test,
                y_test=y_test,
                dates_train=dates_train,
                dates_test=dates_test
            )
            logging.info("Data cleaned successfully.")
            return response
        except Exception as e:
            context.set_code(grpc.StatusCode.INTERNAL)
            context.set_details(f"Internal error: {str(e)}")
            return model_pb2.CleanedData()
The method first verifies that the dataset file exists and then passes it to the previously defined clean_data function. The resulting values are returned packaged in the CleanedData message type specified in the protofile. Additionally, logging has been incorporated to facilitate debugging during testing.
The next step in server setup is to initiate it. This involves using the generated add_DataServiceServicer_to_server function. Begin by creating a gRPC server with the following line of code:
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
Here, futures.ThreadPoolExecutor(max_workers=10) configures a thread pool executor to manage concurrent RPCs. This setup allows the server to handle up to 10 concurrent requests (RPC calls) in parallel by managing a pool of threads for asynchronous task execution.
Once the server is created, add the servicer with the following command:
model_pb2_grpc.add_DataServiceServicer_to_server(DataServiceServicer(), server)
Finally, define a port, start the server, and ensure its continuous operation. The AI on Demand platform mandates that servers operate on port 8061. This configuration is achieved with:
server.add_insecure_port('[::]:8061')
server.start()
server.wait_for_termination()
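Putting these pieces together, the start-up logic might be assembled as follows; wrapping it in a serve function is a common convention rather than a requirement:
def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    model_pb2_grpc.add_DataServiceServicer_to_server(DataServiceServicer(), server)
    server.add_insecure_port('[::]:8061')  # port mandated by the AI on Demand platform
    server.start()
    logging.info("DataService server listening on port 8061...")
    server.wait_for_termination()

if __name__ == '__main__':
    serve()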
You now possess all the essential components for your server. For the complete server code, refer to the data_service_server.py file located in the directory 4. Communication using gRPC and Protocol Buffers\example\data in the GitHub repository.
Creating the Client for Testing#
While creating a client is not strictly necessary for deployment, it is highly recommended to test the server to ensure proper functionality.
Creating a gRPC client involves establishing a communication channel, creating a stub to interact with the server, making a request, and handling the response. The stub, generated from the protofile, serves as an intermediary between the client and server. It provides methods corresponding to the RPCs defined in the .proto file, enabling the client to invoke these methods as if they were local functions, despite being executed on a remote server.
To set up the client:
Create a Channel: Establish a communication path to the server by specifying its address (localhost:8061 in this instance):
with grpc.insecure_channel('localhost:8061') as channel:
The with statement ensures that the channel is properly closed upon completion of the operation.
Define the Stub: Use the channel to instantiate a stub:
stub = model_pb2_grpc.DataServiceStub(channel)
Send a Request: Define an empty message and invoke the CleanData method using the stub:
empty_message = model_pb2.Empty()
# Call the CleanData method
response = stub.CleanData(empty_message)
Process the Response: Evaluate the server’s response and handle it as needed. This may involve printing the response data or using it for further computations. To test the next component, save the results as follows:
if response.x_train and response.x_test and response.y_train and response.y_test and response.dates_train and response.dates_test:
    print("x_train:", response.x_train)
    print("x_test:", response.x_test)
    print("y_train:", response.y_train)
    print("y_test:", response.y_test)
    print("Dates Train:", response.dates_train)
    print("Dates Test:", response.dates_test)
    # Save the data to a file for later usage (requires `import pickle` at the top of the client)
    with open('cleaned_data.pkl', 'wb') as f:
        pickle.dump(response, f)
    logging.info("Cleaned data saved to cleaned_data.pkl.")
These steps complete the client setup. For the full client code, refer to the data_client.py file in the same directory as the data server. To test the server, execute python data_service_server.py in one terminal and python data_client.py in another. The cleaned data should be printed, and a new .pkl file containing the results will be generated in the same directory as the server and client.
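The saved file can later stand in for a live data component when you exercise the next component locally. As a sketch, assuming the client saved cleaned_data.pkl as shown above, reloading it could look like this:
import pickle
import model_pb2  # model_pb2 must be importable so pickle can reconstruct the message class

with open('cleaned_data.pkl', 'rb') as f:
    cleaned = pickle.load(f)

print(len(cleaned.x_train), "training samples,", len(cleaned.x_test), "test samples")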
Defining Additional Servers#
You can now create the other servers by following the same process:
Define the service
Write the protofile
Generate gRPC code
Create the server
Create and test the client
The gRPC code for training and testing, including all corresponding clients, has already been implemented and can be found in the repository under the directory TAIS-educational-material-public\4. Communication using gRPC and Protocol Buffers\example.
Conclusion#
You are now acquainted with the process of creating a gRPC server and testing it using a client. The next phase involves developing web applications for the data and testing components to enable users to upload a CSV file and view the test results. To fully grasp the forthcoming chapter, ensure you review the code from this chapter, particularly the testing server, as it will be central to the next phase.