
Cloud Resume Challenge Part 3/4 - Back-End

·9 mins

Introduction #

This post is the third part in a series about my Cloud Resume Challenge journey. Here are the links to the first and second parts.

System Diagram #

Below is the system diagram for the whole system. In this post we are looking specifically at the back-end part.

[Figure: AWS cloud resume system diagram]

Implementation Details #

Database (DynamoDB) #

Taking a crash course #

I had experience working with relational databases, but a non-relational database like MongoDB or DynamoDB was something new for me, so I took a crash course on YouTube to get a rough idea of DynamoDB. Some of you may already know about DynamoDB, but just to make sure we are on the same page, I would like to share some interesting points I learned from the crash course.

  • A DynamoDB table is a key-value database. A table is a collection of items, and an item is a collection of attributes (key-value pairs).
  • Partition key = a key whose values are distributed across the DB partitions (servers). Choose a key that will be accessed evenly across partitions. It doesn’t have to be unique, but it must be unique if we plan not to use a sort key.
  • Sort key = a key that can be sorted/filtered on; sorting/filtering based on this key achieves optimum performance. Paired with the partition key, the combination becomes the primary key (which must be globally unique).
  • Global Secondary Index (GSI) = an attribute that is indexed so we can search on it efficiently. Consider the example table below. If we want to retrieve all items with OriginCountry = USA, we can filter by that attribute. In this case, we set OriginCountry as a GSI.
    | AccountId | CreationDate | OriginCountry | Details        |
    |-----------|--------------|---------------|----------------|
    | 1         | 2022-01-03   | USA           | ( … JSON … )   |
    | 2         | 2022-03-30   | Canada        | ( … JSON … )   |
    | 3         | 2022-07-23   | USA           | ( … JSON … )   |
    | 4         | 2022-10-13   | Germany       | ( … JSON … )   |
  • Attribute = any other column that contains information. It can even hold JSON data (within the maximum item size, of course).
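To make the crash-course points concrete, below is a rough sketch of how the example table above could be created with boto3, including a GSI on OriginCountry. The table name, index name, and billing mode are my illustration, not from the actual project; the boto3 call is left commented out since it requires AWS credentials.

```python
# Sketch: a DynamoDB table definition with a partition key and a GSI,
# mirroring the "Accounts" example table above (names are illustrative).
table_spec = {
    "TableName": "Accounts",
    "AttributeDefinitions": [
        {"AttributeName": "AccountId", "AttributeType": "N"},
        {"AttributeName": "OriginCountry", "AttributeType": "S"},
    ],
    "KeySchema": [
        # HASH = partition key; no RANGE (sort key) here, so AccountId
        # must be unique on its own
        {"AttributeName": "AccountId", "KeyType": "HASH"},
    ],
    "GlobalSecondaryIndexes": [
        {
            # Index so we can query efficiently by OriginCountry
            "IndexName": "OriginCountryIndex",
            "KeySchema": [{"AttributeName": "OriginCountry", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}

# import boto3
# boto3.client("dynamodb").create_table(**table_spec)
```

With this in place, a query against OriginCountryIndex for "USA" would return accounts 1 and 3 without scanning the whole table.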

Database schema #

After learning the basics of DynamoDB, I needed to design the database schema. The objective of using DynamoDB here is to track the number of visitors for each web page on my website. Here are the key questions I had to answer when designing the DynamoDB schema:

  • What attributes should I prepare for the DynamoDB?
  • Which attributes should become the partition key?
  • What form should the partition key take? A numerical ID (1, 2, 3, …)? A universally unique identifier (UUID) like “eed78b78-29f9-4893-a432-4c4f50b0d1c4”? Or just the page title string (home, contact, etc.)?
  • Do I need sort key or secondary index (like GSI)?

After thinking and doing a little research, here are my answers:

  • At a minimum, the database should have “page-title” and “number-of-visits” attributes.
  • I chose a UUID as the partition key. In the end, my cloud resume only has a single page, and tracking the visitor count of one page with a UUID is overkill in my opinion. Nevertheless, I had never worked with UUIDs before and was interested in learning more about them (I just wanted to be fancy), so I used one.
  • A sort key and secondary index are not necessary for this single-page website.
  • To make sure that the UUID and page-title attributes are unique, following this blog post by AWS, I made my table schema look like the one below. The “howtobuild” and “about” page names are presented for illustrative purposes; in the real implementation, my cloud resume only has one page, named “home”.
    | pkey_uuid                            | page_name  | visit_count |
    |--------------------------------------|------------|-------------|
    | 6632d5b4-5655-4c48-b7b6-071d5823c888 | home       | 10          |
    | page_name#home                       |            |             |
    | 8ec436a8-97e6-4e72-aec2-b47668e96a94 | howtobuild | 2           |
    | page_name#howtobuild                 |            |             |
    | eed78b78-29f9-4893-a432-4c4f50b0d1c4 | about      | 0           |
    | page_name#about                      |            |             |

Python #

After finishing the DynamoDB preparation, it was time to work on the logic of incrementing the visit count (visit_count). In this stage, I worked mainly with AWS Lambda, because it is the platform where the Python code is going to be executed. AWS Lambda is a function-as-a-service offering: we can invoke a Python function on it without needing to provision any Linux or Windows server.

About Python code development in Lambda #

In general, the Python code accepts a request payload (in JSON format) from AWS API Gateway, parses the request, interacts with DynamoDB (reads or updates the visit count), and returns the visitor count to the API Gateway.

By default, Lambda uses a function called “lambda_handler” that acts as the main handler to process our application logic. The lambda_handler function has two arguments: event and context. The event object contains the data we need to process, and the context contains metadata about the function invocation (function version, function ID, AWS request ID, CloudWatch log information, etc.). In my cloud resume application, I need to implement the logic for retrieving the DynamoDB visitor count inside this function. Regarding the handler arguments, the event argument contains the parameters passed from API Gateway.
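To make the event object less abstract, here is a trimmed, illustrative example of what an HTTP API event could look like and how the fields are pulled out. The exact payload API Gateway sends is larger; only the fields the handler cares about are shown, and the UUID is a placeholder.

```python
# Trimmed, illustrative HTTP API event: only the fields the handler reads.
sample_event = {
    "routeKey": "GET /counts/{page-id}",
    "pathParameters": {"page-id": "6632d5b4-5655-4c48-b7b6-071d5823c888"},
    "queryStringParameters": {"func": "addOneVisitorCount"},
}

# Pull out the pieces the application logic branches on
route_key = sample_event.get("routeKey", "")
page_id = (sample_event.get("pathParameters") or {}).get("page-id", "")
func = (sample_event.get("queryStringParameters") or {}).get("func", "")
```

The `or {}` guards matter because API Gateway can send `None` (not a missing key) when, for example, no query string is present.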

How I implemented the python code #

You can see the full code of the program I created in the GitHub repository.

Here are some key features of the python code:

  • Use the boto3 library to interact with DynamoDB.
  • Use the aws_lambda_powertools library for testing. This library allows us to mock AWS tools/services.
  • I prepared two helper functions that will be called by the lambda_handler main function: getVisitorsCount and addOneVisitorCount. getVisitorsCount only returns the number of cloud resume visitors, while addOneVisitorCount returns the visitor count after incrementing it by 1. I use getVisitorsCount mainly for development/debugging purposes; in production, addOneVisitorCount is used.
  • I diligently catch code errors. While developing this code, I realized even more the importance of catching errors cleanly: with proper error catching, the code becomes much easier to debug. Before I caught errors diligently, I got a lot of overly general error messages like {"message": "Internal Server Error"} or {"message": "Not Found"}.
  • For privacy reasons, I masked the DynamoDB table name by storing it as a Lambda environment variable.
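The full helpers are in the repository; as a rough sketch of how addOneVisitorCount could work, the version below uses a single atomic DynamoDB update. The attribute names match the schema above, but the function body, the `table` parameter, and the `DYNAMODB_TABLE_NAME` environment variable name are my illustration, not necessarily the repo's exact code.

```python
# Sketch (not the repo's exact code): increment-and-return in one atomic
# DynamoDB update, instead of a separate read followed by a write.
def addOneVisitorCount(table, page_id: str) -> int:
    """Add 1 to visit_count for the given page item and return the new value.

    `table` is a boto3 DynamoDB Table resource.
    """
    response = table.update_item(
        Key={"pkey_uuid": page_id},
        # ADD creates visit_count as 1 if it doesn't exist yet
        UpdateExpression="ADD visit_count :one",
        ExpressionAttributeValues={":one": 1},
        ReturnValues="UPDATED_NEW",
    )
    return int(response["Attributes"]["visit_count"])

# In the real handler the table name comes from an environment variable
# (kept out of the source code), along the lines of:
# import os, boto3
# table = boto3.resource("dynamodb").Table(os.environ["DYNAMODB_TABLE_NAME"])
```

Doing the increment server-side in one `update_item` call also avoids the race condition of two visitors reading the same count and both writing back the same incremented value.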

For your convenience, below is the lambda_handler function code that I implemented.

def lambda_handler(event: APIGatewayProxyEvent, context: LambdaContext) -> Dict[str, Any]:
    """
    Lambda Entry Point
    """
    # Use the Global variables to optimize AWS resource connections
    global _LAMBDA_DYNAMODB_RESOURCE

    try:
        # parse event object from API Gateway
        routeKey = event.get('routeKey', '')
        functionName = (event.get('queryStringParameters') or {}).get('func', '')
        pageId = (event.get('pathParameters') or {}).get('page-id', '')

        # initialize the AWS DynamoDB resource
        dynamodb_resource_class = LambdaDynamoDBClass(_LAMBDA_DYNAMODB_RESOURCE)

        # execute the API depending on the function name
        if (routeKey == 'GET /counts/{page-id}') and (functionName == "getVisitorCount"):
            return getVisitorsCount(dynamo_db=dynamodb_resource_class,
                                    page_id=pageId)
        elif (routeKey == 'GET /counts/{page-id}') and (functionName == "addOneVisitorCount"):
            return addOneVisitorCount(  dynamo_db=dynamodb_resource_class,
                                        page_id=pageId)
        else:
            raise ApiRequestNotFoundError("Requested path or parameter not found")
    
    except ApiRequestNotFoundError as api_error:
        body = "Not Found: " + api_error.args[0]
        status_code = 404
        return {"statusCode": status_code, "body" : body }

API #

After I made sure that the Python code was working (I could invoke it from the Lambda console), it was time to make the Lambda function accessible from any device. To do this, I exposed the Lambda function via API Gateway. To put it simply, when I make a request to my API at https://{some-domain.com}/counts/{page-id-uuid}, the API Gateway forwards that request to the Lambda function.

When working with AWS API Gateway, there are several parameters we need to decide.

API type #

AWS API Gateway supports several API types, for instance HTTP API, REST API, and WebSocket API. In this cloud resume project, the WebSocket API is not relevant, so I could implement the API with either an HTTP API or a REST API. These two API types are similar but serve different purposes; the AWS docs explain their differences. In the end, I chose the HTTP API because it is simpler and cheaper.

HTTP endpoint #

I made the API Gateway for this cloud resume expose this endpoint: HTTP GET /counts/{page-id}. In case I decide to expand this cloud resume by creating more pages, I can use the same API Gateway to invoke the Lambda function for other page IDs. I only made one API endpoint: I expect calling this single API not only to get the visitor number from DynamoDB, but also to increase the count at the same time (i.e., by calling the addOneVisitorCount function in Lambda).

API Gateway stage variable #

AWS API Gateway has a feature called “stage variables”. These are name-value pairs that can be used to configure different settings for each stage of an API deployment. Specifically, we can target different backend instances/endpoints from the same API Gateway deployment; for example, one stage URL can hit a test Lambda function while another stage URL hits a different Lambda function.

I didn’t make any additional API Gateway stage variables; I just used the default one (called $default). The reason is that, using Terraform, I am going to build completely separate infrastructure for development and production (I will have two API Gateways, one for development and another for production). I will discuss Terraform and Infrastructure as Code in my next blog post.

In the beginning, when testing the integration between API Gateway and Lambda, I used curl. Using curl is doable at first, but writing a long URL became more cumbersome the more time I spent testing the API. Fortunately, a cool tool called Postman came to the rescue. It is a free tool for API development: I can save the API endpoint and some parameters in Postman, and then test the API integration with a single click. Although not necessary, I would suggest this tool to other cloud resume challengers, just to expand your toolkit. It is not too hard to learn.
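If you would rather stay in code than use curl or Postman, the endpoint can also be exercised with a few lines of Python. The domain and UUID below are placeholders, not my real API, and the actual network call is left commented out:

```python
# Sketch: exercising the visitor-count endpoint from Python instead of
# retyping a long curl command. Domain and UUID are placeholders.
import json
import urllib.request

url = ("https://some-domain.example/counts/"
       "6632d5b4-5655-4c48-b7b6-071d5823c888"
       "?func=addOneVisitorCount")

# Uncomment to actually hit the API:
# with urllib.request.urlopen(url) as resp:
#     print(json.loads(resp.read()))
```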

After making sure that the API Gateway can invoke the Python Lambda function, the last step is to call this API endpoint from the JavaScript of the cloud resume front end. If you do it the same way I did, at this point the visitor number is displayed on the resume web page. Every time we refresh the cloud resume page (by pressing F5 in the browser), the displayed visitor count should increase by 1. Congratulations, our cloud resume is working! From now on, we only need to make several enhancements to make our project more robust and easier to manage.

Conclusion #

To close this post, here are the key decisions I made when implementing the cloud resume back-end.

  • Database (DynamoDB): use a page-id UUID as the partition key (primary key).
  • Python: in Lambda, implement a function that not only retrieves the visitor count from DynamoDB, but also increases it by 1.
  • API: use an API Gateway “HTTP API” to integrate with the Lambda function. I set this API endpoint to invoke the Lambda function: HTTP GET /counts/{page-id}.

Up to this stage, we already have a working cloud resume. Users can visit the cloud resume website from the domain we chose, and every time a visitor loads the page, the visitor count is increased in the database and displayed on the web page. Everything is working, but there are several aspects where we can make our cloud resume better. In the next blog post I will discuss these aspects, the DevOps-related stuff, e.g. source control, CI/CD pipeline, software testing, and Infrastructure as Code.

I hope you learned something again in this post, and see you in my next post!