Monorepo – Why we should go for it


In revision control systems, a monorepo (a syllabic abbreviation of "monolithic repository") is a software development strategy where code for many projects is stored in the same repository. Splitting up large codebases into separate, independently versioned packages is extremely useful for code sharing. However, making changes across many repositories is messy and difficult to track, and testing across repositories gets complicated quickly. Several companies have embraced the monorepo strategy, including Google, Facebook, Microsoft, Uber, Airbnb and Twitter.

Why use a Monorepo?

  • Easily refactor global features with atomic commits. Instead of opening a pull request for each repo and figuring out in which order to build your changes, you make a single atomic pull request that contains all commits related to the feature you are working on.
  • Simplified package publishing. If you plan to implement a new feature inside a package that depends on another package with shared code, you can do it with a single command. This requires some additional configuration, which is discussed later in the tooling review part of this article. Currently, there is a rich selection of tools, including Lerna and Yarn Workspaces.
  • Simplified dependency management — In a multiple repository environment where multiple projects depend on a third-party dependency, that dependency might be downloaded or built multiple times. In a monorepo the build can be easily optimized, as referenced dependencies all exist in the same codebase.
  • Re-use code with shared packages while still keeping them isolated. A monorepo allows you to reuse your packages from other packages while keeping them isolated from one another. You can reference the remote package and consume it via a single entry point. To use the local version, you can use local symlinks. This feature can be implemented via bash scripts or by introducing additional tools like Lerna or Yarn.

With Lerna, we now manage a single repository for all of our packages, with a directory structure that looks like this:

mylerna_repo/
  - node_modules/
  - packages/
    - client/
      - package.json
    - server/
      - package.json
    - docs/
      - package.json
  - lerna.json
  - package.json
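
For reference, here is a minimal sketch of what the two root-level files might contain. The exact contents depend on your project; the workspace glob, npm client and versioning mode shown here are assumptions, not requirements:

# Hypothetical root package.json and lerna.json for the layout above
cat > package.json <<'EOF'
{
  "name": "mylerna_repo",
  "private": true,
  "workspaces": ["packages/*"]
}
EOF

cat > lerna.json <<'EOF'
{
  "packages": ["packages/*"],
  "npmClient": "yarn",
  "useWorkspaces": true,
  "version": "independent"
}
EOF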

Tool Review

The set of tools for managing monorepos is constantly growing, and it is easy to get lost in the variety of build systems for monorepos. You can keep track of the popular solutions by using this repo. But for now, let's take a quick look at the tools that are heavily used with JavaScript today:

  • Yarn is a JavaScript dependency management tool that supports monorepos through Workspaces (including a nohoist option).
  • Lerna is a tool for managing JavaScript projects with multiple packages, which can be used on top of Yarn or npm.

Yarn

Yarn is a dependency manager for NPM packages, which was not initially built to support monorepos. But in version 1.0, Yarn developers released a feature called Workspaces. At release time, it wasn’t that stable, but after a while, it became usable for production projects.

A workspace is basically a package with its own package.json and, possibly, some specific build rules (for example, a separate tsconfig.json if you use TypeScript). You could achieve a similar setup without Yarn Workspaces using bash scripts, but the tool eases the process of installing and updating dependencies per package.

At a glance, Yarn with its workspaces provides the following useful features:

  1. Single node_modules folder in the root for all packages. For example, if you have packages/package_a and packages/package_b—with their own package.json—all dependencies will be installed only in the root. That is one of the differences between how Yarn and Lerna work.
  2. Dependency symlinking to allow local package development.
  3. Single lockfile for all dependencies.
  4. Focused dependency installation when you want to re-install dependencies for only one package. This can be done with the --focus flag (see the sketch after this list).
  5. Integration with Lerna. You can easily make Yarn handle all the installation/symlinking and let Lerna take care of publishing and version control. This is the most popular setup so far since it requires less effort and is easy to work with.
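
To make these points concrete, here is a minimal sketch of day-to-day Yarn Workspaces usage. It assumes the root package.json declares "private": true and "workspaces": ["packages/*"], and that the package names match the folders shown earlier; the lodash dependency is just an example:

yarn install                                 # install every workspace into the root node_modules
yarn workspaces info                         # show the workspace layout and symlinks
yarn workspace client add lodash             # add a dependency to packages/client only
cd packages/client && yarn install --focus   # shallow-install just this package's dependencies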

Lerna

This tool really helps when dealing with semantic versioning, setting up the build workflow, publishing your packages, and so on. The main idea behind Lerna is that your project has a packages folder containing all of your isolated code parts, and besides the packages you have the main app, which may, for example, live in the src folder. Almost all operations in Lerna follow a simple rule: iterate through all of your packages and perform some action on them, e.g., increase the package version, update the dependencies of all packages, build all packages, etc.

With Lerna, you have two options on how to use your packages:

  1. Without pushing them to remote (NPM)
  2. Pushing your packages to remote

While using the first approach, you are able to use local references for your packages and basically don’t really care about symlinks to resolve them.

But if you use the second approach, you have to import your packages from the remote registry (e.g., import { something } from '@name/packagename';), which means you will always get the published version of your package. For local development, you will have to create symlinks in the root of your folder so that the bundler resolves local packages instead of the ones inside node_modules/. That's why, before launching Webpack or your favourite bundler, you have to run lerna bootstrap, which automatically links all packages.
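
A minimal sketch of that Lerna workflow, assuming Lerna is installed and the conventional packages/ layout described above (the "build" script name is an assumption):

lerna bootstrap        # install dependencies and symlink cross-package references
lerna run build        # run the "build" script in every package that defines it
lerna publish          # bump versions and publish changed packages to the registry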

Conclusion

Going “monorepo” today usually means turning a repository into a multi-package repository from which multiple packages can be published. This repository is part of a multi-repo architecture and lives in its ecosystem.

Tools like Bit (which was built for code-sharing in a multi-repo codebase), Lerna and Yarn workspaces help to optimize this workflow, and breed code-sharing for faster development and simplified maintenance.

Choosing the right tooling means understanding what you are going to build, why you are building it, and how you expect other people to use it. Answering these questions can help you make good choices from the get-go, which will make your life much easier down the road.

Don't forget: sharing code is about tools and technology, but also about people and communication. The right tools can help you share and communicate, but they won't replace teamwork and collaboration.


Brotli on Apache / Nginx, its advantages over gzip, and Brotli on Akamai

When a client sends a request to a server, it includes an Accept-Encoding header listing the compression formats it accepts, typically gzip, deflate and br. The server responds and, if it supports one of those formats, compresses the result with it. Brotli is a newer open-source compression format from Google that improves on the performance of gzip in many cases. Here we only care about HTTP compression.

Warning

Brotli only works over an HTTPS connection, which is a good thing because we all want to encrypt the web, right?

Installing Brotli

apt-get install brotli

Setting up on Apache

Apache has supported Brotli since version 2.4.26 by way of the mod_brotli module. However, I can't find much information on it, so we are installing the module by kjdev instead.

Install the Module

git clone --depth=1 --recursive https://github.com/kjdev/apache-mod-brotli.git
cd apache-mod-brotli
./autogen.sh
./configure
make
install -D .libs/mod_brotli.so /usr/lib/apache2/modules/mod_brotli.so -m 644
cd /etc/apache2/mods-available
echo "LoadModule brotli_module /usr/lib/apache2/modules/mod_brotli.so" > brotli.load

This adds the .load file to mods-available. We also need to create an accompanying config file called brotli.conf containing:


  BrotliCompressionLevel 10
  BrotliWindowSize 22
  AddOutputFilterByType BROTLI text/html text/plain text/css application/x-javascript

Enable the module

a2enmod brotli
service apache2 restart

You should now see in the response headers that the page is compressed with Brotli (br).
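
If you prefer the command line to browser dev tools, a quick check looks like this (replace the URL with your own site; remember that Brotli is only negotiated over HTTPS):

curl -sI -H 'Accept-Encoding: br' https://example.com/ | grep -i content-encoding
# expected output: content-encoding: br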

Setting up on Nginx

Google has kindly released an Nginx Brotli module.

Download the module

cd /usr/local/src
git clone https://github.com/google/ngx_brotli.git
cd ngx_brotli
git submodule update --init --recursive

Rebuild Nginx with our new module

You should run nginx -V to get your config string and add:

cd /opt/nginx-1.13.1    # or your own nginx source path
./configure YOUR CONFIG STRING --add-module=/usr/local/src/ngx_brotli
make
make install

Finally, add to your nginx.conf file

http {
    brotli on;
    brotli_static on;
}

In conclusion, the setup for both Apache and Nginx is pretty painless. If the browser does not support Brotli, it can always fall back to the ever-faithful gzip.

What the heck is Brotli?

Just like gzip, Brotli is a compression algorithm. It was developed by Google and works best for text compression, because it uses a dictionary of common keywords and phrases on both the client and server side and thus achieves a better compression ratio. It is supported by all major browsers:

(Browser support data: caniuse.com)

Does your browser support Brotli?
Browsers that support Brotli send 'br' along with 'gzip' in the Accept-Encoding request header. If Brotli is enabled on your web server, you will get the response in Brotli-compressed format.

(You can check the encoding in the content-encoding response header. Image credit: Certsimple)

Gzip vs Brotli:

The advantage of Brotli over gzip is that it makes use of a dictionary and thus only needs to send keys instead of full keywords. According to Certsimple:

  • Javascript files compressed with Brotli are 14% smaller than gzip.
  • HTML files are 21% smaller than gzip.
  • CSS files are 17% smaller than gzip.

Note: Images should not be compressed with either gzip or Brotli, as they are already compressed and compressing them again can even make them larger.

Fewer bytes transferred not only leads to faster page load but also helps in reducing costs of Content Delivery Network (CDN). Now that we know all these benefits, let’s see how to enable Brotli…

Embracing the Brotli side:

There are two ways to deliver Brotli-compressed assets:

  • Enabling Brotli on our web-server
  • Enabling Brotli on CDNs

We chose to serve Brotli from our web servers and installed it on nginx. Google provides a module for it, which requires nginx to be built from source. Once installed, the following settings need to be added to the nginx conf file:

brotli on;
brotli_static on;        # for static compression, explained later
brotli_comp_level 11;    # this setting can vary from 1-11
brotli_types text/plain text/css application/javascript application/json image/svg+xml application/xml+rss;

After this, all content types which are mentioned in brotli_types setting will be brotli compressed. Easy, wasn’t it!

Note: We have to keep the gzip settings on nginx intact, as clients that don't support br should still get gzip-compressed files. nginx gives precedence to br if both are supported.

Here, our web server will send br compressed assets, then CDN will just cache it and pass on to the browser.

Another way to enable Brotli is via a CDN. That way you don't have to write any code or install anything in your infrastructure, but it is a paid service. We went for the former approach, 'Brotli from origin', as it is more cost-efficient and engineering is what we like to do.

Dynamic vs Static Compression:

Dynamic compression means compressing files on the fly, whereas static compression means compressing files once and then serving them from cache every time. We used static compression for our Javascript and CSS files, as those do not change (until a new build is deployed). All these files are then cached on the CDN and served from there.

We mentioned the 'brotli_comp_level' setting above and promised to explain it later, so here it is. It sets the compression level and ranges from 1 to 11. The higher the level, the more time compression takes. So we used the value 11 for our static assets. For dynamic assets like API responses, we should use smaller values; a high compression time can backfire and undo all our efforts to improve latency.

Results

  • 26% reduction in CSS file sizes
  • 17% reduction in Javascript file sizes

Brotli from Origin Now Available for Akamai

What is Brotli from Origin?
Brotli from Origin specifically configures Akamai delivery to work well for customers that provide Brotli-compressed resources at origin. The Brotli from Origin behavior allows Akamai to deliver the Brotli-compressed resource when the requesting user agent supports it; otherwise, non-Brotli resources are automatically used.

What are the benefits of Brotli from Origin?
With Brotli from Origin, Akamai now has a comprehensive Brotli solution that can be applied for resources already compressed at the origin. By applying Brotli compression to resources, you can reduce bandwidth consumption and improve web performance above and beyond what gzip can do.

How do I enable Brotli from Origin?

  • To enable it, simply add the behavior through Property Manager (On/Off setting). It can also be enabled through the Property Manager API.

How is Brotli from Origin different from Akamai Resource Optimizer?
Resource Optimizer takes resources from origin and applies Brotli compression to them as they come into the Akamai platform. They are then cached and delivered with Brotli compression. Brotli from Origin basically extends this approach to support resources already compressed at origin and enables them to be served through the Akamai cloud delivery platform and back to the browser. In other words, Akamai now supports both origins that provide Brotli compression as well as origins that do not. For origins that do not, Akamai now automatically provides Brotli compression.

We think it’s a great solution.

Perform AWS EC2 Backup: Step-By-Step Guide.

Over the last decade, the sheer amount of data in the world has grown exponentially, thus making it hard for some organizations to manage and store critical pieces of information on a daily basis, let alone protect it from unexpected data loss as a result of hardware failure, software corruption, accidental deletion, malicious attack, or an unpredictable disaster. More issues may arise still when it comes to managing AWS EC2 environments and protecting data stored in the cloud.

In short, to back up AWS EC2 instances, you should choose one of the following options:

  1. Take an EBS snapshot;
  2. Create a new AMI;
  3. Design an AWS EC2 Backup plan;
  4. Automate AWS EC2 backup with a third-party solution.

AWS Backup is a rather new addition to the rich set of AWS services and tools, and is definitely worth your attention. AWS Backup is a valuable tool which can help you automatically back up and protect your data and applications in the AWS cloud as well as on-premises IT environments.

How to Back Up AWS EC2 Instances

AWS is a high-performance, constantly evolving cloud computing platform that allows you to store data and applications in the cloud environment. AWS can provide you with the tools you need to create EC2 instances which act as virtual servers with varying CPU, memory, storage, and networking capacity.

Currently, there are three ways to back up AWS EC2 instances: taking EBS snapshots, creating AMIs, or designing an AWS Backup plan. Let’s take a closer look at each of these approaches and see how they differ.

Taking EBS Snapshots

If you want to back up an AWS EC2 instance, you should create snapshots of EBS volumes, which are stored with the help of Amazon Simple Storage Service (S3). Snapshots can capture all data within EBS volumes and create their exact copies. Moreover, these EBS snapshots can then be copied and transferred to another AWS region to ensure safe and reliable storage of critical data. Thus, in case of a disaster or accidental data loss, you can be sure that you have a backup copy securely stored in a remote location which you can use for restoring critical data.

Prior to running AWS EC2 backup, it is recommended that you stop the instance or at least detach an EBS volume which is about to be backed up. This way, you can prevent failure or errors from occurring and affecting the newly created snapshots.

Please note that, for security purposes, some sensitive information has been removed.

To back up an AWS EC2 instance, you need to take the following steps:

1. Sign in to your AWS account to open the AWS console.

2. Select Services in the top bar and click EC2 to launch the EC2 Management Console.

EC2 Services in AWS EC2 Backup

3. Select Running Instances and choose the instance you would like to back up.

Running Instances in AWS EC2 Backup

4. In the bottom pane, you can view the central technical information about the instance. In the Description tab, find the Root device section and select the /dev/sda1 link.

Selecting Root Device in AWS EC2 Backup

5. In the pop-up window, find the volume’s EBS ID name and click it.

6. The Volumes section should open. Click Actions and select Create Snapshot.

Creating Snapshot in AWS EC2 Backup

7. The Create Snapshot box should open, where you can add a description for the snapshot to make it distinct from other snapshots, as well as assign tags to easily monitor this snapshot. Click Create Snapshot.

Configuring a New Snapshot in AWS EC2 Backup

8. The snapshot creation should start and complete in a short amount of time; the main factor is the amount of data in your Amazon EBS volume.

After the snapshot creation is complete, you can find your new snapshot by selecting the Snapshots section in the left pane. As you can see, we have successfully created a point-in-time copy of the EBS volume, which can later be used to restore your EC2 instance.

Snapshot Storage (AWS EC2 Backup)

For this purpose, you need to select the snapshot of the backed up volume, press the Actions button above, and click Create Volume. Following the prompts, configure the volume details (volume type, size, IOPS, availability zone, tags). Then, click Create Volume for the new volume to be created, which can later be added to the AWS EC2 instance of your choice.

Restoring the snapshot in AWS EC2 Backup
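
The same snapshot and restore workflow can also be scripted with the AWS CLI; a minimal sketch, where the volume ID, snapshot ID and availability zone are placeholders:

aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "Backup of my EC2 root volume"
aws ec2 describe-snapshots --snapshot-ids snap-0123456789abcdef0      # wait until State is "completed"
aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 --availability-zone us-east-1a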

Creating a new AMI

The next approach to performing AWS EC2 backups is creating an Amazon Machine Image (AMI) of your AWS EC2 instances. An AMI contains all the information required for creating an EC2 instance in the AWS environment, including configuration settings, the root volume template, launch permissions, and block device mapping. Basically, the AMI can act as a template for launching a new AWS EC2 instance and replacing the corrupted one. Note that, prior to creating the new AMI, it is recommended that you stop the AWS EC2 instance which you want to back up.

To create a new AMI and ensure AWS EC2 backup, you should do the following:

1. Sign in to your AWS account to open the AWS console.

2. Select Services in the top bar and click EC2 to launch the EC2 Management Console.

EC2 Services in AWS EC2 Backup 2

3. Select Running Instances and choose the instance you want to back up.

Select Running Instances in AWS EC2 Backup

4. Click Actions > Image > Create Image.

How to Create Image in AWS EC2 Backup

5. The Create Image menu should open. Here, you can specify the image name, add the image description, enable/disable reboot after the AMI creation, and configure instance volumes.

Do note that when you create an EBS image, an EBS snapshot should also be created for each of the above volumes. You can access these snapshots by going to the Snapshots section.

The Create Image menu in AWS EC2 Backup

6. Click Create Image.

7. The image creation process should now start. Click the link to view the pending AMI.

8. It should take some time for the new AMI to be created. You can start using the AMI when its status switches from pending to available.

After the AMI has been successfully created, it can then be used to create a new AWS EC2 instance, which will be an exact copy of the original instance. For this purpose, simply go to the Instances section, click Launch Instance, select the AMI you have created in the My AMIs section, and follow the prompts to finish the instance creation.

Restoring EC2 Instance with the AMI (AWS EC2 Backup)
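
If you prefer scripting, the same AMI creation and restore flow is available through the AWS CLI; a minimal sketch, where the instance ID, AMI ID and instance type are placeholders:

aws ec2 create-image --instance-id i-0123456789abcdef0 --name "my-backup-ami" --no-reboot
aws ec2 describe-images --image-ids ami-0123456789abcdef0             # wait until State is "available"
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t3.micro --count 1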

Creating AMIs is arguably a more effective backup strategy than taking EBS snapshots alone. An AMI contains the EBS snapshots plus the software configuration, which lets you launch a new AWS EC2 instance in just a few clicks, and AMI creation itself is free of charge (you only pay for snapshot storage).

However, both methods require significant manual input on your part and do not run automatically out of the box. Using these two approaches for AWS EC2 backup in large-scale environments has proven to be a complicated and error-prone process.

Automating AWS EC2 backup

Previously, the only way to automate AWS EC2 backup was by running scripts or using API calls, which was a very challenging and resource-intensive process. The person responsible for backup automation had to be highly proficient in scripting in order to avoid any issues and inconsistencies. However, there was still a high risk that you would waste your time, effort, and money on a backup job configuration and still be left with failed or corrupted AWS EC2 backups.

Due to this ongoing concern, AWS introduced the AWS Lambda service, which allows you to run your own code to manage the AWS services you need and perform various tasks in AWS environments. The downside of this approach is that you had to write your own code or look for code available on open-source platforms. Ultimately, it could take an excessive amount of time and effort to set up a working Lambda function that behaves the way you want.

To deal with the existing issues further, the new AWS EC2 backup service referred to as AWS Backup was designed, allowing you to rapidly create automated data backups across AWS services and easily manage them using the central console. With AWS Backup, you can finally create a policy-based backup plan which can automatically back up the AWS resources of your choosing. At the core of each plan lies a backup rule which defines the backup schedule, backup frequency, and backup window, thus allowing you to automate the AWS EC2 backup process and requiring minimum input on your part.

To create an AWS backup plan, take the following steps:

1. Sign in to your AWS account to open the AWS Management Console.

2. Select Services in the top bar and then type AWS Backup in the search bar. Click Backup plans in the left pane.

3. Press the Create Backup plan button.

Backup Plans in AWS EC2 Backup

4. Here, you have three start options: Start from an existing plan, Build a new plan, and Define a plan using JSON. Click Info if you want to learn more about available options to help you make the right decision.

As we don’t have any existing backup plans, let’s build a new plan from scratch. Enter the new backup plan name and proceed further.

Building a New Plan in AWS EC2 Backup

5. The next step is Backup rule configuration. Here, you should specify the backup rule name.

6. After that, you can set up a backup schedule. You should determine the backup frequency (Every 12 hours, Daily, Weekly, Monthly, Custom cron expression); backup window (Use backup window defaults or Customize backup window); backup lifecycle (Transition to cold storage and Expiration of the backup).

Backup Rule Configuration in AWS EC2 Backup

7. At this step, you should select the backup vault for storing your recovery points (the ones created by this Backup rule). You can click Create new Backup vault if you want to have a new customizable vault. You can also use the existing Backup vault if you have one. Alternatively, you can choose the default AWS Backup vault.

Choosing the Backup Vault in AWS EC2 Backup

8. Next, you must add tags to recovery points and your backup plan in order to organize them and easily monitor their current status.

Adding Tags in AWS EC2 Backup

After that, you can click Create plan to proceed to the next stage, the backup rule creation.

9. Your backup plan has been successfully created. However, before you can run this plan and deploy it in your environment, you should also assign resources which need to be backed up. Click the Assign resources button, which can be found in the top bar.

New Backup Plan in AWS EC2 Backup

10. In the next menu, you can specify the resource assignment name and define the IAM (Identity and Access Management) role.

By selecting the IAM role, you specify what a user can or cannot do in AWS and determine which users are granted permission to manage selected AWS resources and services.

Additionally, you can assign resources to this Backup plan using tags or resource IDs, meaning that any AWS resources matching these key-value pairs will automatically be backed up by this Backup plan.

Assigning Resources in AWS EC2 Backup

11. Click Assign resources to complete the configuration process. After that, the backup job should run automatically. You can go to the AWS Backup dashboard to see the current status of your backup jobs and verify that they are working as planned.

Data Protection Options in AWS EC2 Backup

As you can see, our backup job is already in progress. In this menu, you can also Manage Backup plans, Create an on-demand backup, or Restore backup. Choose the required option and follow the prompts to set up another data protection job in your AWS environment.
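
The same kind of plan can also be created programmatically with the AWS CLI; a minimal sketch, assuming the plan and the resource selection are described in local JSON files (the file names and the plan ID are placeholders):

aws backup create-backup-plan --backup-plan file://backup-plan.json
aws backup create-backup-selection --backup-plan-id <plan-id-from-previous-call> --backup-selection file://selection.json
aws backup list-backup-jobs        # check that jobs are running as planned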

Mount S3 bucket on EC2 Linux Instance as a drive

An S3 bucket can be mounted on an AWS EC2 instance as a file system using s3fs. s3fs is a FUSE file system that allows you to mount an Amazon S3 bucket as a local file system. It behaves like a network-attached drive: it does not store anything on the EC2 instance itself, but users can access the data on S3 from the instance.

Filesystem in Userspace (FUSE) is a simple interface for userspace programs to export a virtual file system to the Linux kernel. It also aims to provide a secure method for non-privileged users to create and mount their own file system implementations.

The s3fs-fuse project is written in C++ and is backed by Amazon Simple Storage Service (S3). Amazon offers an open API to build applications on top of this service, which several companies have done, using a variety of interfaces (web, rsync, FUSE, etc.).

Follow the below steps to mount your S3 bucket to your Linux Instance.

This tutorial assumes that you have a running Linux EC2 instance on AWS with root access and an S3 bucket that you want to mount on the instance. You will also need an access key and secret key pair with sufficient S3 permissions, or IAM access to create one.

We will perform the steps as the root user. If you are a normal user with sudo access, prefix the commands with sudo. So let's get started.

Step-1:- If you are using a new CentOS or Ubuntu instance, update the system first.

For CentOS or Red Hat

yum update -y

For Ubuntu

apt-get update

Step-2:- Install the dependencies.

For CentOS or Red Hat

sudo yum install automake fuse fuse-devel gcc-c++ git libcurl-devel libxml2-devel make openssl-devel

For Ubuntu or Debian

sudo apt-get install automake autotools-dev fuse g++ git libcurl4-gnutls-dev libfuse-dev libssl-dev libxml2-dev make pkg-config

Step-3:- Clone s3fs source code from git.

git clone https://github.com/s3fs-fuse/s3fs-fuse.git

Step-4:- Now change to the source code directory, then compile and install the code with the following commands:

cd s3fs-fuse

./autogen.sh

./configure --prefix=/usr --with-openssl

make

sudo make install

Step-5:- Use the command below to check where the s3fs binary was installed. If it prints a path, the installation went fine.

which s3fs

Step-6:- Getting the access key and secret key.

You will need an AWS access key and secret key with appropriate permissions in order to access your S3 bucket from your EC2 instance. You can easily manage your user permissions from the IAM (Identity and Access Management) service provided by AWS. Create an IAM user with S3 full access (or with a role with sufficient permissions), or use the root credentials of your account. Here we will use the root credentials for simplicity.

Go to AWS Menu -> Your AWS Account Name -> My Security Credentials. Here your IAM console will appear. Go to Users > your account name and, under the Permissions tab, check whether you have sufficient access to the S3 bucket. If not, you can manually attach an existing "S3 Full-Access" policy or create a new policy with sufficient permissions.

Now go to the Security Credentials tab and click Create Access Key. A new access key and secret key pair will be generated. You can see the access key and the secret key (the secret key is visible when you click the Show link), and you can also download them. Copy both keys.

Note that you can always use an existing access and secret key pair. Alternatively, you can also create a new IAM user and assign it sufficient permissions to generate the access and secret key.

Step-7:- Create a new file in /etc named passwd-s3fs and paste the access key and secret key in the format below.

touch /etc/passwd-s3fs

vim /etc/passwd-s3fs

Your_accesskey:Your_secretkey

Step-8:- Change the permissions of the file.

sudo chmod 640 /etc/passwd-s3fs

Step-9:- Now create a directory (or pick an existing one) and mount the S3 bucket on it.

If your bucket name does not contain a dot (.), use the commands in point (a); if the bucket name does contain a dot (.), follow point (b):

a) Bucket name without dot(.):

mkdir /mys3bucket

s3fs <your_bucketname> -o use_cache=/tmp -o allow_other -o uid=1001 -o mp_umask=002 -o multireq_max=5 /mys3bucket

where:
  • your_bucketname = the name of the S3 bucket you created on AWS S3
  • use_cache = directory to use for caching
  • allow_other = allow other users to write to the mount point
  • uid = uid of the user/owner of the mount point (you can also add "-o gid=1001" for a group)
  • mp_umask = remove other users' permissions
  • multireq_max = maximum number of parallel requests sent to the S3 bucket
  • /mys3bucket = mount point where the bucket will be mounted

You can make an entry in /etc/rc.local to automatically remount after reboot. Find the s3fs binary with the "which" command and add the entry before the "exit 0" line, as below.

which s3fs

/usr/local/bin/s3fs

vim /etc/rc.local

/usr/local/bin/s3fs your_bucketname -o use_cache=/tmp -o allow_other -o uid=1001 -o mp_umask=002 -o multireq_max=5 /mys3bucket

b) Bucket name with dot(.):

s3fs your_bucketname /mys3bucket -o use_cache=/tmp -o allow_other -o uid=1001 -o mp_umask=002 -o multireq_max=5 -o use_path_request_style -o url=https://s3-{{aws_region}}.amazonaws.com

The options are the same as in point (a). In addition, use_path_request_style forces path-style S3 requests (bucket names containing dots break HTTPS certificate validation with the default virtual-hosted style), and url points s3fs at your bucket's regional endpoint.

Remember to replace “{{aws_region}}” with your bucket region (example: eu-west-1).

You can make an entry in /etc/rc.local to automatically remount after reboot. Find the s3fs binary with the "which" command and add the entry before the "exit 0" line, as below.

which s3fs

/usr/local/bin/s3fs

vim /etc/rc.local

s3fs your_bucketname /mys3bucket -o use_cache=/tmp -o allow_other -o uid=1001 -o mp_umask=002 -o multireq_max=5 -o use_path_request_style -o url=https://s3-{{aws_region}}.amazonaws.com

To debug at any point, add "-o dbglevel=info -f -o curldbg" to the s3fs mount command.

Step-10:- Check the mounted S3 bucket. The output will be similar to that shown below, though the Used size may differ.

df -Th

“or”

df -Th /mys3bucket

Filesystem Type Size Used Avail Use% Mounted on

s3fs  fuse.s3fs 256T  0   256T   0%  /mys3bucket

If it shows the mounted file system, you have successfully mounted the S3 bucket on your EC2 Instance. You can also test it further by creating a test file.

cd /mys3bucket

echo "this is a test file to check s3fs" >> test.txt

ls

This change should also be reflected in the S3 bucket. Log in to the S3 console and verify that the test file is present.

Note: If you already had data in the S3 bucket and it is not visible, you may have to set permissions in the ACL for that bucket in the S3 management console.

Also, if you get an s3fs error such as "Transport endpoint is not connected", you have to unmount and remount the file system. You can also do this with a custom script that detects the condition and remounts automatically (see the sketch below).
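
A minimal sketch of such a script, assuming the bucket name and mount options used earlier in this guide (adjust paths and options to your setup); you could run it from cron every few minutes:

#!/bin/sh
# Remount /mys3bucket if the s3fs mount has dropped.
MOUNTPOINT=/mys3bucket
if ! mountpoint -q "$MOUNTPOINT"; then
  fusermount -u "$MOUNTPOINT" 2>/dev/null    # clean up a stale mount, if any
  /usr/local/bin/s3fs your_bucketname -o use_cache=/tmp -o allow_other -o uid=1001 -o mp_umask=002 -o multireq_max=5 "$MOUNTPOINT"
fi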

Congrats!! You have successfully mounted your S3 bucket to your EC2 instance. Any files written to /mys3bucket will be replicated to your Amazon S3 bucket.

Handle unwanted bounces and complaints on AWS SES

Reputation in the email world is critical to achieving reasonable deliverability rates (the percentage of emails that arrive in inboxes); if you fall under certain levels, your emails end up in the spam folder or get rejected by email servers. To keep these numbers high, you have to constantly improve your email quality, but most importantly, you have to take action when a delivery fails or a recipient doesn't want to receive your email.

To set a little bit of context about bounces and complaints processing, I’m reusing some of the previous post:

Amazon SES assigns a unique message ID to each email that you successfully submit to send. When Amazon SES receives a bounce or complaint message from an ISP, we forward the feedback message to you. The format of bounce and complaint messages varies between ISPs, but Amazon SES interprets these messages and, if you choose to set up Amazon SNS topics for them, categorizes them into JSON objects.

Amazon SES will categorize your hard bounces into two types: permanent and transient. A permanent bounce indicates that you should never send to that recipient again. A transient bounce indicates that the recipient’s ISP is not accepting messages for that particular recipient at that time and you can retry delivery in the future. The amount of time you should wait before re-sending to the address that generated the transient bounce depends on the transient bounce type. Certain transient bounces require manual intervention before the message can be delivered (e.g., message too large or content error). If the bounce type is undetermined, you should manually review the bounce and act accordingly.

A complaint indicates the recipient does not want the email that you sent them. When we receive a complaint, we want to remove the recipient addresses from our list.

In this post, we show you how to use AWS Lambda functions to receive SES notifications from the feedback loop from ISPs email servers via Amazon SNS and update an Amazon DynamoDB table with your email database.

Here is a high-level overview of the architecture:

Using the combination of Lambda, SNS and DynamoDB frees you from the operational overhead of having to run and maintain servers. You focus on your application logic and AWS handles the undifferentiated heavy lifting behind operations, scalability, and high availability.

Workflow

  1. Create the SNS topic to receive the SES bounces, deliveries and complaints.
  2. Create the DynamoDB table to use for our email database.
  3. Create the Lambda function to process the bounces, deliveries and complaints and subscribe it to the SNS topic
  4. Test & start emailing!

SNS topic and subscription

We'll use an SNS topic and an SNS subscription to receive the SES notifications. Go to SNS and create a topic. I named mine snsSESLogs.

Once completed, create a subscription.

We'll use a temporary e-mail subscription that notifies us of any SES activity. You can keep this notification permanently if you want, but keep in mind that you'll receive an email for EVERY SES event (delivery, bounce or complaint). I don't want to receive thousands of e-mails a day, so I'll use this only for troubleshooting. Under Topic ARN, select the topic that you just created, choose Email for the Protocol and type your e-mail address under Endpoint.

Make sure you confirm your subscription by checking the e-mail address that you used for this subscription. You’ll receive an e-mail from AWS telling you to click the link in the e-mail and confirm.
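
If you prefer the CLI, the same topic and a temporary e-mail subscription can be created like this (the account number and e-mail address are placeholders; the full topic ARN is returned by the first command):

aws sns create-topic --name snsSESLogs
aws sns subscribe --topic-arn arn:aws:sns:us-east-1:<account_number>:snsSESLogs \
  --protocol email --notification-endpoint you@example.com    # confirm via the e-mail AWS sends you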

SES

Now, go to SES and click on the domain you have there. Find the Notifications section and click the Edit Configuration button at the bottom. Depending on what you want logged, select the SNS topic that you just created. Most likely, you won't want to see the delivery confirmations; you can still enable them here and then make a change in the script. The idea is to configure the SES domain so that anytime an e-mail goes through, an SNS notification is generated.

DynamoDB

Go to DynamoDB and click on Create table.

Name the table however you want, but keep in mind that you have to reference the name later. Add a primary key and a sort key. Also, name them whatever you want, but you have to reference them later. All these are case sensitive.

Use the defaults for the rest and click Create.
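
For reference, an equivalent table can be created from the CLI. This sketch assumes the table name ddbSESLogs, SESMessageId as the primary key and SnsPublishTime as the sort key, which matches the attributes the Lambda code later in this post writes:

aws dynamodb create-table \
  --table-name ddbSESLogs \
  --attribute-definitions AttributeName=SESMessageId,AttributeType=S AttributeName=SnsPublishTime,AttributeType=S \
  --key-schema AttributeName=SESMessageId,KeyType=HASH AttributeName=SnsPublishTime,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST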

TTL (time-to-live) for items

Here is the first change that differs from the AWS tutorial above. In their tutorial, the records stay there indefinitely. We don't want that. All we care about is getting the logs, parsing them, delivering them and deleting them. If you DO care about the logs for some compliance reason, skip this step (you'll probably have to modify the parsing script later, but we'll worry about that later). For now, let's configure the TTL. In essence, this option tells the AWS backend to delete certain records after roughly whatever_you_choose hours. Read this, so you have an understanding of what's going on. AWS doesn't guarantee that it will delete those records/items exactly after whatever_you_choose hours, so you still have to do some filtering. I'll talk about that later.
Go to the Overview tab of your table and click on Manage TTL.

Type TTL, leave the rest as is and click Continue. NOTE: Everything is case sensitive.
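
The CLI equivalent, assuming the same table and attribute names as above:

aws dynamodb update-time-to-live --table-name ddbSESLogs \
  --time-to-live-specification "Enabled=true, AttributeName=TTL"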

IAM

We'll need a role and a policy to restrict access to the solution we are implementing. Go to the IAM console and click Policies. Click Create policy and click the JSON tab. Delete the placeholder and paste the text below. Make sure you replace the account number with yours and use the name of the DynamoDB table that you just created. Don't include the <> brackets.

123456789101112131415{"Version": "2012-10-17","Statement": [{"Sid": "Stmt1428510662000","Effect": "Allow","Action": ["DynamoDB:PutItem"],"Resource": ["arn:aws:dynamodb:us-east-1:<account_number>:table/<DynamoDB_table_name>"]}]}

Click on Review policy and enter a Name and a Description.

Click on Create policy after.
The next step is to create an AWS role. Click on the Roles on the left and then click on Create role. Choose AWS service and then Lambda, then click Next: Permissions at the bottom.

Under Filter policies, start typing AWSLambda and choose the AWSLambdaBasicExecutionRole. Put a check mark next to it.

Before you click Next: Tags, type polSESL (or whatever you named your policy) under Filter policies again. Put a check mark to this policy and click on Next: Tags. The idea is to attach our policy for DynamoDB but also the policy for the execution of Lambda.

Tag it if you want and proceed with Next: Review.
Enter the Role name, Role description (if you want) and click on Create role.

Lambda

It’s time for our Lambda function. I made some changes to the Node.js script that was provided by AWS. The original script doesn’t log the sender’s domain/e-mail. Also, the original script doesn’t deal with the TTL attribute that we added. So, go to Lambda and click on Create function.
Use the following. Name your function, choose Node.js 8.10 for the Runtime, choose Use an existing role and select the role that we just created.

Click on Create function after.

Switch to Node.js 4.3 under Runtime and delete the placeholder code under index.js. NOTE: I am not a Node.js expert and I am not sure whether the code will work under Node.js 8 and above; I didn't have time to test it.

Paste the Node.js code below. If you change the names of the DynamoDB table and the keys, make sure you change the names in the corresponding lines. The number "24" in the line "Math.round(Date.now() / 1000) + 24 * 3600" means 24 hours; that's my time-to-live. If you need to keep your records for more or less time, change this accordingly. If you don't want to register successful deliveries in DynamoDB, comment out the line "ddb.putItem(itemParamsdel," by adding // as a prefix.

console.log('Loading event');
var aws = require('aws-sdk');
var ddb = new aws.DynamoDB({params: {TableName: 'ddbSESLogs'}});
exports.handler = function(event, context)
{
  console.log('Received event:', JSON.stringify(event, null, 2));
  const TTL = Math.round(Date.now() / 1000) + 24 * 3600;
  var SnsPublishTime = event.Records[0].Sns.Timestamp
  var SnsTopicArn = event.Records[0].Sns.TopicArn;
  var SESMessage = event.Records[0].Sns.Message
  SESMessage = JSON.parse(SESMessage);
  var SESMessageType = SESMessage.notificationType;
  var SESMessageId = SESMessage.mail.messageId;
  var SESDestinationAddress = SESMessage.mail.destination.toString();
  var SESSourceAddress = SESMessage.mail.source.toString();
  var LambdaReceiveTime = new Date().toString();
  if (SESMessageType == 'Bounce')
  {
    var SESreportingMTA = SESMessage.bounce.reportingMTA;
    var SESbounceSummary = JSON.stringify(SESMessage.bounce.bouncedRecipients);
    var itemParams = {Item: {SESMessageId: {S: SESMessageId}, SnsPublishTime: {S: SnsPublishTime}, TTL: {N: TTL.toString()},
      SESreportingMTA: {S: SESreportingMTA}, SESSourceAddress: {S: SESSourceAddress}, 
      SESDestinationAddress: {S: SESDestinationAddress}, SESbounceSummary: {S: SESbounceSummary}, SESMessageType: {S: SESMessageType}}};
    ddb.putItem(itemParams, function(err, data)
    {
      if(err) { context.fail(err)}
      else {
         console.log(data);
         context.succeed();
      }
    });
  }
  else if (SESMessageType == 'Delivery')
  {
    var SESsmtpResponse1 = SESMessage.delivery.smtpResponse;
    var SESreportingMTA1 = SESMessage.delivery.reportingMTA;
    var itemParamsdel = {Item: {SESMessageId: {S: SESMessageId}, SnsPublishTime: {S: SnsPublishTime}, TTL: {N: TTL.toString()}, SESsmtpResponse: {S: SESsmtpResponse1},
      SESreportingMTA: {S: SESreportingMTA1}, SESSourceAddress: {S: SESSourceAddress },
      SESDestinationAddress: {S: SESDestinationAddress }, SESMessageType: {S: SESMessageType}}};
    ddb.putItem(itemParamsdel, function(err, data)
    {
      if(err) { context.fail(err)}
      else {
        console.log(data);
        context.succeed();
      }
    });
  } 
  else if (SESMessageType == 'Complaint')
  {
    var SESComplaintFeedbackType = SESMessage.complaint.complaintFeedbackType;
    var SESFeedbackId = SESMessage.complaint.feedbackId;
    var itemParamscomp = {Item: {SESMessageId: {S: SESMessageId}, SnsPublishTime: {S: SnsPublishTime}, TTL: {N: TTL.toString()}, SESComplaintFeedbackType: {S: SESComplaintFeedbackType},
      SESFeedbackId: {S: SESFeedbackId}, SESSourceAddress: {S: SESSourceAddress },
      SESDestinationAddress: {S: SESDestinationAddress }, SESMessageType: {S: SESMessageType}}};
    ddb.putItem(itemParamscomp, function(err, data)
    {
      if(err) { context.fail(err)}
      else {
        console.log(data);
        context.succeed();
      }
    });
  }
};

Click Save in the upper-right corner once you are done and then click on Add trigger.

Configure the trigger so it looks like this. The SNS topic that we created should be the trigger.

Click Add. At this point we are pretty much done with the AWS part.

Test

To check if everything works fine, send an e-mail from one of the domains you configured to a non-existent e-mail address on an existing domain (e.g. test@hariiyer.com). If everything is right, you'll see a record in DynamoDB.

Scroll to the right and you'll see our TTL attribute. Hover your mouse over it and you'll see when the record is set to expire. I did my test on August 3rd around 6:30 AM, and it tells me that the record will expire in 24 hrs. Remember line 7 in the Node.js Lambda script? We put 24 there.

If something is wrong and you are not seeing the expected result, go to Cloudwatch and under Logs, choose the logs for your Lambda function.

Parsing

You can export the records from DynamoDB from the console as a CSV file. Unfortunately, you can't export the table as CSV from the command line using the AWS CLI; you can only get the output as text or JSON.

aws dynamodb scan --table-name ddbSESLogs --query "Items[*]" --output json

In order to convert it to CSV, we’ll use a tool called jq. You can get the tool from here. Create this file first and save it as json2csv.jq.

def json2header:
  [paths(scalars) | join(".")];

def json2array($header):
  json2header as $h
  | if $h == $header
    then [paths(scalars) as $p | getpath($p)]
    else "headers do not match: \($header) vs \($h)" | error
    end;

# given an array of conformal objects, produce "CSV" rows, with a header row:
def json2csv:
  (.[0] | json2header) as $h
  | ($h, (.[] | json2array($h)))
  | @csv;

# `main`
json2csv

Now, you can pipe the output of the JSON file to jq.

aws dynamodb scan --table-name ddbSESLogs --query "Items[*]" --output json | jq -rf json2csv.jq > output.csv

If you open output.csv, you'll see that you have all your records there. But here is the problem. Let's say you schedule the command above to run every 24 hours. You would think that AWS deletes expired records that are older than 24 hours, but they don't guarantee that. So we have to do the filtering ourselves: we'll download whatever is in the table and then filter for records that are within the 24-hour range.
Let's create these two files and name them names.json and values.json. If you named your TTL attribute differently or in lower case, you have to change the values below.
names.json

123{"#t": "TTL"}

values.json

123456{":TTL":{"N": "1564502762"}}

The value 1564502762 is an Epoch date. The Unix epoch (or Unix time or POSIX time or Unix timestamp) is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT), not counting leap seconds (in ISO 8601: 1970-01-01T00:00:00Z).
The idea is to inject an Epoch value in values.json that is 24 hours in the past, so when we run the command to filter, we’ll get the records that are 24 hours or less old.
This is the command that will filter the DynamoDB table.

aws dynamodb scan --table-name ddbSESLogs --query "Items[*]" --output json --filter-expression "#t > :TTL" \
  --expression-attribute-names file://names.json --expression-attribute-values file://values.json

If you run the command as-is, you'll get all the records. Why? Because 1564502762 is in the past (July 30th, around 4 PM). Use this site to convert between human-readable dates and Epoch/Unix time. To give you an idea of what's going on, here are my TTL values for the latest records.
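
You can also do the conversion locally; a quick sketch using GNU date and bc on Linux (the epoch value is the example from above):

date +%s                              # current time as epoch seconds
date -u -d @1564502762                # convert an epoch value back to a UTC date
echo "$(date +%s) - 3600*24" | bc     # epoch value for 24 hours ago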
If I change the value in values.json to 1564831295, I'll get the last record from the set and nothing more. The second-to-last record won't be included because I am using a greater-than, not a greater-than-or-equal, comparison in the command below.

aws dynamodb scan --table-name ddbSESLogs --query "Items[*]" --output json --filter-expression "#t > :TTL" \
  --expression-attribute-names file://names.json --expression-attribute-values file://values.json

So, before I do a comparison, I have to change the value in values.json and make it 24 hours ago. Make sure the JSON format of values.json is exactly the same as my example.

sed -i "s/.*[\"]N[\"]:.*/$(echo -e "\t"'"N": "'$(echo "`date +%s` -3600*24"|bc)'"')/" values.json

When executed, the command above changes the value in values.json to 24 hours in the past. Now you can combine these two commands in a small script and schedule it in a cron job. You can expand the script so it sends the result to S3 or e-mails it to whoever is responsible. Make sure the paths for sed, aws and the JSON files are correct.

#!/bin/sh
/usr/bin/sed -i "s/.*[\"]N[\"]:.*/$(echo -e "\t"'"N": "'$(echo "`date +%s` -3600*24"|bc)'"')/" /somewhere/values.json
/usr/bin/aws dynamodb scan --table-name ddbSESLogs --query "Items[*]" --output json --filter-expression "#t > :TTL" \
  --expression-attribute-names file://somewhere/names.json \
  --expression-attribute-values file://somewhere/values.json > /somewhere/output.csv

AWS Glue (optional)

If you don't want to deal with a Linux server, the AWS CLI and jq, you can use AWS Glue instead. This AWS ETL service allows you to run a job (scheduled or on-demand) and send your DynamoDB table to an S3 bucket. It's up to you what you do with the files in the bucket: you might keep them indefinitely, move them to Glacier or just expire them after some time. First, create an S3 bucket and name it aws-glue-something. The reason I name the bucket like this is that AWS Glue will create its own policy, and this policy has write access to all aws-glue-* buckets. So, instead of naming the bucket whatever I want and then attaching an extra policy, I use only a single policy.
Then, go to AWS Glue and click on Databases at the top left. Then click Add database.

Name the database (in my case gdbSESLogs) and click on Create.

Click on Tables below Databases and click Add tables, then Add tables using a crawler.

I’ll name my crawler craSESLogs.

Choose Data stores.

Choose a DynamoDB data store and select your DynamoDB table.

Don’t add any additional data stores and click Next.
Allow AWS to create an IAM role for you. Just name it.

You can choose how you want to schedule the crawler. Pick the Run on demand option so you can test; you can always modify the crawler later and schedule it.

Select the Glue database that you’ve created initially.

Finally, click Finish. Now that you have Glue database, table and crawler ready, run the crawler so it takes the data from DynamoDB and populates the Glue table.

Once the crawler completes, from the left-side menu under ETL sub-menu choose Jobs and click Add job.

Name the job, select the role you created earlier and leave the rest as shown in my screenshot.

For the data source, choose the DynamoDB table.

For the data target, choose Create tables in your data target. Choose S3 for the Data store, CSV as the Format and choose the bucket where the exported file will end up, as below. NOTE: Make sure your IAM role allows write access to the S3 bucket.

You can change the mappings or accept the defaults as I did. For example, if you don’t like some columns to end up in the CSV file, you can delete them here. Finally, click Save job and edit script.

You'll be presented with the Python script. The problem with this script is that it generates a separate file for each record in DynamoDB. If you are fine with this approach, just click Run job and check the logs. This isn't the behavior I expected, so I found a way to merge all these single files into one file. If you have millions of records, this approach is not recommended. Anyway, these are the modifications I made to get a single file.
Original script.

...
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("sesreportingmta", "string", "sesreportingmta", "string"), ("sesmessagetype", "string", "sesmessagetype", "string"), ("sessourceaddress", "string", "sessourceaddress", "string"), ("sesmessageid", "string", "sesmessageid", "string"), ("sesdestinationaddress", "string", "sesdestinationaddress", "string"), ("sesbouncesummary", "string", "sesbouncesummary", "string"), ("snspublishtime", "string", "snspublishtime", "string"), ("ttl", "long", "ttl", "long")], transformation_ctx = "applymapping1")
## @type: DataSink
## @args: [connection_type = "s3", connection_options = {"path": "s3://aws-glue-seslogs"}, format = "csv", transformation_ctx = "datasink2"]
## @return: datasink2
## @inputs: [frame = applymapping1]
datasink2 = glueContext.write_dynamic_frame.from_options(frame = applymapping1, connection_type = "s3", connection_options = {"path": "s3://aws-glue-seslogs"}, format = "csv", transformation_ctx = "datasink2")
job.commit()

Modified script.

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("sesreportingmta", "string", "sesreportingmta", "string"), ("sesmessagetype", "string", "sesmessagetype", "string"), ("sessourceaddress", "string", "sessourceaddress", "string"), ("sesmessageid", "string", "sesmessageid", "string"), ("sesdestinationaddress", "string", "sesdestinationaddress", "string"), ("sesbouncesummary", "string", "sesbouncesummary", "string"), ("snspublishtime", "string", "snspublishtime", "string"), ("ttl", "long", "ttl", "long")], transformation_ctx = "applymapping1")
repartition = applymapping1.repartition(1)
## @type: DataSink
## @args: [connection_type = "s3", connection_options = {"path": "s3://aws-glue-seslogs"}, format = "csv", transformation_ctx = "datasink2"]
## @return: datasink2
## @inputs: [frame = applymapping1]
datasink2 = glueContext.write_dynamic_frame.from_options(frame = repartition, connection_type = "s3", connection_options = {"path": "s3://aws-glue-seslogs"}, format = "csv", transformation_ctx = "datasink2")
job.commit()

Pretty much, I've added one line (#2, repartition = …) after applymapping1 = … and then replaced applymapping1 with repartition in the datasink2 = … line (#7). Now, if you run the job and wait for it to finish, you'll see a single file in your S3 bucket.
