Handle Up to 3.05 Times the NGINX Connections with AWS C6i or R6i Instances Featuring 3rd Gen Intel® Xeon® Scalable Processors with Crypto Acceleration

  • Compute-Optimized Instances: Up to 3.05x the NGINX Connections per Second Using Intel Crypto Acceleration on C6i Instances vs. C6i Instances without Crypto Acceleration.

  • Memory-Optimized Instances: Up to 2.93x the NGINX Connections per Second Using Intel Crypto Acceleration on R6i Instances vs. R6i Instances without Crypto Acceleration.

For Both Compute-Optimized and Memory-Optimized Instances, Using Crypto Acceleration Improved Performance

Encrypting network connections with SSL/TLS helps keep consumers' personal data safe as it travels over the internet. Organizations use NGINX, an open-source web server, as a reverse proxy, load balancer, or mail proxy, roles in which it often terminates those encrypted connections. Whether your organization runs NGINX on AWS compute-optimized or memory-optimized instances, choosing 3rd Gen Intel Xeon Scalable processors with Crypto Acceleration can improve performance.

3rd Gen Intel Xeon Scalable processors offer Intel QuickAssist Technology (Intel QAT) with Crypto Acceleration. Intel QAT and the Intel QAT Engine (an OpenSSL engine) provide hardware-based and software-based acceleration, with the software path built on vectorized instructions, to speed up cryptographic operations and allow more users to connect at a time. We tested NGINX performance with and without Crypto Acceleration on two instance types: compute-optimized (C6i) and memory-optimized (R6i). For both instance types, enabling Crypto Acceleration with Intel QAT resulted in more connections per second for NGINX workloads.
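To make the metric concrete: connections per second presumably counts new TLS connections completed against the server in a given time. The Python sketch below shows one plausible way to measure that from a single client. It is not the harness used in these tests, which the source does not describe. The function names are hypothetical, and the sketch assumes the cipher named in the test notes, AES128-GCM-SHA256, is the OpenSSL name of a TLS 1.2 suite.

```python
import socket
import ssl
import time


def make_client_context(cipher: str = "AES128-GCM-SHA256") -> ssl.SSLContext:
    """Build a TLS 1.2 client context restricted to one cipher suite."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Benchmark-only settings: skip certificate checks so that a
    # self-signed test server certificate is accepted.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # Assumption: the cipher name is a TLS 1.2 suite, so cap the version
    # to keep TLS 1.3 from negotiating a different suite.
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    ctx.set_ciphers(cipher)
    return ctx


def connections_per_second(completed: int, elapsed_s: float) -> float:
    """Convert a count of completed connections into a per-second rate."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return completed / elapsed_s


def measure(host: str, port: int, duration_s: float = 10.0) -> float:
    """Open new TLS connections back to back and return the achieved rate."""
    ctx = make_client_context()
    completed = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        # Each iteration performs a TCP connect plus a full TLS handshake,
        # then closes the connection.
        with socket.create_connection((host, port), timeout=5) as raw:
            with ctx.wrap_socket(raw, server_hostname=host):
                completed += 1
    return connections_per_second(completed, time.perf_counter() - start)
```

A single sequential client like this will not saturate a large instance. A real load test runs many such clients in parallel, which is likely why the test setup lists a separate client machine.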

Improving NGINX Performance on Compute-Optimized Instances

On compute-optimized C6i instances, using Intel QAT Crypto Acceleration in conjunction with 3rd Gen Intel Xeon Scalable processors boosted NGINX performance significantly over instances without Crypto Acceleration (see Figure 1). At 64 vCPUs, enabling Crypto Acceleration increased connections per second by up to 3.05 times compared to the same instance without Crypto Acceleration.

Figure 1. Relative NGINX performance, in connections per second, that C6i instances handled with and without Intel Crypto Acceleration. Higher is better.
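The relative performance figures in this brief are ratios: the connections-per-second rate with Crypto Acceleration divided by the rate without it on the same instance. The short Python sketch below illustrates the arithmetic. The function name and the two input rates are hypothetical placeholders, not measured results; the source reports only the ratios.

```python
def relative_performance(baseline_cps: float, accelerated_cps: float) -> float:
    """Return the accelerated rate as a multiple of the baseline rate."""
    if baseline_cps <= 0:
        raise ValueError("baseline_cps must be positive")
    return accelerated_cps / baseline_cps


# Hypothetical placeholder rates chosen only to reproduce a 3.05x ratio.
baseline = 10_000.0      # connections per second without Crypto Acceleration
accelerated = 30_500.0   # connections per second with Crypto Acceleration

ratio = relative_performance(baseline, accelerated)
print(f"{ratio:.2f}x")   # prints "3.05x"
```

Note that a ratio of 3.05 times the baseline corresponds to a 205 percent increase, not a 305 percent increase.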

Improving NGINX Performance on Memory-Optimized Instances

Results were similar for memory-optimized R6i instances. As Figure 2 shows, using Crypto Acceleration on R6i instances featuring 3rd Gen Intel® Xeon® Scalable processors improved NGINX connections per second by up to 2.93 times.

Figure 2. Relative NGINX performance, in connections per second, that R6i instances handled with and without Intel Crypto Acceleration. Higher is better.

Conclusion

On both the compute-optimized and memory-optimized AWS instances we tested, using Intel QAT Crypto Acceleration improved NGINX performance, increasing the number of connections per second the web server could handle by up to 3.05 times on C6i instances and up to 2.93 times on R6i instances compared to the same instances without Crypto Acceleration. Organizations seeking to boost NGINX connection rates can do so by selecting instances with 3rd Gen Intel Xeon Scalable processors with Crypto Acceleration.

Learn More

To begin running your NGINX workloads on compute-optimized AWS C6i instances with 3rd Gen Intel Xeon Scalable processors, visit https://aws.amazon.com/ec2/instance-types/c6i/. To select memory-optimized AWS R6i instances with 3rd Gen Intel Xeon Scalable processors, visit https://aws.amazon.com/ec2/instance-types/r6i/.

All tests by Intel on AWS/us-west-2b from 03/2022-04/2022. All tests: Ubuntu 20.04.4 LTS 5.13.0-1019-aws, v1.24.2.intel-13-g5ae1948f, gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, ldd (Ubuntu GLIBC 2.31-0ubuntu9.7) 2.31, Client Server: c6i.32xlarge, Number of Clients: 2, Run Iterations: 3, Cipher: AES128-GCM-SHA256. All QAT configs: async mode NGINX w/ QATEngine. VM instance details:

  • c6i.xlarge: ICX x86_64 CPUs, 4 vCPUs, 8GB RAM, 4 worker processes

  • r6i.xlarge: ICX x86_64 CPUs, 4 vCPUs, 32GB RAM, 4 worker processes

  • c6i.2xlarge: ICX x86_64 CPUs, 8 vCPUs, 16GB RAM, 8 worker processes

  • r6i.2xlarge: ICX x86_64 CPUs, 8 vCPUs, 64GB RAM, 8 worker processes

  • c6i.4xlarge: ICX x86_64 CPUs, 16 vCPUs, 32GB RAM, 16 worker processes

  • r6i.4xlarge: ICX x86_64 CPUs, 16 vCPUs, 128GB RAM, 16 worker processes

  • c6i.8xlarge: ICX x86_64 CPUs, 32 vCPUs, 64GB RAM, 32 worker processes

  • r6i.8xlarge: ICX x86_64 CPUs, 32 vCPUs, 256GB RAM, 32 worker processes

  • c6i.16xlarge: ICX x86_64 CPUs, 64 vCPUs, 128GB RAM, 64 worker processes

  • r6i.16xlarge: ICX x86_64 CPUs, 64 vCPUs, 512GB RAM, 64 worker processes
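In the instance details above, every configuration runs as many NGINX worker processes as it has vCPUs. The Python sketch below shows that sizing rule, as one way it could be expressed. `worker_processes` is a standard NGINX directive, but the helper name is hypothetical, and the source does not show the configuration files used in the tests.

```python
import os
from typing import Optional


def worker_processes_directive(vcpus: Optional[int] = None) -> str:
    """Return an NGINX worker_processes line with one worker per vCPU.

    If vcpus is omitted, fall back to the CPU count the OS reports.
    """
    count = vcpus if vcpus is not None else (os.cpu_count() or 1)
    if count < 1:
        raise ValueError("vcpus must be at least 1")
    return f"worker_processes {count};"


# Example: the 64-vCPU instances in the test notes ran 64 workers.
print(worker_processes_directive(64))  # prints "worker_processes 64;"
```

NGINX also accepts `worker_processes auto;`, which asks the server to detect the CPU count itself.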