Gotchas with CloudFront + S3 Architecture

Right now, we have a website hosted in an S3 bucket, served through CloudFront, defined in Terraform, and covered by automated tests. There are a few catches with this setup, so we'll go into what they are and how to work around them.

The lifecycle of the cache

There are a few different ways to configure TTLs on CloudFront:

  • Use a TTL of 0

Control the cache at your origin. See this Stack Overflow question, "What is a TTL 0 in CloudFront useful for?" With a TTL of 0, CloudFront still keeps a cached copy but revalidates it with the origin on every request. If the object has not been modified since CloudFront last fetched it, S3 returns a 304 Not Modified response without a body, and CloudFront keeps serving the cached version until S3 says it should update.
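To make the revalidation flow concrete, here is a minimal sketch of the decision an origin makes on a conditional request. It is a simplified model, not S3's or CloudFront's actual implementation; the function name `revalidate` and the object shapes are made up for illustration.

```javascript
'use strict';

// Simplified model of a conditional GET (hypothetical names, not an AWS API).
// `cached` is what the CDN holds; `origin` is the current object at the origin.
function revalidate(cached, origin) {
    // The CDN sends the ETag it already has in an If-None-Match header.
    const ifNoneMatch = cached ? cached.etag : undefined;

    if (ifNoneMatch !== undefined && ifNoneMatch === origin.etag) {
        // Unchanged: no body is transferred and the cached copy is reused.
        return { status: 304, body: cached.body, etag: cached.etag };
    }
    // Changed (or nothing cached): the origin returns the full object.
    return { status: 200, body: origin.body, etag: origin.etag };
}

const cached = { etag: '"abc"', body: 'old page' };
console.log(revalidate(cached, { etag: '"abc"', body: 'old page' }).status); // 304
console.log(revalidate(cached, { etag: '"def"', body: 'new page' }).status); // 200
```

The point of the 304 path is that only headers cross the wire, so a TTL of 0 costs a round trip to the origin but not a full download.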

  • Use a non-zero TTL

Let CloudFront be in charge of the cache. Objects stay cached in CloudFront for the duration of your TTL; see here for the different TTLs you have available.

Use cache invalidation to bust the cache when you need to. You can invalidate the whole distribution or just a subpath, e.g. everything under /blog. Invalidations take time to propagate because the cached copies have to be invalidated at multiple edge locations.
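As a rough sketch, the request parameters for CloudFront's CreateInvalidation API look like the object built below. The helper name `buildInvalidationParams` and the distribution ID `EDFDVBD6EXAMPLE` are placeholders for illustration; the actual SDK call is left out so the snippet runs without any dependencies.

```javascript
'use strict';

// Build the parameters for CloudFront's CreateInvalidation API.
// `buildInvalidationParams` is a hypothetical helper; 'EDFDVBD6EXAMPLE' is a placeholder ID.
function buildInvalidationParams(distributionId, paths, callerReference) {
    // Each path must start with '/'; a trailing '*' acts as a wildcard.
    const items = paths.map((p) => (p.startsWith('/') ? p : `/${p}`));
    return {
        DistributionId: distributionId,
        InvalidationBatch: {
            // Must be unique per request; it guards against accidentally resubmitting the same one.
            CallerReference: callerReference,
            Paths: { Quantity: items.length, Items: items },
        },
    };
}

// Invalidate everything under /blog, or pass ['/*'] for the whole distribution.
const params = buildInvalidationParams('EDFDVBD6EXAMPLE', ['/blog/*'], `deploy-${Date.now()}`);
console.log(JSON.stringify(params, null, 2));
```

An object of this shape can then be handed to the AWS SDK's CreateInvalidation operation, for example from a deploy script that runs after the S3 upload.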

S3 has outages too

S3 has had at least one outage a year. AWS actively monitors its services and works hard to bring them back as soon as possible, but there's always the chance that your website becomes inaccessible because of issues with S3. If your website brings in sales, any S3 outage costs you money. So how do you handle S3 outages?

  • Figure out what type of SLA you need for your website. You can rely on CloudFront's cache to give you a window of time before you need to act. For example, if S3 goes down but your content is cached in CloudFront, you can configure CloudFront to keep serving the cached content while the origin is unavailable. See this AWS doc for a more in-depth description of how to stay up when S3 is down.
  • Leverage S3 bucket replication. This S3 feature propagates changes to a bucket in another region. On its own it is not enough to make you resilient, because a whole slew of S3 and CloudFront limitations get in the way:
    • Buckets must have the same name as the domain you want to use to access them. If your website is imcool.com, your bucket's name must be imcool.com. Because bucket names are globally unique, you can't have an imcool.com bucket in us-east-1 AND in us-east-2.
    • If you name your buckets imcool-us-east-1.com and imcool-us-east-2.com, you won't be able to access them with your single domain name, imcool.com.
    • A custom domain can only be associated with one CloudFront distribution at a time. You cannot have one distribution for us-east-1 and another for us-east-2 that both list imcool.com as an alternate domain name (CNAME).
    • You can define multiple origins for your distribution, but you can only route by path: /assets can request from the assets origin while /blog requests from the site origin. At the time of writing, you cannot define something like "/* should request from s3-origin-us-east-1 AND, if that is down, from s3-origin-us-east-2".
  • Other solutions around the internet
    • There is this solution posted by IOpipe involving CloudFront + Route 53 health checks + S3. The downside is that if you rely on S3 as a website, its website endpoint can't reply to HTTPS requests, so a failover would downgrade you from HTTPS to HTTP.
    • Multi-region load balancers in front of multi-region S3 buckets with CloudFront, as suggested in a Stack Overflow answer.
    • A pretty good discussion on /r/aws of the possibilities and the pros/cons of each.

Setting custom headers

S3 lets you configure various headers such as Cache-Control and Content-MD5, depending on what you want to do with your content. However, you are limited in which headers you can set: S3 alone can't add things like X-Frame-Options or Content-Security-Policy.
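For the headers S3 does support, the values are stored as object metadata at upload time and replayed on every GET. A minimal sketch of PutObject parameters, assuming a hypothetical helper `buildPutObjectParams` and a made-up cache policy (revalidate HTML, cache everything else for a day):

```javascript
'use strict';

// Hypothetical helper: build S3 PutObject parameters carrying the headers S3 will
// return with the object. The cache policy here is an assumption, not a recommendation.
function buildPutObjectParams(bucket, key, body) {
    const isHtml = key.endsWith('.html');
    return {
        Bucket: bucket,
        Key: key,
        Body: body,
        ContentType: isHtml ? 'text/html; charset=utf-8' : 'application/octet-stream',
        // HTML is revalidated on every request; other assets are cached for a day.
        CacheControl: isHtml ? 'no-cache' : 'public, max-age=86400',
    };
}

console.log(buildPutObjectParams('imcool.com', 'index.html', '<h1>hi</h1>').CacheControl); // no-cache
```

Note that there is no field here for X-Frame-Options or Content-Security-Policy, which is the gap the rest of this section works around.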

Configure Lambda@Edge to add security headers

Use Lambda@Edge to manipulate the response coming out of CloudFront. CloudFront offers several hooks where you can attach a Lambda, such as when it receives a request or sends a response; see here for their walkthrough on setting this up. You can create a simple Lambda that takes the response sent from the origin back to CloudFront and adds whatever headers you'd like to include. You can either set them globally or apply them conditionally based on context, using something like this:

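'use strict';

// Lambda@Edge handler for a CloudFront response event.
exports.handler = (event, context, callback) => {
    // Get the response that CloudFront is about to return
    const response = event.Records[0].cf.response;

    // Header maps are keyed by lowercase name; each value is an array of {key, value}
    response.headers['strict-transport-security'] = [{
        key: 'Strict-Transport-Security',
        value: 'max-age=63072000; includeSubDomains; preload'
    }];

    callback(null, response);
};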
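For the conditional case, a minimal sketch might branch on the response's Content-Type. The policy function `securityHeadersFor` and the specific header values (`DENY`, `default-src 'self'`) are assumptions for illustration; pick values that fit your own site.

```javascript
'use strict';

// Hypothetical policy: which extra headers to add for a given content type.
function securityHeadersFor(contentType) {
    const headers = {
        'Strict-Transport-Security': 'max-age=63072000; includeSubDomains; preload',
    };
    // Framing and CSP rules only matter for HTML documents.
    if (contentType.startsWith('text/html')) {
        headers['X-Frame-Options'] = 'DENY';
        headers['Content-Security-Policy'] = "default-src 'self'";
    }
    return headers;
}

const handler = (event, context, callback) => {
    const response = event.Records[0].cf.response;

    // CloudFront header maps use lowercase names as keys and arrays of {key, value} as values.
    const contentTypeHeader = response.headers['content-type'];
    const contentType = contentTypeHeader ? contentTypeHeader[0].value : '';

    for (const [name, value] of Object.entries(securityHeadersFor(contentType))) {
        response.headers[name.toLowerCase()] = [{ key: name, value }];
    }
    callback(null, response);
};

exports.handler = handler;
```

Keeping the policy in a plain function separate from the handler makes it easy to unit test without building a full CloudFront event.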