Exploring Amazon S3: Key Features, Storage Options, and Best Practices
Table of contents
- S3 - Simple Storage Service
- Bucket
- Naming rules for buckets and objects in S3
- Object
- S3 storage classes
- S3 Standard
- S3 Standard-IA (Infrequent Access)
- S3 One Zone-IA
- S3 Glacier
- S3 Glacier Deep Archive
- S3 Intelligent-Tiering
- S3 Life Cycle Management
- Storage replication
Amazon provides three types of storage services:
Object Storage ==> Amazon provides object storage in the form of S3
File Storage ==> Amazon provides file storage in the form of EFS (Linux) and FSx (Windows)
Block Storage ==> Amazon provides block storage in the form of EBS
S3 - Simple Storage Service
Amazon S3 is a cloud storage service provided by AWS. It lets you store and retrieve any amount of data from anywhere in the world. In S3 you create buckets, in which you can store any kind of file (images, videos, documents, CSV files, and so on). The number of buckets per account is capped by a service quota; the default was 100 for most of the service's history, so check the current value for your account. For additional buckets, you can submit a request for a service limit increase.
Characteristics of S3
Highly scalable ==> S3 can store an unlimited amount of data in a single bucket; a single object can be up to 5 TB in size
Highly available and durable ==> S3 is designed for 99.999999999% (eleven 9s) durability of the objects stored in it
Secure ==> S3 provides bucket policies, access control, and encryption settings to protect the data in objects from unauthorized access
Cost effective ==> Large amounts of data can be stored in S3 at low cost, depending on the storage class selected
Performance ==> S3 provides a multipart upload feature, which uploads a large file as a set of smaller parts and so improves performance
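To make the multipart upload idea concrete, here is a minimal sketch of how a large file might be divided into parts before upload. It does not call AWS at all; the function name plan_parts and the 8 MiB default part size are assumptions chosen for illustration. The 5 MiB minimum part size (for every part except the last) and the 10,000-part maximum are S3's documented multipart limits.

```python
from typing import List, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum size for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts in one upload


def plan_parts(total_size: int, part_size: int = 8 * 1024 * 1024) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples that cover total_size bytes.

    plan_parts is a hypothetical helper, not part of any AWS SDK.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")

    parts = []
    offset = 0
    part_number = 1  # S3 part numbers start at 1
    while offset < total_size:
        # The last part may be shorter than part_size.
        length = min(part_size, total_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1

    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts
```

Each tuple describes one chunk that could be uploaded independently (and in parallel), which is where the performance gain comes from. In practice an SDK usually handles this splitting for you.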
Bucket
A bucket in S3 is like a folder where you store your files. Each bucket has a unique name, and this name must be globally unique because it forms part of the URL used to access the bucket.
Naming rules for buckets and objects in S3
Bucket names must be 3 to 63 characters long.
Names can contain only lowercase letters, digits, dots (.), and hyphens (-).
Names must start and end with a letter or digit.
Names cannot be formatted as an IP address (for example, 192.168.5.4).
Names cannot start with the prefix xn-- (for buckets created after February 2020).
Names cannot contain dots (.) when the bucket is used with Amazon S3 Transfer Acceleration.
Note: The namespace for bucket names is global, not regional. “Like domain names, the bucket namespace spans the entire world. A bucket name that is already in use by another Amazon S3 user cannot be utilized by you.” You also cannot create a second bucket with the same name as one you already own in another region.
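The bucket naming rules in this section can be sketched as a small validator. This is an illustration that covers only the rules listed here, not every rule AWS enforces; the function name is_valid_bucket_name and the transfer_acceleration flag are assumptions made for the example. Global uniqueness cannot be checked locally, so it is not covered.

```python
import re

# 3 to 63 characters; lowercase letters, digits, dots, hyphens;
# first and last character must be a letter or digit.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Four dot-separated groups of digits, e.g. 192.168.5.4
_IP_RE = re.compile(r"[0-9]{1,3}(\.[0-9]{1,3}){3}")


def is_valid_bucket_name(name: str, transfer_acceleration: bool = False) -> bool:
    """Check a candidate name against the rules listed in this section.

    Hypothetical helper: it does not check whether the name is already taken.
    """
    if not _NAME_RE.fullmatch(name):
        return False  # wrong length, bad characters, or bad first/last character
    if _IP_RE.fullmatch(name):
        return False  # formatted like an IP address
    if name.startswith("xn--"):
        return False  # reserved prefix
    if transfer_acceleration and "." in name:
        return False  # dots are not allowed with Transfer Acceleration
    return True
```

For example, is_valid_bucket_name("my-bucket-01") passes, while "My-Bucket", "ab", and "192.168.5.4" are rejected.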
Object
An object in S3 is essentially a file. Each object consists of:
Key: The name of the object (e.g., photo.jpg, document.pdf).
Value: The actual data, which is a sequence of bytes.
Version ID: A unique identifier for a specific version of the object (used when versioning is enabled on the bucket).
Metadata: Information about the data (e.g., content type, custom tags).
Access Control Information: Permissions to control who can access the object
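The parts of an object listed above can be pictured as a simple record. The class below is only a mental model: S3ObjectSketch and its field names are hypothetical and do not correspond to any AWS SDK type.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class S3ObjectSketch:
    """Illustrative model of an S3 object; not an AWS SDK class."""

    key: str                           # name of the object, e.g. "photo.jpg"
    value: bytes                       # the actual data
    version_id: Optional[str] = None   # set only when the bucket is versioned
    metadata: Dict[str, str] = field(default_factory=dict)  # e.g. content type
    acl: Dict[str, str] = field(default_factory=dict)       # grantee -> permission

    @property
    def size(self) -> int:
        """Size of the object's data in bytes."""
        return len(self.value)


# Example: an object with a key, some bytes, and one metadata entry.
photo = S3ObjectSketch(
    key="photo.jpg",
    value=b"\xff\xd8\xff",
    metadata={"Content-Type": "image/jpeg"},
)
```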
S3 storage classes
Amazon S3 offers a range of storage classes for different use cases, from frequently accessed data to data that is rarely used and does not need instant access, such as long-term archives and digital preservation. All Amazon S3 storage classes offer a high level of reliability, but they differ in cost and, for the archive classes, in retrieval time.
S3 Standard: For frequently accessed data. It offers high durability, availability, and performance.
S3 Standard-IA (Infrequent Access): For data that is less frequently accessed but requires rapid access when needed.
S3 One Zone-IA: Similar to Standard-IA but stored in a single availability zone, making it cheaper.
S3 Glacier: For data that is rarely accessed and requires long-term storage. Retrieval times can vary from minutes to hours.
S3 Glacier Deep Archive: For data that is rarely accessed and can tolerate retrieval times of up to about 12 hours.
S3 Intelligent-Tiering: Automatically moves data between access tiers based on changing access patterns.
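As a rough illustration of how these descriptions translate into a choice, the sketch below maps a coarse access pattern to a storage class. The string identifiers are the StorageClass values used by the S3 API; the function suggest_storage_class, its parameters, and the access labels are assumptions invented for this example, and real decisions should also weigh pricing details such as minimum storage duration and retrieval fees.

```python
# Storage class names from this section mapped to their S3 API identifiers.
STORAGE_CLASS_IDS = {
    "S3 Standard": "STANDARD",
    "S3 Standard-IA": "STANDARD_IA",
    "S3 One Zone-IA": "ONEZONE_IA",
    "S3 Glacier": "GLACIER",
    "S3 Glacier Deep Archive": "DEEP_ARCHIVE",
    "S3 Intelligent-Tiering": "INTELLIGENT_TIERING",
}


def suggest_storage_class(
    access: str,
    single_az_ok: bool = False,
    tolerates_12h_retrieval: bool = False,
) -> str:
    """Suggest a storage class for a coarse access pattern.

    access is one of "frequent", "infrequent", "rare", or "unknown"
    (hypothetical labels used only in this sketch).
    """
    if access == "frequent":
        return STORAGE_CLASS_IDS["S3 Standard"]
    if access == "infrequent":
        # One Zone-IA is cheaper but keeps data in a single availability zone.
        name = "S3 One Zone-IA" if single_az_ok else "S3 Standard-IA"
        return STORAGE_CLASS_IDS[name]
    if access == "rare":
        name = "S3 Glacier Deep Archive" if tolerates_12h_retrieval else "S3 Glacier"
        return STORAGE_CLASS_IDS[name]
    if access == "unknown":
        # Let S3 move data between tiers as access patterns change.
        return STORAGE_CLASS_IDS["S3 Intelligent-Tiering"]
    raise ValueError(f"unrecognized access pattern: {access!r}")
```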
S3 Life Cycle Management
Storage replication
Amazon S3 offers several replication options to help you manage and protect your data. Replication involves copying data from one location to another, ensuring redundancy and availability. Here are the main types of replication in S3:
Unidirectional Replication
Definition: Unidirectional replication in S3 means that data is copied from one bucket (the source) to another bucket (the destination) in a single direction. Changes in the source bucket (such as new uploads or updates) are automatically replicated to the destination bucket, but not the other way around.
Example: You have a primary bucket in the US East (Ohio) region and a backup bucket in the EU (Ireland) region. Any new objects or changes made in the primary bucket are automatically copied to the backup bucket. However, if you upload something directly to the backup bucket, it will not be copied back to the primary bucket.
No Replication
Definition: No replication means that data is not automatically copied between buckets. Each bucket operates independently without any automatic data synchronization between them.
Example: You have two buckets, one in the US East (Ohio) region and another in the EU (Ireland) region. When you upload data to the US East bucket, it stays there and is not copied to the EU bucket. Similarly, any data in the EU bucket is not replicated to the US East bucket. If you need to have the same data in both buckets, you would have to manually copy it.
Bidirectional Replication
Definition: Bidirectional replication involves two buckets replicating data to each other. Changes made in either bucket are automatically replicated to the other. This keeps both buckets in sync, providing high availability and redundancy.
Example: You have two buckets, one in the US East (Ohio) region and another in the EU (Ireland) region. If you upload a file to the US East bucket, it is automatically copied to the EU bucket. Similarly, if you upload a file to the EU bucket, it is automatically copied to the US East bucket. Because S3 replication is asynchronous, the two buckets converge on the same data after a short delay rather than instantly.
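The three modes above differ only in which buckets carry a replication rule. The sketch below builds the configuration dictionaries without calling AWS: none for no replication, one for unidirectional, and one per bucket for bidirectional. The dictionary layout follows the ReplicationConfiguration structure accepted by the boto3 put_bucket_replication call; the helper names, bucket names, and role ARN are hypothetical. A real setup also requires versioning to be enabled on both buckets and an IAM role that S3 can assume.

```python
from typing import Any, Dict


def replication_config(role_arn: str, destination_bucket: str) -> Dict[str, Any]:
    """Build one replication configuration that copies every object
    to destination_bucket. Hypothetical helper for illustration."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": f"replicate-to-{destination_bucket}",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: match all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


def plan_replication(
    mode: str, bucket_a: str, bucket_b: str, role_arn: str
) -> Dict[str, Dict[str, Any]]:
    """Return {source_bucket: configuration} for the chosen mode.

    mode is "none", "unidirectional" (a -> b), or "bidirectional".
    """
    if mode == "none":
        return {}
    if mode == "unidirectional":
        return {bucket_a: replication_config(role_arn, bucket_b)}
    if mode == "bidirectional":
        return {
            bucket_a: replication_config(role_arn, bucket_b),
            bucket_b: replication_config(role_arn, bucket_a),
        }
    raise ValueError(f"unrecognized mode: {mode!r}")


# Hypothetical names used only for this example.
ROLE = "arn:aws:iam::123456789012:role/example-replication-role"
plan = plan_replication("bidirectional", "example-ohio", "example-ireland", ROLE)
```

Each entry in the returned plan could then be applied to its source bucket, for instance by passing it as the ReplicationConfiguration argument of put_bucket_replication.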