Auto region
- Proposal Name: `auto_region`
- Start Date: 2022-02-24
- RFC PR: apache/opendal#57
- Tracking Issue: apache/opendal#58
§Summary
Automatically detect the user’s S3 region.
§Motivation
The current behavior of `region` and `endpoint` is buggy. `endpoint=https://s3.amazonaws.com` and `endpoint=""` are expected to behave the same, because `endpoint=""` means using the default value `https://s3.amazonaws.com`. However, they don’t.
The S3 SDK has a mechanism to construct the correct API endpoint. Internally, it works like `format!("s3.{}.amazonaws.com", region)`. But if we set the endpoint to `https://s3.amazonaws.com`, the SDK treats this endpoint as static and uses it as-is, ignoring the region.
As a result, users may run into errors like:

```
attempting to access must be addressed using the specified endpoint
```

Automatically detecting the user’s S3 region will help resolve this problem. Users won’t need to care about the region anymore; OpenDAL will figure it out. Everything works regardless of whether the input is `s3.amazonaws.com` or `s3.us-east-1.amazonaws.com`.
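To make the failure mode concrete, here is a minimal sketch that models the endpoint selection described above. It is not the SDK’s code: `resolve_endpoint` is a hypothetical function, and it assumes only the two behaviors stated in this section (a region fills the AWS host template, while a custom endpoint is used verbatim).

```rust
/// Hypothetical model of how an S3 SDK picks its endpoint.
/// This is NOT the real SDK code; it only mirrors the behavior
/// described in the Motivation section.
pub fn resolve_endpoint(custom: Option<&str>, region: &str) -> String {
    match custom {
        // A custom endpoint is treated as static: the region is ignored.
        Some(endpoint) => endpoint.to_string(),
        // Without a custom endpoint, the region is substituted into the AWS host.
        None => format!("https://s3.{}.amazonaws.com", region),
    }
}
```

For a bucket in `us-east-2`, `resolve_endpoint(None, "us-east-2")` produces `https://s3.us-east-2.amazonaws.com`, but `resolve_endpoint(Some("https://s3.amazonaws.com"), "us-east-2")` keeps the global host, so requests are sent there instead of to the bucket’s regional endpoint.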
§Guide-level explanation
OpenDAL will remove the `region` option, so users only need to set the `endpoint` now.
Valid inputs include:

- `https://s3.amazonaws.com`
- `https://s3.us-east-1.amazonaws.com`
- `https://oss-ap-northeast-1.aliyuncs.com`
- `http://127.0.0.1:9000`
OpenDAL will handle the region internally and automatically.
§Reference-level explanation
S3 services support a mechanism to report the correct region of a bucket themselves.
Sending a `HEAD` request to `<endpoint>/<bucket>` returns a response like:
```
:) curl -I https://s3.amazonaws.com/databend-shared
HTTP/1.1 301 Moved Permanently
x-amz-bucket-region: us-east-2
x-amz-request-id: NPYSWK7WXJD1KQG7
x-amz-id-2: 3FJSJ5HACKqLbeeXBUUE3GoPL1IGDjLl6SZx/fw2MS+k0GND0UwDib5YQXE6CThiQxpYBWZjgxs=
Content-Type: application/xml
Date: Thu, 24 Feb 2022 05:15:13 GMT
Server: AmazonS3
```

The header `x-amz-bucket-region: us-east-2` will be returned, and we can use this region to construct the correct endpoint for this bucket:
```
:) curl -I https://s3.us-east-2.amazonaws.com/databend-shared
HTTP/1.1 403 Forbidden
x-amz-bucket-region: us-east-2
x-amz-request-id: 98CN5MYV3GQ1XMPY
x-amz-id-2: Tdxy36bRRP21Oip18KMQ7FG63MTeXOpXdd5/N3izFH0oalPODVaRlpCkDU3oUN0HIE24/ezX5Dc=
Content-Type: application/xml
Date: Thu, 24 Feb 2022 05:16:57 GMT
Server: AmazonS3
```

It also works for S3-compatible services like MinIO:
```
# Start minio with `MINIO_SITE_REGION` configured
:) MINIO_SITE_REGION=test minio server .
# Sending request to minio bucket
:) curl -I 127.0.0.1:9900/databend
HTTP/1.1 403 Forbidden
Accept-Ranges: bytes
Content-Length: 0
Content-Security-Policy: block-all-mixed-content
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Bucket-Region: test
X-Amz-Request-Id: 16D6A12DCA57E0FA
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block
Date: Thu, 24 Feb 2022 05:18:51 GMT
```

We can use this mechanism to detect the region automatically. The algorithm works as follows:
- If `endpoint` is empty, fill it with `https://s3.amazonaws.com` and the corresponding template `https://s3.{region}.amazonaws.com`.
- Send a `HEAD` request to `<endpoint>/<bucket>`.
- If the response is `200` or `403`, the endpoint works.
  - Use this endpoint directly without filling the template.
  - Take the header `x-amz-bucket-region` as the region.
  - If the header does not exist, use the fallback value `us-east-1` to keep the SDK happy.
- If the response is `301`, the endpoint needs to be constructed.
  - Take the header `x-amz-bucket-region` as the region to fill the endpoint template.
  - Return an error to the user if the header does not exist.
- If the response is `404`, the bucket may not exist, or the endpoint is incorrect.
  - Return an error to the user.
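The steps above can be sketched as a few pure functions. This is a minimal sketch, not OpenDAL’s implementation: `Detected`, `fill_default`, `bucket_region`, and `detect` are hypothetical names; the `HEAD` request itself is assumed to be sent by the caller with any HTTP client, which then passes in the status code and headers. Returning an error for a `301` when no template is known, and for statuses the algorithm does not list, are our own assumptions. The header lookup is case-insensitive because AWS returns `x-amz-bucket-region` while MinIO returns `X-Amz-Bucket-Region`.

```rust
/// Result of region detection (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub struct Detected {
    pub endpoint: String,
    pub region: String,
}

const DEFAULT_ENDPOINT: &str = "https://s3.amazonaws.com";
const DEFAULT_TEMPLATE: &str = "https://s3.{region}.amazonaws.com";
const FALLBACK_REGION: &str = "us-east-1";

/// Step 1: an empty endpoint becomes the AWS default plus its template.
pub fn fill_default(endpoint: &str) -> (String, Option<&'static str>) {
    if endpoint.is_empty() {
        (DEFAULT_ENDPOINT.to_string(), Some(DEFAULT_TEMPLATE))
    } else {
        (endpoint.to_string(), None)
    }
}

/// Finds `x-amz-bucket-region`, ignoring the case of the header name.
pub fn bucket_region(headers: &[(&str, &str)]) -> Option<String> {
    headers
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case("x-amz-bucket-region"))
        .map(|(_, value)| value.trim().to_string())
}

/// Interprets the response to `HEAD <endpoint>/<bucket>`.
pub fn detect(
    endpoint: &str,
    template: Option<&str>,
    status: u16,
    headers: &[(&str, &str)],
) -> Result<Detected, String> {
    let region = bucket_region(headers);
    match status {
        // The endpoint works: keep it, use the header (or the fallback) as region.
        200 | 403 => Ok(Detected {
            endpoint: endpoint.to_string(),
            region: region.unwrap_or_else(|| FALLBACK_REGION.to_string()),
        }),
        // The endpoint must be built from the reported region.
        301 => {
            let region = region.ok_or("301 response without x-amz-bucket-region")?;
            // Assumption: without a template there is no way to build a new endpoint.
            let template = template.ok_or("301 response but no endpoint template")?;
            Ok(Detected {
                endpoint: template.replace("{region}", &region),
                region,
            })
        }
        404 => Err("bucket does not exist or endpoint is incorrect".to_string()),
        // Assumption: other statuses are not covered by the algorithm; treat as errors.
        other => Err(format!("unexpected status {other}")),
    }
}
```

Feeding in the AWS example above (a `301` from `https://s3.amazonaws.com/databend-shared` with `x-amz-bucket-region: us-east-2`) yields the endpoint `https://s3.us-east-2.amazonaws.com` and region `us-east-2`, while the MinIO example (`403` with `X-Amz-Bucket-Region: test`) keeps the original endpoint and uses region `test`.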
§Drawbacks
None.
§Rationale and alternatives
§Use virtual style <bucket>.<endpoint>?
The virtual style works too, but not all services support this kind of API endpoint. For example, `http://testbucket.127.0.0.1` is invalid, so we would need extra checks.
Using the path style `<endpoint>/<bucket>` makes everything easier.
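To illustrate why the path style is simpler, here is a minimal sketch of both URL forms. The function names are hypothetical, and the IP check is only one example of the "extra checks" the virtual style needs; for brevity it recognizes IPv4 hosts only and does not handle IPv6 literals.

```rust
use std::net::Ipv4Addr;

/// Path style: `<endpoint>/<bucket>`, valid for any endpoint.
pub fn path_style_url(endpoint: &str, bucket: &str) -> String {
    format!("{}/{}", endpoint.trim_end_matches('/'), bucket)
}

/// Virtual style: `<scheme>://<bucket>.<host>`. Returns `None` when the
/// endpoint has no scheme or its host is an IPv4 address, because a bucket
/// name cannot be prepended to an IP address.
pub fn virtual_style_url(endpoint: &str, bucket: &str) -> Option<String> {
    let (scheme, rest) = endpoint.split_once("://")?;
    let rest = rest.trim_end_matches('/');
    // Host without an optional port, e.g. `127.0.0.1` in `127.0.0.1:9000`.
    let host = rest.split(':').next().unwrap_or(rest);
    if host.parse::<Ipv4Addr>().is_ok() {
        return None;
    }
    Some(format!("{scheme}://{bucket}.{rest}"))
}
```

The path-style function never needs to inspect the endpoint, which is the property this proposal relies on.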
§Use ListBuckets API?
`ListBuckets` requires higher permissions than normal bucket read and write operations. It’s better to get the job done without requesting more permissions.
§Misbehaving S3-Compatible Services
Many services don’t implement the S3 API correctly.
Aliyun OSS returns `404` for every bucket:
```
:) curl -I https://aliyuncs.com/<my-existing-bucket>
HTTP/2 404
date: Thu, 24 Feb 2022 05:32:57 GMT
content-type: text/html
content-length: 690
ufe-result: A6
set-cookie: thw=cn; Path=/; Domain=.taobao.com; Expires=Fri, 24-Feb-23 05:32:57 GMT;
server: Tengine/Aserver
```

QingStor Object Storage returns `301` with a `Location` header instead of `x-amz-bucket-region`:
```
:) curl -I https://s3.qingstor.com/community
HTTP/1.1 301 Moved Permanently
Server: nginx/1.13.6
Date: Thu, 24 Feb 2022 05:33:55 GMT
Connection: keep-alive
Location: https://pek3a.s3.qingstor.com/community
X-Qs-Request-Id: 05b83b615c801a3d
```

In this proposal, we will not handle these services. It’s easier for users to fill in the correct endpoint than for OpenDAL to detect it automatically.
§Prior art
None
§Unresolved questions
None
§Future possibilities
None