WebHDFS

WebHDFS's REST API support.

There are two implementations of the WebHDFS REST API (see the configuration sketch after this list):

  • Native: served by the HDFS NameNode and DataNodes directly; data is transferred between the client and the nodes without a proxy.
  • HttpFS: a gateway in front of the HDFS nodes; data is proxied through the gateway.
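
From OpenDAL's point of view the two deployments are configured the same way; only the endpoint differs. A minimal sketch, assuming the default ports (9870 for the NameNode's HTTP address in Hadoop 3.x, 14000 for HttpFS) and placeholder host names:

use anyhow::Result;
use opendal::services::Webhdfs;
use opendal::Operator;

fn main() -> Result<()> {
    // Native WebHDFS: point the endpoint at the NameNode's HTTP address;
    // data is transferred to and from the DataNodes directly.
    let mut native = Webhdfs::default();
    native.endpoint("http://namenode:9870");
    let _native_op: Operator = Operator::new(native)?.finish();

    // HttpFS: point the endpoint at the gateway instead; it exposes the same
    // /webhdfs/v1 REST API, and all data is proxied through it.
    let mut httpfs = Webhdfs::default();
    httpfs.endpoint("http://httpfs:14000");
    let _httpfs_op: Operator = Operator::new(httpfs)?.finish();

    Ok(())
}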

Capabilities

This service can be used to (see the usage sketch after this list):

  • stat
  • read
  • write
  • create_dir
  • delete
  • copy
  • rename
  • list
  • scan
  • presign
  • blocking
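
A minimal sketch exercising a few of these capabilities; the endpoint, root, and paths below are placeholders:

use anyhow::Result;
use opendal::services::Webhdfs;
use opendal::Operator;

#[tokio::main]
async fn main() -> Result<()> {
    let mut builder = Webhdfs::default();
    builder.endpoint("http://127.0.0.1:9870");
    builder.root("/path/to/dir");
    let op: Operator = Operator::new(builder)?.finish();

    // write: create a file with the given content
    op.write("hello.txt", "Hello, WebHDFS!".as_bytes().to_vec()).await?;

    // stat: fetch the file's metadata
    let meta = op.stat("hello.txt").await?;
    println!("size: {}", meta.content_length());

    // read: fetch the file's content back
    let content = op.read("hello.txt").await?;
    println!("read {} bytes", content.len());

    // list: enumerate entries under the root
    for entry in op.list("/").await? {
        println!("entry: {}", entry.path());
    }

    // delete: remove the file
    op.delete("hello.txt").await?;

    Ok(())
}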

Differences with HDFS

[Hdfs][crate::services::Hdfs] is powered by HDFS's native Java client, so users need to set up the HDFS environment correctly before using it. WebHDFS, by contrast, is accessed through an HTTP API and requires no extra setup.

WebHDFS Compatibility Guidelines

File Creation and Write

For file creation and write operations, OpenDAL WebHDFS is optimized for Hadoop Distributed File System (HDFS) versions 2.9 and later. Creating a file involves two API calls in WebHDFS: the initial PUT call to the NameNode is redirected to the DataNode that handles the file data. The optional noredirect flag can be set to prevent this redirection; in that case the API response body contains the DataNode URL, which is then used for the subsequent PUT call carrying the actual file data. OpenDAL automatically sets the noredirect flag on the first PUT call. This flag is supported starting from HDFS version 2.9.
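
The exchange below is a sketch of that two-step protocol at the REST level, not OpenDAL code; it assumes the reqwest crate (with its json feature), serde_json, and tokio, uses placeholder URLs and paths, and omits authentication parameters such as the delegation token:

use anyhow::Result;
use reqwest::Client;
use serde_json::Value;

#[tokio::main]
async fn main() -> Result<()> {
    let client = Client::new();

    // Step 1: ask the NameNode to create the file. With noredirect=true
    // (HDFS 2.9+) the NameNode answers with a JSON body containing the
    // DataNode URL instead of an HTTP redirect.
    let resp = client
        .put("http://127.0.0.1:9870/webhdfs/v1/path/to/file?op=CREATE&noredirect=true")
        .send()
        .await?;
    let body: Value = resp.json().await?;
    let datanode_url = body["Location"].as_str().expect("datanode location");

    // Step 2: send the actual file data to the DataNode URL returned above.
    client
        .put(datanode_url)
        .body("hello, webhdfs")
        .send()
        .await?
        .error_for_status()?;

    Ok(())
}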

Multi-Write Support

OpenDAL WebHDFS supports multi-write operations by creating temporary files in the specified atomic_write_dir. The final concatenation of these temporary files happens when the writer is closed. However, be aware of the concat restrictions in earlier HDFS versions: the target file must not be empty, and its last block must be full. Due to these constraints, the concat operation may fail on HDFS 2.6. This issue, tracked as HDFS-6641, has been addressed in later HDFS versions.
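
A sketch of such a multi-part write, assuming atomic_write_dir has been configured and using placeholder paths; the temporary files are concatenated into the target file when the writer is closed:

use anyhow::Result;
use opendal::services::Webhdfs;
use opendal::Operator;

#[tokio::main]
async fn main() -> Result<()> {
    let mut builder = Webhdfs::default();
    builder.endpoint("http://127.0.0.1:9870");
    builder.root("/path/to/dir");
    // Required for multi-write: temporary files are staged here.
    builder.atomic_write_dir(".opendal_tmp/");
    let op: Operator = Operator::new(builder)?.finish();

    // Each write call produces a temporary file under atomic_write_dir.
    let mut w = op.writer("large.bin").await?;
    w.write(vec![0u8; 8 * 1024 * 1024]).await?;
    w.write(vec![1u8; 8 * 1024 * 1024]).await?;
    // Closing the writer concatenates the temporary files into large.bin.
    w.close().await?;

    Ok(())
}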

In summary, OpenDAL WebHDFS is designed for optimal compatibility with HDFS, specifically versions 2.9 and later.

Configurations

  • root: The root path of the WebHDFS service.
  • endpoint: The endpoint of the WebHDFS service.
  • delegation: The delegation token for WebHDFS.
  • atomic_write_dir: The temporary directory used for multi-write operations. Must be configured to enable multi-write support.

Refer to [Builder]'s public API docs for more information.

Examples

Via Builder

use anyhow::Result;
use opendal::services::Webhdfs;
use opendal::Operator;

#[tokio::main]
async fn main() -> Result<()> {
    let mut builder = Webhdfs::default();
    // Set the root for WebHDFS; all operations will happen under this root.
    //
    // Note:
    // - if the root does not exist, the builder will automatically create
    //   the root directory for you
    // - if the root exists and is a directory, the builder will continue working
    // - if the root exists and is a file, the builder will fail on building the backend
    builder.root("/path/to/dir");
    // Set the endpoint of the WebHDFS NameNode, controlled by dfs.namenode.http-address.
    // The default is http://127.0.0.1:9870.
    builder.endpoint("http://127.0.0.1:9870");
    // Set the delegation token for the builder.
    builder.delegation("delegation_token");
    // Set atomic_write_dir for the builder.
    builder.atomic_write_dir(".opendal_tmp/");

    let op: Operator = Operator::new(builder)?.finish();

    Ok(())
}

Via Config

use anyhow::Result;
use opendal::Operator;
use opendal::Scheme;
use std::collections::HashMap;

#[tokio::main]
async fn main() -> Result<()> {
    let mut map = HashMap::new();
    map.insert("endpoint".to_string(), "http://127.0.0.1:9870".to_string());
    map.insert("root".to_string(), "/path/to/dir".to_string());
    map.insert("delegation".to_string(), "delegation_token".to_string());

    let op: Operator = Operator::via_map(Scheme::Webhdfs, map)?;
    Ok(())
}