You are viewing docs on Elastic's new documentation system, currently in technical preview. For all other Elastic docs, visit elastic.co/guide.

Last updated: Apr 10th, 2023

Hadoop

Collect metrics from Apache Hadoop with Elastic Agent.

Beta feature

This functionality is in beta and is subject to change. The design and code is less mature than official generally available features and is being provided as-is with no warranties. Beta features are not subject to the support service level agreement of official generally available features.

What is an Elastic integration?

This integration is powered by Elastic Agent. Elastic Agent is a single, unified way to add monitoring for logs, metrics, and other types of data to a host. It can also protect hosts from security threats, query data from operating systems, forward data from remote services or hardware, and more. Refer to our documentation for a detailed comparison between Beats and Elastic Agent.

Prefer to use Beats for this use case? See Filebeat modules for logs or Metricbeat modules for metrics.

Get started with integrations

See the integrations quick start guides to get started:

This integration is used to collect Hadoop metrics as follows:

Application Metrics
Cluster Metrics
DataNode Metrics
NameNode Metrics
NodeManager Metrics

This integration uses Resource Manager API and JMX API to collect above metrics.

application

This data stream collects Application metrics.

An example event for application looks as following:

{
    "@timestamp": "2023-02-02T12:03:41.178Z",
    "agent": {
        "ephemeral_id": "71297f22-c3ed-49b3-a8a9-a9d2086f8df2",
        "id": "2d054344-10a6-40d9-90c1-ea017fecfda3",
        "name": "docker-fleet-agent",
        "type": "filebeat",
        "version": "8.5.1"
    },
    "data_stream": {
        "dataset": "hadoop.application",
        "namespace": "ep",
        "type": "logs"
    },
    "ecs": {
        "version": "8.5.1"
    },
    "elastic_agent": {
        "id": "2d054344-10a6-40d9-90c1-ea017fecfda3",
        "snapshot": false,
        "version": "8.5.1"
    },
    "event": {
        "agent_id_status": "verified",
        "category": "database",
        "created": "2023-02-02T12:03:41.178Z",
        "dataset": "hadoop.application",
        "ingested": "2023-02-02T12:03:42Z",
        "kind": "metric",
        "module": "httpjson",
        "type": "info"
    },
    "hadoop": {
        "application": {
            "allocated": {
                "mb": 2048,
                "v_cores": 1
            },
            "id": "application_1675339401983_0001",
            "memory_seconds": 12185,
            "progress": 0,
            "running_containers": 1,
            "time": {
                "elapsed": 7453,
                "finished": "1970-01-01T00:00:00.000Z",
                "started": "2023-02-02T12:03:33.662Z"
            },
            "vcore_seconds": 5
        }
    },
    "input": {
        "type": "httpjson"
    },
    "tags": [
        "forwarded"
    ]
}

Exported fields

Field	Description	Type
@timestamp	Event timestamp.	date
data_stream.dataset	Data stream dataset.	constant_keyword
data_stream.namespace	Data stream namespace.	constant_keyword
data_stream.type	Data stream type.	constant_keyword
ecs.version	ECS version this event conforms to. `ecs.version` is a required field and must exist in all events. When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events.	keyword
event.category	This is one of four ECS Categorization Fields, and indicates the second level in the ECS category hierarchy. `event.category` represents the "big buckets" of ECS categories. For example, filtering on `event.category:process` yields all events relating to process activity. This field is closely related to `event.type`, which is used as a subcategory. This field is an array. This will allow proper categorization of some events that fall in multiple categories.	keyword
event.dataset	Name of the dataset. If an event source publishes more than one type of log or events (e.g. access log, error log), the dataset is used to specify which one the event comes from. It's recommended but not required to start the dataset name with the module name, followed by a dot, then the dataset name.	keyword
event.kind	This is one of four ECS Categorization Fields, and indicates the highest level in the ECS category hierarchy. `event.kind` gives high-level information about what type of information the event contains, without being specific to the contents of the event. For example, values of this field distinguish alert events from metric events. The value of this field can be used to inform how these kinds of events should be handled. They may warrant different retention, different access control, it may also help understand whether the data coming in at a regular interval or not.	keyword
event.module	Name of the module this data is coming from. If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module.	keyword
event.type	This is one of four ECS Categorization Fields, and indicates the third level in the ECS category hierarchy. `event.type` represents a categorization "sub-bucket" that, when used along with the `event.category` field values, enables filtering events down to a level appropriate for single visualization. This field is an array. This will allow proper categorization of some events that fall in multiple event types.	keyword
hadoop.application.allocated.mb	Total memory allocated to the application's running containers (Mb)	long
hadoop.application.allocated.v_cores	The total number of virtual cores allocated to the application's running containers	long
hadoop.application.id	Application ID	keyword
hadoop.application.memory_seconds	The amount of memory the application has allocated	long
hadoop.application.progress	Application progress (%)	long
hadoop.application.running_containers	Number of containers currently running for the application	long
hadoop.application.time.elapsed	The elapsed time since application started (ms)	long
hadoop.application.time.finished	Application finished time	date
hadoop.application.time.started	Application start time	date
hadoop.application.vcore_seconds	The amount of CPU resources the application has allocated	long
input.type	Type of Filebeat input.	keyword
tags	User defined tags	keyword

cluster

This data stream collects Cluster metrics.

An example event for cluster looks as following:

{
    "@timestamp": "2022-04-04T17:22:22.255Z",
    "agent": {
        "ephemeral_id": "a8157f06-f6b6-4eae-b67f-4ad08fa7c170",
        "id": "abf8f8c1-f293-4e16-a8f8-8cf48014d040",
        "name": "docker-fleet-agent",
        "type": "metricbeat",
        "version": "8.1.0"
    },
    "data_stream": {
        "dataset": "hadoop.cluster",
        "namespace": "ep",
        "type": "metrics"
    },
    "ecs": {
        "version": "8.5.1"
    },
    "elastic_agent": {
        "id": "abf8f8c1-f293-4e16-a8f8-8cf48014d040",
        "snapshot": false,
        "version": "8.1.0"
    },
    "event": {
        "agent_id_status": "verified",
        "category": "database",
        "dataset": "hadoop.cluster",
        "duration": 50350559,
        "ingested": "2022-04-04T17:22:25Z",
        "kind": "metric",
        "module": "http",
        "type": "info"
    },
    "hadoop": {
        "cluster": {
            "application_main": {
                "launch_delay_avg_time": 2115,
                "launch_delay_num_ops": 1,
                "register_delay_avg_time": 0,
                "register_delay_num_ops": 0
            },
            "node_managers": {
                "num_active": 1,
                "num_decommissioned": 0,
                "num_lost": 0,
                "num_rebooted": 0,
                "num_unhealthy": 0
            }
        }
    },
    "host": {
        "architecture": "x86_64",
        "containerized": true,
        "hostname": "docker-fleet-agent",
        "ip": [
            "172.27.0.7"
        ],
        "mac": [
            "02:42:ac:1b:00:07"
        ],
        "name": "docker-fleet-agent",
        "os": {
            "codename": "focal",
            "family": "debian",
            "kernel": "3.10.0-1160.53.1.el7.x86_64",
            "name": "Ubuntu",
            "platform": "ubuntu",
            "type": "linux",
            "version": "20.04.3 LTS (Focal Fossa)"
        }
    },
    "metricset": {
        "name": "json",
        "period": 60000
    },
    "service": {
        "address": "http://elastic-package-service_hadoop_1:8088/jmx?qry=Hadoop%3Aservice%3DResourceManager%2Cname%3DClusterMetrics",
        "type": "http"
    },
    "tags": [
        "forwarded"
    ]
}

Exported fields

Field	Description	Type
@timestamp	Event timestamp.	date
data_stream.dataset	Data stream dataset.	constant_keyword
data_stream.namespace	Data stream namespace.	constant_keyword
data_stream.type	Data stream type.	constant_keyword
ecs.version	ECS version this event conforms to. `ecs.version` is a required field and must exist in all events. When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events.	keyword
event.category	This is one of four ECS Categorization Fields, and indicates the second level in the ECS category hierarchy. `event.category` represents the "big buckets" of ECS categories. For example, filtering on `event.category:process` yields all events relating to process activity. This field is closely related to `event.type`, which is used as a subcategory. This field is an array. This will allow proper categorization of some events that fall in multiple categories.	keyword
event.dataset	Name of the dataset. If an event source publishes more than one type of log or events (e.g. access log, error log), the dataset is used to specify which one the event comes from. It's recommended but not required to start the dataset name with the module name, followed by a dot, then the dataset name.	keyword
event.kind	This is one of four ECS Categorization Fields, and indicates the highest level in the ECS category hierarchy. `event.kind` gives high-level information about what type of information the event contains, without being specific to the contents of the event. For example, values of this field distinguish alert events from metric events. The value of this field can be used to inform how these kinds of events should be handled. They may warrant different retention, different access control, it may also help understand whether the data coming in at a regular interval or not.	keyword
event.module	Name of the module this data is coming from. If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module.	keyword
event.type	This is one of four ECS Categorization Fields, and indicates the third level in the ECS category hierarchy. `event.type` represents a categorization "sub-bucket" that, when used along with the `event.category` field values, enables filtering events down to a level appropriate for single visualization. This field is an array. This will allow proper categorization of some events that fall in multiple event types.	keyword
hadoop.cluster.application_main.launch_delay_avg_time	Application Main Launch Delay Average Time (Milliseconds)	long
hadoop.cluster.application_main.launch_delay_num_ops	Application Main Launch Delay Operations (Number of Operations)	long
hadoop.cluster.application_main.register_delay_avg_time	Application Main Register Delay Average Time (Milliseconds)	long
hadoop.cluster.application_main.register_delay_num_ops	Application Main Register Delay Operations (Number of Operations)	long
hadoop.cluster.applications.completed	The number of applications completed	long
hadoop.cluster.applications.failed	The number of applications failed	long
hadoop.cluster.applications.killed	The number of applications killed	long
hadoop.cluster.applications.pending	The number of applications pending	long
hadoop.cluster.applications.running	The number of applications running	long
hadoop.cluster.applications.submitted	The number of applications submitted	long
hadoop.cluster.containers.allocated	The number of containers allocated	long
hadoop.cluster.containers.pending	The number of containers pending	long
hadoop.cluster.containers.reserved	The number of containers reserved	long
hadoop.cluster.memory.allocated	The amount of memory allocated in MB	long
hadoop.cluster.memory.available	The amount of memory available in MB	long
hadoop.cluster.memory.reserved	The amount of memory reserved in MB	long
hadoop.cluster.memory.total	The amount of total memory in MB	long
hadoop.cluster.node_managers.num_active	Number of Node Managers Active	long
hadoop.cluster.node_managers.num_decommissioned	Number of Node Managers Decommissioned	long
hadoop.cluster.node_managers.num_lost	Number of Node Managers Lost	long
hadoop.cluster.node_managers.num_rebooted	Number of Node Managers Rebooted	long
hadoop.cluster.node_managers.num_unhealthy	Number of Node Managers Unhealthy	long
hadoop.cluster.nodes.active	The number of active nodes	long
hadoop.cluster.nodes.decommissioned	The number of nodes decommissioned	long
hadoop.cluster.nodes.decommissioning	The number of nodes being decommissioned	long
hadoop.cluster.nodes.lost	The number of lost nodes	long
hadoop.cluster.nodes.rebooted	The number of nodes rebooted	long
hadoop.cluster.nodes.shutdown	The number of nodes shut down	long
hadoop.cluster.nodes.total	The total number of nodes	long
hadoop.cluster.nodes.unhealthy	The number of unhealthy nodes	long
hadoop.cluster.virtual_cores.allocated	The number of allocated virtual cores	long
hadoop.cluster.virtual_cores.available	The number of available virtual cores	long
hadoop.cluster.virtual_cores.reserved	The number of reserved virtual cores	long
hadoop.cluster.virtual_cores.total	The total number of virtual cores	long
service.address	Address where data about this service was collected from. This should be a URI, network address (ipv4:port or [ipv6]:port) or a resource path (sockets).	keyword
service.type	The type of the service data is collected from. The type can be used to group and correlate logs and metrics from one service type. Example: If logs or metrics are collected from Elasticsearch, `service.type` would be `elasticsearch`.	keyword
tags	List of keywords used to tag each event.	keyword

datanode

This data stream collects Datanode metrics.

An example event for datanode looks as following:

{
    "@timestamp": "2023-02-02T12:05:04.266Z",
    "agent": {
        "ephemeral_id": "2f5c3354-b1d6-4f1b-b12e-2824ae65dd02",
        "id": "2d054344-10a6-40d9-90c1-ea017fecfda3",
        "name": "docker-fleet-agent",
        "type": "metricbeat",
        "version": "8.5.1"
    },
    "data_stream": {
        "dataset": "hadoop.datanode",
        "namespace": "ep",
        "type": "metrics"
    },
    "ecs": {
        "version": "8.5.1"
    },
    "elastic_agent": {
        "id": "2d054344-10a6-40d9-90c1-ea017fecfda3",
        "snapshot": false,
        "version": "8.5.1"
    },
    "event": {
        "agent_id_status": "verified",
        "category": "database",
        "dataset": "hadoop.datanode",
        "duration": 241651987,
        "ingested": "2023-02-02T12:05:05Z",
        "kind": "metric",
        "module": "http",
        "type": "info"
    },
    "hadoop": {
        "datanode": {
            "blocks": {
                "cached": 0,
                "failed": {
                    "to_cache": 0,
                    "to_uncache": 0
                }
            },
            "cache": {
                "capacity": 0,
                "used": 0
            },
            "dfs_used": 804585,
            "disk_space": {
                "capacity": 80637005824,
                "remaining": 56384421888
            },
            "estimated_capacity_lost_total": 0,
            "last_volume_failure_date": "1970-01-01T00:00:00.000Z",
            "volumes": {
                "failed": 0
            }
        }
    },
    "host": {
        "architecture": "x86_64",
        "containerized": true,
        "hostname": "docker-fleet-agent",
        "id": "75e38940166b4dbc90b6f5610e8e9c39",
        "ip": [
            "172.28.0.5"
        ],
        "mac": [
            "02-42-AC-1C-00-05"
        ],
        "name": "docker-fleet-agent",
        "os": {
            "codename": "focal",
            "family": "debian",
            "kernel": "3.10.0-1160.80.1.el7.x86_64",
            "name": "Ubuntu",
            "platform": "ubuntu",
            "type": "linux",
            "version": "20.04.5 LTS (Focal Fossa)"
        }
    },
    "metricset": {
        "name": "json",
        "period": 60000
    },
    "service": {
        "address": "http://elastic-package-service_hadoop_1:9864/jmx?qry=Hadoop%3Aname%3DDataNodeActivity%2A%2Cservice%3DDataNode",
        "type": "http"
    },
    "tags": [
        "forwarded"
    ]
}

Exported fields

Field	Description	Type
@timestamp	Event timestamp.	date
data_stream.dataset	Data stream dataset.	constant_keyword
data_stream.namespace	Data stream namespace.	constant_keyword
data_stream.type	Data stream type.	constant_keyword
ecs.version	ECS version this event conforms to. `ecs.version` is a required field and must exist in all events. When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events.	keyword
event.category	This is one of four ECS Categorization Fields, and indicates the second level in the ECS category hierarchy. `event.category` represents the "big buckets" of ECS categories. For example, filtering on `event.category:process` yields all events relating to process activity. This field is closely related to `event.type`, which is used as a subcategory. This field is an array. This will allow proper categorization of some events that fall in multiple categories.	keyword
event.dataset	Name of the dataset. If an event source publishes more than one type of log or events (e.g. access log, error log), the dataset is used to specify which one the event comes from. It's recommended but not required to start the dataset name with the module name, followed by a dot, then the dataset name.	keyword
event.kind	This is one of four ECS Categorization Fields, and indicates the highest level in the ECS category hierarchy. `event.kind` gives high-level information about what type of information the event contains, without being specific to the contents of the event. For example, values of this field distinguish alert events from metric events. The value of this field can be used to inform how these kinds of events should be handled. They may warrant different retention, different access control, it may also help understand whether the data coming in at a regular interval or not.	keyword
event.module	Name of the module this data is coming from. If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module.	keyword
event.type	This is one of four ECS Categorization Fields, and indicates the third level in the ECS category hierarchy. `event.type` represents a categorization "sub-bucket" that, when used along with the `event.category` field values, enables filtering events down to a level appropriate for single visualization. This field is an array. This will allow proper categorization of some events that fall in multiple event types.	keyword
hadoop.datanode.blocks.cached	The number of blocks cached	long
hadoop.datanode.blocks.failed.to_cache	The number of blocks that failed to cache	long
hadoop.datanode.blocks.failed.to_uncache	The number of failed blocks to remove from cache	long
hadoop.datanode.bytes.read	Data read	long
hadoop.datanode.bytes.written	Data written	long
hadoop.datanode.cache.capacity	Cache capacity in bytes	long
hadoop.datanode.cache.used	Cache used in bytes	long
hadoop.datanode.dfs_used	Distributed File System Used	long
hadoop.datanode.disk_space.capacity	Disk capacity in bytes	long
hadoop.datanode.disk_space.remaining	The remaining disk space left in bytes	long
hadoop.datanode.estimated_capacity_lost_total	The estimated capacity lost in bytes	long
hadoop.datanode.last_volume_failure_date	The date/time of the last volume failure in milliseconds since epoch	date
hadoop.datanode.volumes.failed	Number of failed volumes	long
service.address	Address where data about this service was collected from. This should be a URI, network address (ipv4:port or [ipv6]:port) or a resource path (sockets).	keyword
service.type	The type of the service data is collected from. The type can be used to group and correlate logs and metrics from one service type. Example: If logs or metrics are collected from Elasticsearch, `service.type` would be `elasticsearch`.	keyword
tags	List of keywords used to tag each event.	keyword

namenode

This data stream collects Namenode metrics.

An example event for namenode looks as following:

{
    "@timestamp": "2022-03-28T11:36:07.166Z",
    "agent": {
        "ephemeral_id": "1304d85b-b3ba-45a5-ad59-c9ec7df3a49f",
        "id": "adf6847a-3726-4fe6-a202-147021ff3cbc",
        "name": "docker-fleet-agent",
        "type": "metricbeat",
        "version": "8.1.0"
    },
    "data_stream": {
        "dataset": "hadoop.namenode",
        "namespace": "ep",
        "type": "metrics"
    },
    "ecs": {
        "version": "8.5.1"
    },
    "elastic_agent": {
        "id": "adf6847a-3726-4fe6-a202-147021ff3cbc",
        "snapshot": false,
        "version": "8.1.0"
    },
    "event": {
        "agent_id_status": "verified",
        "category": "database",
        "dataset": "hadoop.namenode",
        "duration": 341259289,
        "ingested": "2022-03-28T11:36:09Z",
        "kind": "metric",
        "module": "http",
        "type": "info"
    },
    "hadoop": {
        "namenode": {
            "blocks": {
                "corrupt": 0,
                "missing_repl_one": 0,
                "pending_deletion": 0,
                "pending_replication": 0,
                "scheduled_replication": 0,
                "total": 0,
                "under_replicated": 0
            },
            "capacity": {
                "remaining": 11986817024,
                "total": 48420556800,
                "used": 4096
            },
            "estimated_capacity_lost_total": 0,
            "files_total": 9,
            "lock_queue_length": 0,
            "nodes": {
                "num_dead_data": 0,
                "num_decom_dead_data": 0,
                "num_decom_live_data": 0,
                "num_decommissioning_data": 0,
                "num_live_data": 1
            },
            "num_stale_storages": 0,
            "stale_data_nodes": 0,
            "total_load": 0,
            "volume_failures_total": 0
        }
    },
    "host": {
        "architecture": "x86_64",
        "containerized": true,
        "hostname": "docker-fleet-agent",
        "ip": [
            "192.168.160.7"
        ],
        "mac": [
            "02:42:c0:a8:a0:07"
        ],
        "name": "docker-fleet-agent",
        "os": {
            "codename": "focal",
            "family": "debian",
            "kernel": "3.10.0-1160.45.1.el7.x86_64",
            "name": "Ubuntu",
            "platform": "ubuntu",
            "type": "linux",
            "version": "20.04.3 LTS (Focal Fossa)"
        }
    },
    "metricset": {
        "name": "json",
        "period": 60000
    },
    "service": {
        "address": "http://elastic-package-service_hadoop_1:9870/jmx?qry=Hadoop%3Aname%3DFSNamesystem%2Cservice%3DNameNode",
        "type": "http"
    },
    "tags": [
        "forwarded"
    ]
}

Exported fields

Field	Description	Type
@timestamp	Event timestamp.	date
data_stream.dataset	Data stream dataset.	constant_keyword
data_stream.namespace	Data stream namespace.	constant_keyword
data_stream.type	Data stream type.	constant_keyword
ecs.version	ECS version this event conforms to. `ecs.version` is a required field and must exist in all events. When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events.	keyword
event.category	This is one of four ECS Categorization Fields, and indicates the second level in the ECS category hierarchy. `event.category` represents the "big buckets" of ECS categories. For example, filtering on `event.category:process` yields all events relating to process activity. This field is closely related to `event.type`, which is used as a subcategory. This field is an array. This will allow proper categorization of some events that fall in multiple categories.	keyword
event.dataset	Name of the dataset. If an event source publishes more than one type of log or events (e.g. access log, error log), the dataset is used to specify which one the event comes from. It's recommended but not required to start the dataset name with the module name, followed by a dot, then the dataset name.	keyword
event.kind	This is one of four ECS Categorization Fields, and indicates the highest level in the ECS category hierarchy. `event.kind` gives high-level information about what type of information the event contains, without being specific to the contents of the event. For example, values of this field distinguish alert events from metric events. The value of this field can be used to inform how these kinds of events should be handled. They may warrant different retention, different access control, it may also help understand whether the data coming in at a regular interval or not.	keyword
event.module	Name of the module this data is coming from. If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module.	keyword
event.type	This is one of four ECS Categorization Fields, and indicates the third level in the ECS category hierarchy. `event.type` represents a categorization "sub-bucket" that, when used along with the `event.category` field values, enables filtering events down to a level appropriate for single visualization. This field is an array. This will allow proper categorization of some events that fall in multiple event types.	keyword
hadoop.namenode.blocks.corrupt	Current number of blocks with corrupt replicas.	long
hadoop.namenode.blocks.missing_repl_one	Current number of missing blocks with replication factor 1	long
hadoop.namenode.blocks.pending_deletion	Current number of blocks pending deletion	long
hadoop.namenode.blocks.pending_replication	Current number of blocks pending to be replicated	long
hadoop.namenode.blocks.scheduled_replication	Current number of blocks scheduled for replications	long
hadoop.namenode.blocks.total	Current number of allocated blocks in the system	long
hadoop.namenode.blocks.under_replicated	Current number of blocks under replicated	long
hadoop.namenode.capacity.remaining	Current remaining capacity in bytes	long
hadoop.namenode.capacity.total	Current raw capacity of DataNodes in bytes	long
hadoop.namenode.capacity.used	Current used capacity across all DataNodes in bytes	long
hadoop.namenode.estimated_capacity_lost_total	An estimate of the total capacity lost due to volume failures	long
hadoop.namenode.files_total	Current number of files and directories	long
hadoop.namenode.lock_queue_length	Number of threads waiting to acquire FSNameSystem lock	long
hadoop.namenode.nodes.num_dead_data	Number of datanodes which are currently dead	long
hadoop.namenode.nodes.num_decom_dead_data	Number of datanodes which have been decommissioned and are now dead	long
hadoop.namenode.nodes.num_decom_live_data	Number of datanodes which have been decommissioned and are now live	long
hadoop.namenode.nodes.num_decommissioning_data	Number of datanodes in decommissioning state	long
hadoop.namenode.nodes.num_live_data	Number of datanodes which are currently live	long
hadoop.namenode.num_stale_storages	Number of storages marked as content stale	long
hadoop.namenode.stale_data_nodes	Current number of DataNodes marked stale due to delayed heartbeat	long
hadoop.namenode.total_load	Current number of connections	long
hadoop.namenode.volume_failures_total	Total number of volume failures across all Datanodes	long
service.address	Address where data about this service was collected from. This should be a URI, network address (ipv4:port or [ipv6]:port) or a resource path (sockets).	keyword
service.type	The type of the service data is collected from. The type can be used to group and correlate logs and metrics from one service type. Example: If logs or metrics are collected from Elasticsearch, `service.type` would be `elasticsearch`.	keyword
tags	List of keywords used to tag each event.	keyword

node_manager

This data stream collects Node Manager metrics.

An example event for node_manager looks as following:

{
    "@timestamp": "2022-03-28T11:54:32.506Z",
    "agent": {
        "ephemeral_id": "9948a37a-5732-4d7d-9218-0e9cf30c035a",
        "id": "adf6847a-3726-4fe6-a202-147021ff3cbc",
        "name": "docker-fleet-agent",
        "type": "metricbeat",
        "version": "8.1.0"
    },
    "data_stream": {
        "dataset": "hadoop.node_manager",
        "namespace": "ep",
        "type": "metrics"
    },
    "ecs": {
        "version": "8.5.1"
    },
    "elastic_agent": {
        "id": "adf6847a-3726-4fe6-a202-147021ff3cbc",
        "snapshot": false,
        "version": "8.1.0"
    },
    "event": {
        "agent_id_status": "verified",
        "category": "database",
        "dataset": "hadoop.node_manager",
        "duration": 12930711,
        "ingested": "2022-03-28T11:54:35Z",
        "kind": "metric",
        "module": "http",
        "type": "info"
    },
    "hadoop": {
        "node_manager": {
            "allocated_containers": 0,
            "container_launch_duration_avg_time": 169,
            "container_launch_duration_num_ops": 2,
            "containers": {
                "completed": 0,
                "failed": 2,
                "initing": 0,
                "killed": 0,
                "launched": 2,
                "running": 0
            }
        }
    },
    "host": {
        "architecture": "x86_64",
        "containerized": true,
        "hostname": "docker-fleet-agent",
        "ip": [
            "192.168.160.7"
        ],
        "mac": [
            "02:42:c0:a8:a0:07"
        ],
        "name": "docker-fleet-agent",
        "os": {
            "codename": "focal",
            "family": "debian",
            "kernel": "3.10.0-1160.45.1.el7.x86_64",
            "name": "Ubuntu",
            "platform": "ubuntu",
            "type": "linux",
            "version": "20.04.3 LTS (Focal Fossa)"
        }
    },
    "metricset": {
        "name": "json",
        "period": 60000
    },
    "service": {
        "address": "http://elastic-package-service_hadoop_1:8042/jmx?qry=Hadoop%3Aservice%3DNodeManager%2Cname%3DNodeManagerMetrics",
        "type": "http"
    },
    "tags": [
        "forwarded"
    ]
}

Exported fields

Field	Description	Type
@timestamp	Event timestamp.	date
data_stream.dataset	Data stream dataset.	constant_keyword
data_stream.namespace	Data stream namespace.	constant_keyword
data_stream.type	Data stream type.	constant_keyword
ecs.version	ECS version this event conforms to. `ecs.version` is a required field and must exist in all events. When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events.	keyword
event.category	This is one of four ECS Categorization Fields, and indicates the second level in the ECS category hierarchy. `event.category` represents the "big buckets" of ECS categories. For example, filtering on `event.category:process` yields all events relating to process activity. This field is closely related to `event.type`, which is used as a subcategory. This field is an array. This will allow proper categorization of some events that fall in multiple categories.	keyword
event.dataset	Name of the dataset. If an event source publishes more than one type of log or events (e.g. access log, error log), the dataset is used to specify which one the event comes from. It's recommended but not required to start the dataset name with the module name, followed by a dot, then the dataset name.	keyword
event.kind	This is one of four ECS Categorization Fields, and indicates the highest level in the ECS category hierarchy. `event.kind` gives high-level information about what type of information the event contains, without being specific to the contents of the event. For example, values of this field distinguish alert events from metric events. The value of this field can be used to inform how these kinds of events should be handled. They may warrant different retention, different access control, it may also help understand whether the data coming in at a regular interval or not.	keyword
event.module	Name of the module this data is coming from. If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module.	keyword
event.type	This is one of four ECS Categorization Fields, and indicates the third level in the ECS category hierarchy. `event.type` represents a categorization "sub-bucket" that, when used along with the `event.category` field values, enables filtering events down to a level appropriate for single visualization. This field is an array. This will allow proper categorization of some events that fall in multiple event types.	keyword
hadoop.node_manager.allocated_containers	Containers Allocated	long
hadoop.node_manager.container_launch_duration_avg_time	Container Launch Duration Average Time (Seconds)	long
hadoop.node_manager.container_launch_duration_num_ops	Container Launch Duration Operations (Operations)	long
hadoop.node_manager.containers.completed	Containers Completed	long
hadoop.node_manager.containers.failed	Containers Failed	long
hadoop.node_manager.containers.initing	Containers Initializing	long
hadoop.node_manager.containers.killed	Containers Killed	long
hadoop.node_manager.containers.launched	Containers Launched	long
hadoop.node_manager.containers.running	Containers Running	long
service.address	Address where data about this service was collected from. This should be a URI, network address (ipv4:port or [ipv6]:port) or a resource path (sockets).	keyword
service.type	The type of the service data is collected from. The type can be used to group and correlate logs and metrics from one service type. Example: If logs or metrics are collected from Elasticsearch, `service.type` would be `elasticsearch`.	keyword
tags	List of keywords used to tag each event.	keyword

Changelog

Version	Details
0.7.1	Bug fix View pull request Remove `(s)` from the Applications dashboard visualizations.
0.7.0	Enhancement View pull request Migrate `Cluster overview` dashboard visualizations to lens.
0.6.0	Enhancement View pull request Migrate `Applications` dashboard visualizations to lens.
0.5.0	Enhancement View pull request Migrate `Data nodes` and `Node Manager` dashboard visualizations to lens.
0.4.2	Enhancement View pull request Added categories and/or subcategories.
0.4.1	Bug fix View pull request Replace format of date processor from epoch_millis to UNIX_MS
0.4.0	Enhancement View pull request Update ECS version to 8.5.1
0.3.0	Enhancement View pull request Added infrastructure category.
0.2.0	Enhancement View pull request Add dashboard and visualizations for Hadoop.
0.1.1	Enhancement View pull request Update SSL and proxy configuration for application data stream.
0.1.0	Enhancement View pull request Add namenode data stream for Hadoop. Enhancement View pull request Add node_manager data stream, visualizations and Dashboards for Hadoop. Enhancement View pull request Add datanode data stream for Hadoop. Enhancement View pull request Add cluster metrics data stream for Hadoop. Enhancement View pull request Add Hadoop integration with application data stream.