Manage the Impala Connector
The Composer Cloudera Impala™ connector allows you to visualize huge volumes of data stored in your Hadoop cluster in real time and with no ETL. Composer supports Impala versions 2.7 through 3.2.
Before you can establish a connection from Composer to Cloudera Impala storage, a connector server needs to be installed and configured. See Manage Connectors and Connector Servers for general instructions and Connect to Impala for details specific to the Cloudera Impala connector.
After the connector has been set up, you can create data source configurations that specify the necessary connection information and identify the data you want to use. See Manage Data Source Configurations for more information. After data sources are configured, they can be used to create dashboards and visuals from your data. See Create Dashboards.
Cloudera Impala connector support for specific Composer features is shown in the following table.
Key: Y - Supported; N - Not Supported; N/A - not applicable
|Custom SQL Queries
|Derived Fields (Row-Level Expressions)
|Cloudera Impala connectors can receive only a single distinct count field in a query.
|Fast Distinct Values
|Group By Multiple Fields
|Group By Time
|Group By UNIX Time
|Histogram Floating Point Values
|Live Mode and Playback
|Pushdown Joins for Fusion Data Sources
|Wild Card Filters
|Wild Card Filters, Case-Insensitive Mode
|Wild Card Filters, Case-Sensitive Mode
The Cloudera Impala connector also supports progress reporting, which allows the connector to report the progress of a running query. In the UI, this appears as Reading nn% in the upper-left corner of a visual.
The connector supports passing along credentials for users with access privileges to the Impala source. Delegation allows Impala queries to be issued with the privileges of a specified user. This option is available on the Connection page and is set using the Do As User list. See Enable User Delegation and Apply User Delegation to a Connection.
When setting up an Impala connection, you need to provide the following.
Specify the JDBC URL. You can connect to your Impala data source using either simple user credentials authentication or Kerberos authentication with optional SSL encryption. Refer to Connecting to Impala on Kerberized CDH or Connecting to Impala with TLS (SSL) for more details on the configuration.
Composer enables you to connect either to a single Impala node or to multiple nodes within a cluster. To connect to a single Impala node, specify a JDBC URL in the following format:
To connect to multiple Impala nodes, specify the required JDBC URLs separated by commas. The URLs are used in round-robin fashion. The connection remains valid as long as at least one node is available. If none of the nodes can be reached, the connection fails validation.
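The multi-node behavior described above can be illustrated with a minimal Python sketch. This is not Composer's implementation: the function names, the `jdbc:example://` placeholder URLs, and the `is_reachable` callback are all hypothetical, and the sketch assumes that the URLs themselves contain no commas.

```python
from itertools import cycle
from typing import Callable, Iterable, Iterator, List


def parse_jdbc_urls(value: str) -> List[str]:
    """Split a comma-separated URL list, trimming whitespace and dropping blanks."""
    return [part.strip() for part in value.split(",") if part.strip()]


def round_robin(urls: List[str]) -> Iterator[str]:
    """Return an iterator that yields the URLs in order, wrapping around forever."""
    if not urls:
        raise ValueError("at least one JDBC URL is required")
    return cycle(urls)


def validate_connection(urls: Iterable[str],
                        is_reachable: Callable[[str], bool]) -> bool:
    """Treat the connection as valid while at least one node can be reached."""
    return any(is_reachable(url) for url in urls)


# Hypothetical placeholder URLs; substitute the format your connector expects.
urls = parse_jdbc_urls("jdbc:example://node1:21050, jdbc:example://node2:21050")
picker = round_robin(urls)
next_three = [next(picker) for _ in range(3)]  # node1, node2, node1
```

Here `is_reachable` stands in for whatever health check is appropriate, such as opening and closing a test connection.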
- If Impala authentication has been set up, provide a user name and password.
- To allow Impala user delegation, select the appropriate custom user attribute from the Do As User drop-down list (set up by the Composer supervisor or administrator). This allows Composer to pass along credentials for the specified user with access rights to Impala. See Enable User Delegation and Apply User Delegation to a Connection.
- Select Validate. If successfully validated, the connection is saved.
Time-based fields can be configured for partitioning in an Impala data source configuration using the Partition column on the Fields tab of the data source configuration wizard. The following options are available:
- No - No partitioning is done.
- Date - This option is available for the Time field type. If you select this option, the list of partitioned columns is displayed in the Configure column.
- Function - If you select this option, the list of partitioned columns and the supported MURMUR3_HASH function are displayed in the Configure column.
Numeric and time-based fields can be edited using the Configure column of the Fields tab:
- Numeric types including Number and Integer - ability to select a default aggregation function
- Time fields - ability to define the default time pattern and granularity; if the time field provides granularities of hour, minute and second, then a time zone label may be applied.
Select the checkbox in the Distinct Count column for each field that requires a distinct count. For more information, see Work with Distinct Counts on Cloudera Impala.
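The feature table notes that Cloudera Impala connectors can receive only a single distinct count field in a query. The following Python sketch shows one way a client-side check for that limit might look. It does not represent how Composer generates SQL; the function, table, and field names are hypothetical and used only for illustration.

```python
from typing import List


def build_aggregate_query(table: str, group_by: str,
                          distinct_fields: List[str]) -> str:
    """Build a grouped query with at most one COUNT(DISTINCT ...) expression."""
    if len(distinct_fields) > 1:
        # Mirrors the documented limit of one distinct count field per query.
        raise ValueError("only one distinct count field is allowed per query")
    select_parts = [group_by, "COUNT(*) AS row_count"]
    for field in distinct_fields:
        select_parts.append(f"COUNT(DISTINCT {field}) AS {field}_distinct")
    return f"SELECT {', '.join(select_parts)} FROM {table} GROUP BY {group_by}"


# Hypothetical table and field names.
query = build_aggregate_query("sales", "region", ["customer_id"])
```

A request with two distinct count fields raises `ValueError` instead of producing a query that the connector cannot accept.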