Enable Data Sharpening for Cloudera Impala Data Sources
Data Sharpening works with certain partitioned Impala data sources. The partitioned field should be a time-based attribute and in a supported time format (for example, yyyy-MM-dd). Follow the steps below to set up Data Sharpening for an Impala data source.
To configure Data Sharpening for a Cloudera Impala data source:
Log in to Composer (either as an administrator or as a user who has been assigned to a group with data source management privileges).
Select Sources on the UI menu or the top-level navigation menu, or select the Sources box on the Home page. The Sources page appears.
Select your Impala data source to start the data source configuration wizard.
Select the Fields tab. Locate the time field you will use as the driving time field for data sharpening. This is the time field that needs to be specified in the global default settings.
Identify the partition type that will be used, and select it from the drop-down list in the Partitions column. The Example section later in this topic illustrates this step.
In the Configure column, select an appropriate time granularity. Consider the 10% rule to ensure Data Sharpening runs when you want it to. See When Data Sharpening Occurs for more information. The Example section later in this topic illustrates this step.
Select the Visuals tab.
Select Global Default Settings. The time bar, search, and Data Sharpening settings for the data source appear.
Make sure Enable Timebar is switched on. This enables the Data Sharpening settings. If this is switched off, you cannot configure Data Sharpening settings. For more information about time bar default settings, see Configure Time Bar Defaults.
Turn the Prefer Sharpening switch on to enable Data Sharpening for the data source.
Optionally, use the Max Queries slider to specify the maximum number of queries used for Data Sharpening. The default maximum is 10 queries.
Select Finish or Save to save your data source configuration.
Example
The following scenario is used in this example of setting up Data Sharpening for a Cloudera Impala data source.
- You have 3 years of historical data on Cloudera Impala
- The time stamp in your data provides granularity to the day level (in column Order_Date)
- Your data is partitioned by month (using column Order_Date_Month that contains data from column Order_Date, but is truncated to the month)
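As a rough illustration of how the two columns in this scenario relate, the following Python sketch truncates a day-level date to its month. The function names are hypothetical and are not part of Composer or Impala, and the yyyyMM label format is assumed from the pattern used later in this example. In practice, the month column would be produced in the data itself.

```python
from datetime import date


def truncate_to_month(order_date: date) -> date:
    """Return the first day of the month that contains order_date."""
    return order_date.replace(day=1)


def month_label(order_date: date) -> str:
    """Format the month of order_date as a yyyyMM string."""
    return order_date.strftime("%Y%m")


if __name__ == "__main__":
    d = date(2020, 1, 17)         # a sample Order_Date value
    print(truncate_to_month(d))   # 2020-01-01
    print(month_label(d))         # 202001
```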
To set up Data Sharpening for the Impala data source on its Fields tab:
Log in to Composer (either as an administrator or as a user who has been assigned to a group with data source management privileges).
Select Sources on the UI menu or the top-level navigation menu, or select the Sources box on the Home page. The Sources page appears.
Select the Impala data source to start the data source configuration wizard.
Select the Fields tab.
Determine whether there are sub-folders in Impala. If so, the Label must include the full date format (for example, month=202001, which is in time format yyyyMM).
Configure the Impala source as follows on the Fields tab:
For the Order_Date field, make sure Day granularity is selected.
For the Order_Date_Month field:
In the Partitions column, set the partitioned time field to Date (or verify that it is selected).
In the Default column, set the option to Pattern and enter the appropriate time format (for this example, the time format should be yyyyMM).
Set the granularity to Month. Make sure the time granularity of the partitioned column is coarser than the granularity of the linked time field (in this example, Month for the partition and Day for the linked field).
Link the partition to the date field. Select a time field from the drop-down list (for this example, Order_Date should be selected).
Select the Visuals tab of the data source configuration and, in Global Default Settings, select Order_Date from the drop-down list. Order_Date is the driving time field.
Be sure to enable Data Sharpening by toggling Prefer Sharpening on. Optionally, adjust the maximum number of queries for Data Sharpening using the Max Queries slider. See Enable Data Sharpening and Configure Its Defaults for more information.
Select Finish or Save to save your data source configuration.
For Data Sharpening to work in this example, the time range must be at least 10 times the time interval for the selected visual. So if one month's data shows in the visual, and the time granularity is set to Day, Data Sharpening should run (every month has at least 28 days, so a one-day interval is well under the 10% threshold).
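The arithmetic behind this rule can be sketched in Python as follows. This is only a minimal model of the comparison described above, assuming the rule is "time range is at least 10 times the interval". The function name and constant are hypothetical, and Composer's actual decision may depend on other factors (see When Data Sharpening Occurs).

```python
from datetime import timedelta

# Assumption: the 10% rule is modeled as "range >= 10 x interval".
MIN_RANGE_TO_INTERVAL_RATIO = 10


def meets_ten_percent_rule(time_range: timedelta, interval: timedelta) -> bool:
    """Return True if time_range is at least 10 times the interval."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    return time_range >= MIN_RANGE_TO_INTERVAL_RATIO * interval


if __name__ == "__main__":
    one_day = timedelta(days=1)
    # One month of data at Day granularity: 28 days is more than 10 x 1 day.
    print(meets_ten_percent_rule(timedelta(days=28), one_day))  # True
    # One week of data at Day granularity: 7 days is less than 10 x 1 day.
    print(meets_ten_percent_rule(timedelta(days=7), one_day))   # False
```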
If your Impala partition breaks out time attributes into separate fields, Data Sharpening is not immediately possible. For example, if YEAR, MONTH, and DAY are all separate partitioned fields, they must be combined into one field for Data Sharpening to function.
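To show what "combined into one field" means for the values involved, here is a small Python sketch that merges separate year, month, and day values into a single yyyy-MM-dd string. The function name is hypothetical, and this only illustrates the intended result. In practice, the combined field would need to exist in the Impala data itself.

```python
from datetime import date


def combine_partition_fields(year, month, day) -> str:
    """Combine separate year, month, and day values into one yyyy-MM-dd string.

    Values may be integers or numeric strings. An invalid calendar date
    raises ValueError.
    """
    return date(int(year), int(month), int(day)).isoformat()


if __name__ == "__main__":
    print(combine_partition_fields(2020, 1, 5))         # 2020-01-05
    print(combine_partition_fields("2020", "12", "31"))  # 2020-12-31
```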