Managing the Spark SQL Connector
The Composer Spark SQL connector lets you access the data available in Spark SQL databases using the Composer client. The Composer Spark SQL connector supports Spark SQL versions 2.3 and 2.4.
Before you can establish a connection from Composer to Spark SQL storage, a connector server needs to be installed and configured. See Managing Connectors and Connector Servers for general instructions and Connecting to Spark SQL for details specific to the Spark SQL connector.
After the connector has been set up, you can create data source configurations that specify the necessary connection information and identify the data you want to use. See Manage Data Source Configurations for more information. After data sources are configured, they can be used to create dashboards and visuals from your data. See Creating Dashboards.
Composer Feature Support
The Spark SQL connector supports all Composer features, except for:
This connector supports pushdown joins for Fusion data sources.
To enable Kerberos authentication, see Connecting to Spark SQL Sources on a Kerberized HDP Cluster.
When you establish a connection to Spark SQL, provide the following information as you set up the partition settings.
Configure the partition settings. For the partitioned fields, you can select one of the following options:
- Date - this option is available for the Time field type. If you select this option, the list of partitioned columns is displayed in the Configure column.
In the Configure column, you can edit numeric and time-based fields:
- Numeric types including Number and Integer - ability to select a default aggregation function
- Time fields - ability to define the default time pattern and granularity; if the time field provides granularities of hour, minute and second, then a time zone label may be applied
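As a rough illustration of what a default aggregation function and a time granularity mean for a field, the following Python sketch groups a few rows by hour and sums a numeric value. It is not Composer code: the sample rows and the helper names `truncate` and `aggregate` are hypothetical and exist only to show the idea.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (timestamp, numeric value). Not taken from Composer.
rows = [
    (datetime(2020, 1, 1, 10, 15, 30), 4),
    (datetime(2020, 1, 1, 10, 45, 0), 6),
    (datetime(2020, 1, 1, 11, 5, 0), 10),
]


def truncate(ts, granularity):
    """Round a timestamp down to the start of the given granularity."""
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "minute":
        return ts.replace(second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def aggregate(rows, granularity, func):
    """Group values by truncated timestamp and apply one aggregation function."""
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[truncate(ts, granularity)].append(value)
    return {bucket: func(values) for bucket, values in sorted(buckets.items())}


# "sum" plays the role of the default aggregation, "hour" the default granularity.
hourly_sum = aggregate(rows, "hour", sum)
for bucket, total in hourly_sum.items():
    # The strftime pattern stands in for a default time pattern.
    print(bucket.strftime("%Y-%m-%d %H:%M"), total)
```

Swapping `sum` for `max` or `min`, or `"hour"` for `"minute"`, changes how the same rows are summarized, which is the kind of default these field settings control.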
Select fields for Distinct Counts as needed.
When you create a data source, the distinct values of each attribute field are saved in Composer, based on a data sample from your data set. You can filter the data on your visual by these values. While editing a data source, if you want the filter to use all distinct values (that is, values from the whole data source), select Refresh in the Statistics column.
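The difference between distinct values taken from a sample and those taken from the whole data source can be sketched in Python as follows. This is an illustration only: the column contents, the sample size, and the `distinct_values` helper are assumptions, and the sketch does not describe how Composer samples data internally.

```python
# Hypothetical attribute column with 50 distinct values spread over 10,000 rows.
full_column = [f"city_{i % 50}" for i in range(10_000)]

# Assume the sample is simply the first 20 rows (an arbitrary choice for this sketch).
sample = full_column[:20]


def distinct_values(values):
    """Return the sorted list of distinct values in a column."""
    return sorted(set(values))


sample_distinct = distinct_values(sample)
full_distinct = distinct_values(full_column)

# Values that a filter built from the sample alone would not offer.
missing_from_sample = sorted(set(full_distinct) - set(sample_distinct))

print(len(sample_distinct), "distinct values in the sample")
print(len(full_distinct), "distinct values in the whole column")
print(len(missing_from_sample), "values missing from the sample")
```

In this sketch the sample surfaces only some of the values present in the full column, which is the situation a refresh of the statistics is meant to address.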