47. Spark
47.1. Spark SQL Version Support
The driver leverages Spark Thrift to enable bidirectional SQL access to SparkSQL data. Spark versions 1.6 and above are supported.
 
47.2. Connection Options
47.2.1. Authentication
Property
Description
AuthScheme
The authentication scheme used. Accepted entries are Plain, LDAP, NOSASL, and Kerberos.
Server
The host name or IP address of the server hosting the SparkSQL database.
Port
The port for the SparkSQL database.
User
The username used to authenticate with SparkSQL.
Password
The password used to authenticate with SparkSQL.
Database
The name of the SparkSQL database.
ProtocolVersion
The Protocol Version used to authenticate with SparkSQL.
ImpersonationProxyUser
The proxy user for Hive user impersonation.
SaslQop
Quality of protection for the SASL framework. The level of quality is negotiated between the client and server during authentication. Used by Kerberos authentication with TCP transport.
TransportMode
The transport mode to use to communicate with the Hive server. Accepted entries are BINARY and HTTP.
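
As a rough illustration of how the authentication properties above might be combined, the following Python sketch joins them into one string of semicolon-terminated Name=Value pairs and checks AuthScheme and TransportMode against the accepted entries listed above. The pair syntax, the helper name build_connection_string, and all example values (host, port, user, database) are assumptions made for illustration; they are not taken from this section.

```python
# Accepted entries, taken from the property descriptions above.
ACCEPTED = {
    "AuthScheme": {"Plain", "LDAP", "NOSASL", "Kerberos"},
    "TransportMode": {"BINARY", "HTTP"},
}


def build_connection_string(props):
    """Join properties as Name=Value pairs, each terminated by a semicolon.

    Hypothetical helper: the pair syntax is an assumption, not something
    this section specifies.
    """
    for name, accepted in ACCEPTED.items():
        if name in props and props[name] not in accepted:
            raise ValueError(f"{name} must be one of {sorted(accepted)}")
    return "".join(f"{name}={value};" for name, value in props.items())


# Example values are placeholders, not defaults of the driver.
conn_str = build_connection_string({
    "AuthScheme": "LDAP",
    "Server": "spark.example.com",
    "Port": 10000,
    "User": "analyst",
    "Database": "default",
    "TransportMode": "BINARY",
})
print(conn_str)
```

Validating the enumerated properties before connecting surfaces a mistyped scheme or transport mode early, rather than as a connection failure.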
47.2.2. Kerberos
Property
Description
KerberosKDC
The Kerberos Key Distribution Center (KDC) service used to authenticate the user.
KerberosRealm
The Kerberos realm used to authenticate the user.
KerberosSPN
The service principal name (SPN) for the Kerberos Domain Controller.
KerberosKeytabFile
The Keytab file containing your pairs of Kerberos principals and encrypted keys.
KerberosServiceRealm
The Kerberos realm of the service.
KerberosServiceKDC
The Kerberos KDC of the service.
KerberosTicketCache
The full file path to an MIT Kerberos credential cache file.
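
KerberosKeytabFile and KerberosTicketCache both refer to files on disk, so a simple pre-flight check can catch a wrong path before a connection attempt. The sketch below is one possible way to do that in Python; the helper name missing_kerberos_files is hypothetical, and the temporary file merely stands in for a real keytab.

```python
import os
import tempfile

# Properties described above as referring to files on disk.
FILE_PROPERTIES = ("KerberosKeytabFile", "KerberosTicketCache")


def missing_kerberos_files(props):
    """Return the names of file-path properties whose files do not exist."""
    return [
        name
        for name in FILE_PROPERTIES
        if name in props and not os.path.isfile(props[name])
    ]


# A temporary file stands in for a real keytab; the second path is
# deliberately absent so the check has something to report.
with tempfile.NamedTemporaryFile() as keytab:
    props = {
        "KerberosKeytabFile": keytab.name,
        "KerberosTicketCache": keytab.name + ".missing",
    }
    missing = missing_kerberos_files(props)

print(missing)
```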
47.2.3. SSL
Property
Description
SSLClientCert
The TLS/SSL client certificate store for SSL Client Authentication (2-way SSL).
SSLClientCertType
The type of key store containing the TLS/SSL client certificate.
SSLClientCertPassword
The password for the TLS/SSL client certificate.
SSLClientCertSubject
The subject of the TLS/SSL client certificate.
SSLServerCert
The certificate to be accepted from the server when connecting using TLS/SSL.
47.2.4. Firewall
Property
Description
FirewallType
The protocol used by a proxy-based firewall.
FirewallServer
The name or IP address of a proxy-based firewall.
FirewallPort
The TCP port for a proxy-based firewall.
FirewallUser
The user name to use to authenticate with a proxy-based firewall.
FirewallPassword
A password used to authenticate to a proxy-based firewall.
47.2.5. Proxy
Property
Description
ProxyAutoDetect
Indicates whether to use the system proxy settings. This takes precedence over other proxy settings, so set ProxyAutoDetect to FALSE in order to use custom proxy settings.
ProxyServer
The hostname or IP address of a proxy to route HTTP traffic through.
ProxyPort
The TCP port the ProxyServer proxy is running on.
ProxyAuthScheme
The authentication type to use to authenticate to the ProxyServer proxy.
ProxyUser
A user name to be used to authenticate to the ProxyServer proxy.
ProxyPassword
A password to be used to authenticate to the ProxyServer proxy.
ProxySSLType
The SSL type to use when connecting to the ProxyServer proxy.
ProxyExceptions
A semicolon-separated list of destination hostnames or IPs that are exempt from connecting through the ProxyServer.
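
To make the ProxyExceptions format concrete, the sketch below splits a semicolon-separated value and decides whether a destination would bypass the proxy. The matching rule (exact, case-insensitive comparison) and both helper names are assumptions; this section does not say how the driver matches hosts, and the host names are placeholders.

```python
def parse_proxy_exceptions(value):
    """Split a semicolon-separated ProxyExceptions value into a set of hosts."""
    return {item.strip().lower() for item in value.split(";") if item.strip()}


def bypasses_proxy(host, exceptions):
    """Assumed rule: bypass only on an exact, case-insensitive match."""
    return host.strip().lower() in exceptions


exceptions = parse_proxy_exceptions("localhost; 10.0.0.5;spark.internal.example.com")
print(bypasses_proxy("10.0.0.5", exceptions))
print(bypasses_proxy("spark.example.com", exceptions))
```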
47.2.6. Logging
Property
Description
Logfile
A filepath which designates the name and location of the log file.
Verbosity
The verbosity level that determines the amount of detail included in the log file.
LogModules
Core modules to be included in the log file.
MaxLogFileSize
A string specifying the maximum size in bytes for a log file (for example, 10 MB).
MaxLogFileCount
A string specifying the maximum file count of log files.
47.2.7. Schema
Property
Description
Location
A path to the directory that contains the schema files defining tables, views, and stored procedures.
BrowsableSchemas
This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC.
Tables
This property restricts the tables reported to a subset of the available tables. For example, Tables=TableA,TableB,TableC.
Views
This property restricts the views reported to a subset of the available views. For example, Views=ViewA,ViewB,ViewC.
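
The effect of BrowsableSchemas, Tables, and Views can be pictured as a filter over the names the server exposes. The following sketch shows that idea only; the function name reported, the case-sensitive exact matching, and the behavior for an unset value (report everything) are assumptions rather than documented driver behavior.

```python
def reported(available, restriction=None):
    """Return the available names that appear in a comma-separated restriction.

    An unset or empty restriction is assumed to report every name.
    """
    if not restriction:
        return list(available)
    allowed = {name.strip() for name in restriction.split(",")}
    return [name for name in available if name in allowed]


available_tables = ["TableA", "TableB", "TableC", "TableD"]
print(reported(available_tables, "TableA,TableC"))
print(reported(available_tables))
```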
47.2.8. Caching
Property
Description
AutoCache
Automatically caches the results of SELECT queries into a cache database specified by either CacheLocation or both of CacheConnection and CacheProvider.
CacheDriver
The database driver to be used to cache data.
CacheConnection
The connection string for the cache database. This property is always used in conjunction with CacheProvider. Setting both properties will override the value set for CacheLocation for caching data.
CacheLocation
Specifies the path to the cache when caching to a file.
CacheTolerance
The tolerance for stale data in the cache, specified in seconds, when using AutoCache.
Offline
Use offline mode to get the data from the cache instead of the live source.
CacheMetadata
This property determines whether or not to cache the table metadata to a file store.
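
The interplay between CacheTolerance and Offline can be sketched as a freshness decision. The code below is a conceptual illustration, not the driver's implementation: the function name should_use_cache, the inclusive comparison at the tolerance boundary, and the timestamp values are all assumptions.

```python
def should_use_cache(cached_at, now, cache_tolerance, offline=False):
    """Decide whether a query is served from the cache.

    cached_at and now are timestamps in seconds; cached_at is None when
    nothing has been cached yet. cache_tolerance is in seconds.
    """
    if offline:
        return True   # offline mode reads from the cache, never the live source
    if cached_at is None:
        return False  # nothing cached yet, so the live source must be queried
    # Cached data is acceptable while its age is within the tolerance.
    return (now - cached_at) <= cache_tolerance


print(should_use_cache(cached_at=1000, now=1300, cache_tolerance=600))
print(should_use_cache(cached_at=1000, now=1700, cache_tolerance=600))
print(should_use_cache(cached_at=1000, now=1700, cache_tolerance=600, offline=True))
```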
47.2.9. Miscellaneous
Property
Description
AsyncQueryTimeout
The timeout for asynchronous requests issued by the provider to download large result sets.
BatchSize
The maximum size of each batch operation to submit.
ConnectionLifeTime
The maximum lifetime of a connection in seconds. Once the time has elapsed, the connection object is disposed.
ConnectOnOpen
This property specifies whether to connect to Spark SQL when the connection is opened.
DescribeCommand
The describe command used to communicate with the Hive server. Accepted entries are DESCRIBE and DESC.
DetectView
Specifies whether to use DESCRIBE FORMATTED ... to detect whether the specified table is a view.
HTTPPath
The path component of the URL endpoint when using HTTP TransportMode.
MaxRows
Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This helps avoid performance issues at design time.
Other
These hidden properties are used only in specific use cases.
PoolIdleTimeout
The allowed idle time for a connection before it is closed.
PoolMaxSize
The maximum number of connections in the pool.
PoolMinSize
The minimum number of connections in the pool.
PoolWaitTime
The maximum number of seconds to wait for an available connection.
PseudoColumns
This property indicates whether or not to include pseudo columns as columns in the table.
QueryPassthrough
This option passes the query to the Spark SQL server as is.
Readonly
You can use this property to enforce read-only access to Spark SQL from the provider.
RTK
The runtime key used for licensing.
ServerConfigurations
A name-value list of server configuration variables to override the server defaults.
SupportEnhancedSQL
This property enhances SQL functionality beyond what can be supported through the API directly, by enabling in-memory client-side processing.
Timeout
The value in seconds until the timeout error is thrown, canceling the operation.
UseConnectionPooling
This property enables connection pooling.
UseDatabricksUploadApi
This option specifies whether the Databricks Upload API is used when executing batch inserts.
UseInsertSelectSyntax
Specifies whether to use an INSERT INTO SELECT statement.
UseSSL
Specifies whether to use SSL Encryption when connecting to Hive.
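
The pooling properties in the table above (UseConnectionPooling, PoolMaxSize, PoolWaitTime) describe a bounded pool in which callers wait for a free connection. The sketch below illustrates that general idea in Python; it is not the driver's implementation. The class TinyPool and its parameter names are hypothetical, PoolMinSize and PoolIdleTimeout are left out for brevity, and a plain object stands in for a real connection.

```python
import queue
import threading


class TinyPool:
    """Illustrative bounded pool; not the driver's implementation."""

    def __init__(self, factory, pool_max_size, pool_wait_time):
        self._factory = factory
        self._max_size = pool_max_size      # plays the role of PoolMaxSize
        self._wait_time = pool_wait_time    # plays the role of PoolWaitTime (seconds)
        self._idle = queue.Queue()
        self._created = 0
        self._lock = threading.Lock()

    def acquire(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection if one exists
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._max_size:
                self._created += 1
                return self._factory()      # open a new connection while under the cap
        # Pool is at capacity: block for up to the wait time.
        # queue.Empty is raised if nothing is released in time.
        return self._idle.get(timeout=self._wait_time)

    def release(self, conn):
        self._idle.put(conn)


# object() stands in for opening a real connection.
pool = TinyPool(factory=object, pool_max_size=1, pool_wait_time=0.05)
first = pool.acquire()
try:
    pool.acquire()
    timed_out = False
except queue.Empty:
    timed_out = True
pool.release(first)
reused = pool.acquire() is first
print(timed_out, reused)
```

With a maximum size of 1, the second acquire waits for the configured time and then fails, while a released connection is handed back to the next caller instead of a new one being opened.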