Adding and connecting to data sources (Data Virtualization)

Data Virtualization supports many relational and non-relational data sources that you can add to your data source environment. Data Virtualization connects to relational data sources by using the Java Database Connectivity (JDBC) protocol.
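
As an illustration of what a JDBC connection target can look like, the following Python sketch assembles a URL in the common jdbc:<subprotocol>://<host>:<port>/<database> form. The function name and the example host, port, and database are hypothetical, and the exact URL layout varies by driver, so treat this as an assumption and check the vendor documentation for your data source.

```python
def build_jdbc_url(subprotocol: str, host: str, port: int, database: str) -> str:
    """Assemble a JDBC URL in the common host/port/database form.

    Assumption: the driver accepts jdbc:<subprotocol>://<host>:<port>/<database>.
    Some drivers use a different layout.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"jdbc:{subprotocol}://{host}:{port}/{database}"


# Hypothetical values for illustration only.
print(build_jdbc_url("db2", "db.example.com", 50000, "SALES"))
```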

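You can connect to your data sources in several ways: add a new data source, add a data source from an existing connection, or create a connection at the platform level.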

Before you begin

  1. If you want to enforce governance for your published objects, set up a governed catalog to publish your assets to. For more information, see Enabling enforcement of data protection rules in Data Virtualization.
  2. Review the list of data sources that are supported for Data Virtualization as a service. For more information, see Supported data sources (Data Virtualization as a service).
  3. For certain data sources, such as Google BigQuery, SAP HANA, and Snowflake, you must complete additional source-specific steps before you add the connection.
  4. After a data source is added, any user with virtualize permissions (Data Virtualization Manager or Engineer roles) can create virtual tables from any of the added data sources, regardless of which user added the data source. Users with a Data Virtualization User role must be granted access to the virtualized table or view by using the data request workflow. For more information, see Managing users and roles.
  5. Review considerations and restrictions for data type mapping in Data Virtualization. For more information, see Data source considerations (Data Virtualization).

Adding a new data source

To add a data source to your environment, complete the following steps.

Procedure

  1. Click Data > Data Virtualization > Data sources.
  2. Click Add connection > New connection to see a list of data sources that can be added to Data Virtualization.
  3. Select the type of data source that you want to connect to and then click Select.
    The type of connection that you create determines the information that you must provide.

    Typically, a connection requires either a URL or a hostname and port number.

  4. Specify the required information based on the connection that you selected:
    • The connection name and description.
    • The name of the database.
    • The hostname or IP address and port number of the database, which is required to access the connection.
    • The username and password that allow access to the data source.
      Note: The username and password that are specified here refer to an ID with read-only access to the data source. This ID is required for accessing data from the data source and does not necessarily correspond to a Cloud Pak for Data username or a Data Virtualization user ID.

      For some data sources, you can use the Cloud Pak for Data credentials to access the data source. To do so, select the corresponding checkbox.

    • Any additional properties required to create the connection.
  5. If you want to use SSL to connect to the database, copy the content of the SSL certificate and paste it in the corresponding box.
  6. Click Create to add the connection to the data source environment.
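
Before you open the form, it can help to collect the values from steps 4 and 5 in one place and check that nothing is missing. The following Python sketch shows one way to do that. The class and field names are hypothetical and are not part of any Data Virtualization API, and the sketch assumes that every field except the description and the SSL certificate is required, which might not hold for every connection type (for example, when you use Cloud Pak for Data credentials instead of a username and password).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConnectionDetails:
    """Hypothetical container for the values entered in steps 4 and 5."""
    name: str
    database: str
    host: str
    port: Optional[int]
    username: str
    password: str
    description: str = ""
    ssl_certificate: Optional[str] = None  # pasted certificate text, if SSL is used

    def missing_fields(self) -> List[str]:
        """Return the names of the assumed-required fields that are empty."""
        required = {
            "name": self.name,
            "database": self.database,
            "host": self.host,
            "port": self.port,
            "username": self.username,
            "password": self.password,
        }
        return [field for field, value in required.items() if value in ("", None)]


# Hypothetical values for illustration only.
details = ConnectionDetails(
    name="sales-db",
    database="SALES",
    host="db.example.com",
    port=None,
    username="reader",
    password="",
)
print(details.missing_fields())  # ['port', 'password']
```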

Adding a data source from an existing connection

To add a data source from an existing connection, complete the following steps.

Before you begin

You might see a blank page when you follow these steps. To work around the issue, visit the Data > Data Virtualization > Platform connections page before you use the following procedure.

Procedure

  1. Click Data > Data Virtualization > Data sources.
  2. Click the Add connection drop-down menu and click Existing connection to see a list of data sources that can be added to Data Virtualization.
  3. Select the data source that you want to add and click Add.

Connecting to data sources at the platform level

About this task

You can create connections that can be used by various services across the platform. Any user who has access to the platform can see these connections. However, only users with the credentials for the data source can use a connection. These platform-level connections are available from the Platform connections page.

Restriction: Not all services support the same types of connections. Most services support a subset of the connections that are supported by the platform.

The Platform connections page is a specialized view of the Platform assets catalog. (The connections that are defined on the Platform connections page are also included in the Platform assets catalog.)

The Platform connections page shows the list of connections that can be used by various services on the platform. At a minimum, all users have the Viewer role on the catalog, which means that they can see the connections that are defined.

Required permissions
To create a platform-level connection, you must be an Editor or Administrator on the Platform connections catalog.
Tip: Work with your data source administrator to ensure that you have the correct information to connect to your data source.
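
The distinction between seeing, creating, and using a platform-level connection can be summarized in a small Python sketch. This is an illustrative model only, not how the platform implements access control. It assumes that Editors and Administrators can also view connections (the text states only that all users have at least the Viewer role), and the function names are hypothetical.

```python
# Roles on the Platform connections catalog, as described in the text.
# Assumption: Editor and Administrator include Viewer capabilities.
CAN_VIEW = {"Viewer", "Editor", "Administrator"}
CAN_CREATE = {"Editor", "Administrator"}


def can_view(role: str) -> bool:
    """Any catalog role can see the connections that are defined."""
    return role in CAN_VIEW


def can_create(role: str) -> bool:
    """Only Editors and Administrators can create a platform-level connection."""
    return role in CAN_CREATE


def can_use(role: str, has_data_source_credentials: bool) -> bool:
    """Using a connection needs visibility plus credentials for the data source."""
    return can_view(role) and has_data_source_credentials


print(can_view("Viewer"), can_create("Viewer"), can_use("Viewer", True))
```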

To create a platform-level connection, follow these steps.

Procedure

  1. Click Data > Platform connections.
  2. Click New connection.
  3. Select the type of data source that you want to connect to.

    If you want to connect to an unsupported data source by creating a Generic JDBC connection, a Cloud Pak for Data platform administrator must upload the JDBC drivers for that data source. For more information, see Importing JDBC drivers for data sources.

  4. Enter a name and description for the connection.
  5. Enter the details for the connection.
    The type of connection that you are creating determines the information that you must specify. Typically, a connection requires either:
    • A host name and port number
    • A URL

    You might also need to specify the database that you want to connect to.

  6. Enter your credentials for the connection. The type of connection that you are creating determines the format of the credentials. Typically, a connection requires one or more of the following:
    • Username and password

      If the data source is a service that is deployed on the same instance of Cloud Pak for Data where you are creating the connection, you can use your Cloud Pak for Data credentials to authenticate to the data source. When you select Use your Cloud Pak for Data Credentials to authenticate to the data source, the username and password fields are disabled.

    • API key
    • Secret key

    The credentials that you supply are accessible only from your account. Other users must supply their own credentials to use the connection.

    Some data sources allow you to connect anonymously.

  7. If applicable, specify the SSL information required to connect to your data source.

    Some data sources require you to use SSL for secure communication. Other data sources support it but do not require it. Ensure that you understand what information you need to provide to communicate securely with your data source:

    • If you specified a port number that is configured to accept SSL connections, ensure that you select the The port is configured to accept SSL connections option.
    • If the data source uses a self-signed certificate, you must specify the contents of the certificate to enable secure communication between Cloud Pak for Data and the data source.
    • If your data source uses chained certificates, you can specify the contents of multiple certificates.
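
When a data source uses chained certificates, the certificates are often delivered as one text bundle. The following Python sketch splits such a bundle into individual certificates so that you can review or paste each one. It assumes that the certificates are PEM-encoded (delimited by BEGIN CERTIFICATE and END CERTIFICATE lines), which the procedure above does not state. The function name and the placeholder bundle are hypothetical.

```python
import re
from typing import List

# Matches one PEM certificate block, including its delimiter lines.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_chain(pem_text: str) -> List[str]:
    """Return each certificate block found in the text, in order."""
    return PEM_BLOCK.findall(pem_text)


# Placeholder bundle for illustration; real certificates contain Base64 data.
bundle = (
    "-----BEGIN CERTIFICATE-----\nAAA\n-----END CERTIFICATE-----\n"
    "-----BEGIN CERTIFICATE-----\nBBB\n-----END CERTIFICATE-----\n"
)
for index, cert in enumerate(split_pem_chain(bundle), start=1):
    print(f"certificate {index}: {len(cert)} characters")
```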