Tableau
Module tableau
Important Capabilities
Capability | Status | Notes |
---|---|---|
Dataset Usage | ✅ | Dashboard/Chart view counts, enabled using extract_usage_stats config |
Descriptions | ✅ | Enabled by default |
Detect Deleted Entities | ✅ | Enabled by default when stateful ingestion is turned on. |
Domains | ❌ | Requires transformer |
Extract Ownership | ✅ | Requires recipe configuration |
Extract Tags | ✅ | Requires recipe configuration |
Platform Instance | ✅ | Enabled by default |
Table-Level Lineage | ✅ | Enabled by default |
Prerequisites
In order to ingest metadata from Tableau, you will need:
- Tableau Server Version 2021.1.10 and above. It may also work for older versions.
- Enable the Tableau Metadata API for Tableau Server, if its not already enabled.
- Tableau Credentials (Username/Password or Personal Access Token)
- The user or token must have Site Administrator Explorer permissions.
Ingestion through UI
The following video shows you how to get started with ingesting Tableau metadata through the UI.
Integration Details
This plugin extracts Sheets, Dashboards, Embedded and Published Data sources metadata within Workbooks in a given project
on a Tableau site. Tableau's GraphQL interface is used to extract metadata information. Queries used to extract metadata are located
in metadata-ingestion/src/datahub/ingestion/source/tableau_common.py
Concept Mapping
This ingestion source maps the following Source System Concepts to DataHub Concepts:
Source Concept | DataHub Concept | Notes |
---|---|---|
"Tableau" | Data Platform | |
Project | Container | SubType "Project" |
Embedded DataSource | Dataset | SubType "Embedded Data Source" |
Published DataSource | Dataset | SubType "Published Data Source" |
Custom SQL Table | Dataset | SubTypes "View" , "Custom SQL" |
Embedded or External Tables | Dataset | |
Sheet | Chart | |
Dashboard | Dashboard | |
User | User (a.k.a CorpUser) | Optionally Extracted |
Workbook | Container | SubType "Workbook" |
Tag | Tag | Optionally Extracted |
Lineage
Lineage is emitted as received from Tableau's metadata API for
- Sheets contained within a Dashboard
- Embedded or Published Data Sources depended on by a Sheet
- Published Data Sources upstream to Embedded datasource
- Tables upstream to Embedded or Published Data Source
- Custom SQL datasources upstream to Embedded or Published Data Source
- Tables upstream to Custom SQL Data Source
Caveats
- Tableau metadata API might return incorrect schema name for tables for some databases, leading to incorrect metadata in DataHub. This source attempts to extract correct schema from databaseTable's fully qualified name, wherever possible. Read Using the databaseTable object in query for caveats in using schema attribute.
Troubleshooting
Why are only some workbooks/custom SQLs/published datasources ingested from the specified project?
This may happen when the Tableau API returns NODE_LIMIT_EXCEEDED error in response to metadata query and returns partial results with message "Showing partial results. , The request exceeded the ‘n’ node limit. Use pagination, additional filtering, or both in the query to adjust results." To resolve this, consider
- reducing the page size using the
page_size
config param in datahub recipe (Defaults to 10). - increasing tableau configuration metadata query node limit to higher value.
CLI based Ingestion
Install the Plugin
pip install 'acryl-datahub[tableau]'
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
type: tableau
config:
# Coordinates
connect_uri: https://prod-ca-a.online.tableau.com
site: acryl
platform_instance: acryl_instance
project_pattern: ["^default$", "^Project 2$", "^/Project A/Nested Project B$"]
# Credentials
username: "${TABLEAU_USER}"
password: "${TABLEAU_PASSWORD}"
# Options
ingest_tags: True
ingest_owner: True
default_schema_map:
mydatabase: public
anotherdatabase: anotherschema
sink:
# sink configs
Config Details
- Options
- Schema
Note that a .
is used to denote nested fields in the YAML recipe.
View All Configuration Options
Field [Required] | Type | Description | Default | Notes |
---|---|---|---|---|
connect_uri [✅] | string | Tableau host URL. | None | |
default_schema_map | object | Default schema to use when schema is not found. | None | |
extract_column_level_lineage | boolean | When enabled, extracts column-level lineage from Tableau Datasources | True | |
extract_project_hierarchy | boolean | Whether to extract entire project hierarchy for nested projects. | True | |
extract_usage_stats | boolean | [experimental] Extract usage statistics for dashboards and charts. | None | |
ingest_embed_url | boolean | Ingest a URL to render an embedded Preview of assets within Tableau. | True | |
ingest_owner | boolean | Ingest Owner from source. This will override Owner info entered from UI | None | |
ingest_tables_external | boolean | Ingest details for tables external to (not embedded in) tableau as entities. | None | |
ingest_tags | boolean | Ingest Tags from source. This will override Tags entered from UI | None | |
page_size | integer | [advanced] Number of metadata objects (e.g. CustomSQLTable, PublishedDatasource, etc) to query at a time using the Tableau API. | 10 | |
password | string | Tableau password, must be set if authenticating using username/password. | None | |
platform_instance | string | The instance of the platform that all assets produced by this recipe belong to | None | |
platform_instance_map | map(str,string) | None | ||
project_path_separator | string | The separator used for the project_pattern field between project names. By default, we use a slash. You can change this if your Tableau projects contain slashes in their names, and you'd like to filter by project. | / | |
projects | array(string) | None | ||
site | string | Tableau Site. Always required for Tableau Online. Use emptystring to connect with Default site on Tableau Server. | None | |
ssl_verify | UnionType (See notes for variants) | Whether to verify SSL certificates. If using self-signed certificates, set to false or provide the path to the .pem certificate bundle. | True | One of boolean,string |
token_name | string | Tableau token name, must be set if authenticating using a personal access token. | None | |
token_value | string | Tableau token value, must be set if authenticating using a personal access token. | None | |
username | string | Tableau username, must be set if authenticating using username/password. | None | |
workbook_page_size | integer | [advanced] Number of workbooks to query at a time using the Tableau API. | 1 | |
env | string | Environment to use in namespace when constructing URNs. | PROD | |
lineage_overrides | TableauLineageOverrides | Mappings to change generated dataset urns. Use only if you really know what you are doing. | None | |
lineage_overrides.platform_override_map | map(str,string) | None | ||
project_pattern | AllowDenyPattern | Filter for specific Tableau projects. For example, use 'My Project' to ingest a root-level Project with name 'My Project', or 'My Project/Nested Project' to ingest a nested Project with name 'Nested Project'. By default, all Projects nested inside a matching Project will be included in ingestion. You can both allow and deny projects based on their name using their name, or a Regex pattern. Deny patterns always take precedence over allow patterns. By default, all projects will be ingested. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True} | |
project_pattern.allow | array(string) | None | ||
project_pattern.deny | array(string) | None | ||
project_pattern.ignoreCase | boolean | Whether to ignore case sensitivity during pattern matching. | True | |
stateful_ingestion | StatefulStaleMetadataRemovalConfig | Base specialized config for Stateful Ingestion with stale metadata removal capability. | None | |
stateful_ingestion.enabled | boolean | The type of the ingestion state provider registered with datahub. | None | |
stateful_ingestion.ignore_new_state | boolean | If set to True, ignores the current checkpoint state. | None | |
stateful_ingestion.ignore_old_state | boolean | If set to True, ignores the previous checkpoint state. | None | |
stateful_ingestion.remove_stale_metadata | boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. | True |
The JSONSchema for this configuration is inlined below.
{
"title": "TableauConfig",
"description": "Any non-Dataset source that produces lineage to Datasets should inherit this class.\ne.g. Orchestrators, Pipelines, BI Tools etc.",
"type": "object",
"properties": {
"connect_uri": {
"title": "Connect Uri",
"description": "Tableau host URL.",
"type": "string"
},
"username": {
"title": "Username",
"description": "Tableau username, must be set if authenticating using username/password.",
"type": "string"
},
"password": {
"title": "Password",
"description": "Tableau password, must be set if authenticating using username/password.",
"type": "string"
},
"token_name": {
"title": "Token Name",
"description": "Tableau token name, must be set if authenticating using a personal access token.",
"type": "string"
},
"token_value": {
"title": "Token Value",
"description": "Tableau token value, must be set if authenticating using a personal access token.",
"type": "string"
},
"site": {
"title": "Site",
"description": "Tableau Site. Always required for Tableau Online. Use emptystring to connect with Default site on Tableau Server.",
"default": "",
"type": "string"
},
"platform_instance": {
"title": "Platform Instance",
"description": "The instance of the platform that all assets produced by this recipe belong to",
"type": "string"
},
"ssl_verify": {
"title": "Ssl Verify",
"description": "Whether to verify SSL certificates. If using self-signed certificates, set to false or provide the path to the .pem certificate bundle.",
"default": true,
"anyOf": [
{
"type": "boolean"
},
{
"type": "string"
}
]
},
"extract_column_level_lineage": {
"title": "Extract Column Level Lineage",
"description": "When enabled, extracts column-level lineage from Tableau Datasources",
"default": true,
"type": "boolean"
},
"env": {
"title": "Env",
"description": "Environment to use in namespace when constructing URNs.",
"default": "PROD",
"type": "string"
},
"stateful_ingestion": {
"$ref": "#/definitions/StatefulStaleMetadataRemovalConfig"
},
"platform_instance_map": {
"title": "Platform Instance Map",
"description": "A holder for platform -> platform_instance mappings to generate correct dataset urns",
"type": "object",
"additionalProperties": {
"type": "string"
}
},
"projects": {
"title": "Projects",
"description": "[deprecated] Use project_pattern instead. List of tableau projects ",
"default": [
"default"
],
"type": "array",
"items": {
"type": "string"
}
},
"project_pattern": {
"title": "Project Pattern",
"description": "Filter for specific Tableau projects. For example, use 'My Project' to ingest a root-level Project with name 'My Project', or 'My Project/Nested Project' to ingest a nested Project with name 'Nested Project'. By default, all Projects nested inside a matching Project will be included in ingestion. You can both allow and deny projects based on their name using their name, or a Regex pattern. Deny patterns always take precedence over allow patterns. By default, all projects will be ingested.",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"project_path_separator": {
"title": "Project Path Separator",
"description": "The separator used for the project_pattern field between project names. By default, we use a slash. You can change this if your Tableau projects contain slashes in their names, and you'd like to filter by project.",
"default": "/",
"type": "string"
},
"default_schema_map": {
"title": "Default Schema Map",
"description": "Default schema to use when schema is not found.",
"default": {},
"type": "object"
},
"ingest_tags": {
"title": "Ingest Tags",
"description": "Ingest Tags from source. This will override Tags entered from UI",
"default": false,
"type": "boolean"
},
"ingest_owner": {
"title": "Ingest Owner",
"description": "Ingest Owner from source. This will override Owner info entered from UI",
"default": false,
"type": "boolean"
},
"ingest_tables_external": {
"title": "Ingest Tables External",
"description": "Ingest details for tables external to (not embedded in) tableau as entities.",
"default": false,
"type": "boolean"
},
"page_size": {
"title": "Page Size",
"description": "[advanced] Number of metadata objects (e.g. CustomSQLTable, PublishedDatasource, etc) to query at a time using the Tableau API.",
"default": 10,
"type": "integer"
},
"workbook_page_size": {
"title": "Workbook Page Size",
"description": "[advanced] Number of workbooks to query at a time using the Tableau API.",
"default": 1,
"type": "integer"
},
"lineage_overrides": {
"title": "Lineage Overrides",
"description": "Mappings to change generated dataset urns. Use only if you really know what you are doing.",
"allOf": [
{
"$ref": "#/definitions/TableauLineageOverrides"
}
]
},
"extract_usage_stats": {
"title": "Extract Usage Stats",
"description": "[experimental] Extract usage statistics for dashboards and charts.",
"default": false,
"type": "boolean"
},
"ingest_embed_url": {
"title": "Ingest Embed Url",
"description": "Ingest a URL to render an embedded Preview of assets within Tableau.",
"default": true,
"type": "boolean"
},
"extract_project_hierarchy": {
"title": "Extract Project Hierarchy",
"description": "Whether to extract entire project hierarchy for nested projects.",
"default": true,
"type": "boolean"
}
},
"required": [
"connect_uri"
],
"additionalProperties": false,
"definitions": {
"DynamicTypedStateProviderConfig": {
"title": "DynamicTypedStateProviderConfig",
"type": "object",
"properties": {
"type": {
"title": "Type",
"description": "The type of the state provider to use. For DataHub use `datahub`",
"type": "string"
},
"config": {
"title": "Config",
"description": "The configuration required for initializing the state provider. Default: The datahub_api config if set at pipeline level. Otherwise, the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19)."
}
},
"required": [
"type"
],
"additionalProperties": false
},
"StatefulStaleMetadataRemovalConfig": {
"title": "StatefulStaleMetadataRemovalConfig",
"description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
"type": "object",
"properties": {
"enabled": {
"title": "Enabled",
"description": "The type of the ingestion state provider registered with datahub.",
"default": false,
"type": "boolean"
},
"ignore_old_state": {
"title": "Ignore Old State",
"description": "If set to True, ignores the previous checkpoint state.",
"default": false,
"type": "boolean"
},
"ignore_new_state": {
"title": "Ignore New State",
"description": "If set to True, ignores the current checkpoint state.",
"default": false,
"type": "boolean"
},
"remove_stale_metadata": {
"title": "Remove Stale Metadata",
"description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
"default": true,
"type": "boolean"
}
},
"additionalProperties": false
},
"AllowDenyPattern": {
"title": "AllowDenyPattern",
"description": "A class to store allow deny regexes",
"type": "object",
"properties": {
"allow": {
"title": "Allow",
"description": "List of regex patterns to include in ingestion",
"default": [
".*"
],
"type": "array",
"items": {
"type": "string"
}
},
"deny": {
"title": "Deny",
"description": "List of regex patterns to exclude from ingestion.",
"default": [],
"type": "array",
"items": {
"type": "string"
}
},
"ignoreCase": {
"title": "Ignorecase",
"description": "Whether to ignore case sensitivity during pattern matching.",
"default": true,
"type": "boolean"
}
},
"additionalProperties": false
},
"TableauLineageOverrides": {
"title": "TableauLineageOverrides",
"type": "object",
"properties": {
"platform_override_map": {
"title": "Platform Override Map",
"description": "A holder for platform -> platform mappings to generate correct dataset urns",
"type": "object",
"additionalProperties": {
"type": "string"
}
}
},
"additionalProperties": false
}
}
}
Code Coordinates
- Class Name:
datahub.ingestion.source.tableau.TableauSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Tableau, feel free to ping us on our Slack