Tableau

Module tableau

Incubating

Important Capabilities

Capability | Status | Notes
Dataset Usage |  | Dashboard/Chart view counts, enabled using extract_usage_stats config
Descriptions |  | Enabled by default
Detect Deleted Entities |  | Enabled by default when stateful ingestion is turned on.
Domains |  | Requires transformer
Extract Ownership |  | Requires recipe configuration
Extract Tags |  | Requires recipe configuration
Platform Instance |  | Enabled by default
Table-Level Lineage |  | Enabled by default

Prerequisites

In order to ingest metadata from Tableau, you will need:

  • Tableau Server version 2021.1.10 or later. Older versions may also work.
  • Enable the Tableau Metadata API for Tableau Server, if it's not already enabled.
  • Tableau Credentials (Username/Password or Personal Access Token)
  • The user or token must have Site Administrator Explorer permissions.

Ingestion through UI

The following video shows you how to get started with ingesting Tableau metadata through the UI.

Integration Details

This plugin extracts metadata for Sheets, Dashboards, and Embedded and Published Data Sources within Workbooks in a given project on a Tableau site. It uses Tableau's GraphQL interface to extract this metadata. The queries used are located in metadata-ingestion/src/datahub/ingestion/source/tableau_common.py.

Concept Mapping

This ingestion source maps the following Source System Concepts to DataHub Concepts:

Source Concept | DataHub Concept | Notes
"Tableau" | Data Platform |
Project | Container | SubType "Project"
Embedded DataSource | Dataset | SubType "Embedded Data Source"
Published DataSource | Dataset | SubType "Published Data Source"
Custom SQL Table | Dataset | SubTypes "View", "Custom SQL"
Embedded or External Tables | Dataset |
Sheet | Chart |
Dashboard | Dashboard |
User | User (a.k.a CorpUser) | Optionally Extracted
Workbook | Container | SubType "Workbook"
Tag | Tag | Optionally Extracted

Lineage

Lineage is emitted as received from Tableau's Metadata API for:

  • Sheets contained within a Dashboard
  • Embedded or Published Data Sources depended on by a Sheet
  • Published Data Sources upstream to Embedded Data Sources
  • Tables upstream to Embedded or Published Data Source
  • Custom SQL datasources upstream to Embedded or Published Data Source
  • Tables upstream to Custom SQL Data Source
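
The relationships above can be pictured as a small dependency map. The sketch below is illustrative only: UPSTREAMS and all_upstream_types are hypothetical names, not part of the plugin, and treating a Dashboard's Sheets as its upstreams is a modelling assumption made for this example.

```python
from typing import Dict, List, Set

# Hypothetical map: entity type -> entity types that may sit directly upstream,
# transcribed from the list of lineage relationships above.
UPSTREAMS: Dict[str, List[str]] = {
    "Dashboard": ["Sheet"],
    "Sheet": ["Embedded Data Source", "Published Data Source"],
    "Embedded Data Source": ["Published Data Source", "Table", "Custom SQL"],
    "Published Data Source": ["Table", "Custom SQL"],
    "Custom SQL": ["Table"],
}


def all_upstream_types(entity_type: str) -> Set[str]:
    """Return every entity type reachable upstream of `entity_type`."""
    seen: Set[str] = set()
    stack = list(UPSTREAMS.get(entity_type, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(UPSTREAMS.get(current, []))
    return seen
```

For example, all_upstream_types("Custom SQL") contains only "Table", while a Dashboard can transitively reach every other type in the map.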

Caveats

  • The Tableau Metadata API might return an incorrect schema name for tables in some databases, leading to incorrect metadata in DataHub. This source attempts to extract the correct schema from the databaseTable's fully qualified name wherever possible. Read "Using the databaseTable object in query" for caveats on using the schema attribute.
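
As a rough illustration of the fallback described in this caveat, the sketch below prefers the schema embedded in a fully qualified name over the reported schema attribute. It assumes names look like [schema].[table] or [database].[schema].[table]. The function names, the bracket format, and the fallback order (including the use of default_schema_map) are assumptions made for illustration, not the source's actual implementation.

```python
import re
from typing import Dict, List, Optional

# Assumed format: parts separated by dots, each optionally wrapped in [brackets].
_PART = re.compile(r"\[([^\]]*)\]|([^.\[\]]+)")


def split_full_name(full_name: str) -> List[str]:
    """Split '[a].[b]' or 'a.b' into its non-empty parts."""
    return [a or b for a, b in _PART.findall(full_name) if a or b]


def resolve_schema(
    full_name: str,
    reported_schema: Optional[str],
    database: Optional[str] = None,
    default_schema_map: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Pick a schema: fully qualified name first, then the reported value,
    then a per-database default (hypothetical fallback order)."""
    parts = split_full_name(full_name)
    if len(parts) >= 2:
        return parts[-2]  # the part just before the table name
    if reported_schema:
        return reported_schema
    if database is None:
        return None
    return (default_schema_map or {}).get(database)
```

With these assumptions, resolve_schema("[public].[orders]", "wrong") yields "public", and a bare table name falls back to the reported schema or to the default configured for its database.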

Troubleshooting

Why are only some workbooks/custom SQLs/published datasources ingested from the specified project?

This can happen when the Tableau API returns a NODE_LIMIT_EXCEEDED error in response to a metadata query and returns partial results with the message "Showing partial results. , The request exceeded the ‘n’ node limit. Use pagination, additional filtering, or both in the query to adjust results." To resolve this, consider:

  • reducing the page size using the page_size config parameter in the DataHub recipe (defaults to 10).
  • increasing the Tableau metadata query node limit configuration to a higher value.
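
To make the failure mode easier to spot, here is a minimal sketch that checks a parsed GraphQL response for this error and suggests a smaller page size. The response shape (an "errors" list whose items carry "message" and "extensions.code") is an assumption based on common GraphQL conventions, and halving the page size is just one possible heuristic. Neither is taken from the plugin's code, and both function names are hypothetical.

```python
from typing import Any, Dict, List

NODE_LIMIT_CODE = "NODE_LIMIT_EXCEEDED"


def node_limit_errors(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the errors in a parsed response that signal a node-limit overflow."""
    matches = []
    for error in response.get("errors") or []:
        code = (error.get("extensions") or {}).get("code")
        # Fall back to the message text in case the code field is absent.
        if code == NODE_LIMIT_CODE or NODE_LIMIT_CODE in str(error.get("message", "")):
            matches.append(error)
    return matches


def next_page_size(current: int, minimum: int = 1) -> int:
    """Halve the page size, never going below `minimum` (illustrative heuristic)."""
    return max(minimum, current // 2)
```

If node_limit_errors returns a non-empty list for a run, the ingested results are likely partial, and lowering page_size in the recipe (for example from 10 to next_page_size(10)) is a reasonable first step.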

CLI based Ingestion

Install the Plugin

pip install 'acryl-datahub[tableau]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: tableau
  config:
    # Coordinates
    connect_uri: https://prod-ca-a.online.tableau.com
    site: acryl
    platform_instance: acryl_instance
    # project_pattern is an allow/deny pattern (see Config Details)
    project_pattern:
      allow:
        - "^default$"
        - "^Project 2$"
        - "^/Project A/Nested Project B$"

    # Credentials
    username: "${TABLEAU_USER}"
    password: "${TABLEAU_PASSWORD}"

    # Options
    ingest_tags: True
    ingest_owner: True
    default_schema_map:
      mydatabase: public
      anotherdatabase: anotherschema

sink:
  # sink configs
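
The recipe above authenticates with a username and password, while the configuration options also allow a personal access token (token_name and token_value). As a hedged sketch of choosing between the two, the helper below builds the source section of a recipe as a Python dictionary. The function name and the TABLEAU_TOKEN_NAME / TABLEAU_TOKEN_VALUE variable names are hypothetical; only TABLEAU_USER and TABLEAU_PASSWORD appear in the recipe above.

```python
import os
from typing import Any, Dict, Mapping


def tableau_source_config(
    connect_uri: str,
    site: str,
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Build the `source` section of a recipe, preferring token authentication."""
    config: Dict[str, Any] = {"connect_uri": connect_uri, "site": site}
    if env.get("TABLEAU_TOKEN_NAME") and env.get("TABLEAU_TOKEN_VALUE"):
        # Hypothetical variable names for a personal access token.
        config["token_name"] = env["TABLEAU_TOKEN_NAME"]
        config["token_value"] = env["TABLEAU_TOKEN_VALUE"]
    elif env.get("TABLEAU_USER") and env.get("TABLEAU_PASSWORD"):
        config["username"] = env["TABLEAU_USER"]
        config["password"] = env["TABLEAU_PASSWORD"]
    else:
        raise ValueError("Set either token or username/password credentials")
    return {"type": "tableau", "config": config}
```

The returned dictionary mirrors the source block of the YAML recipe, so it could be combined with a sink section and serialized or passed to whatever runner you use.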

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

View All Configuration Options
Field [Required] | Type | Description | Default | Notes
connect_uri [✅] | string | Tableau host URL. | None |
default_schema_map | object | Default schema to use when schema is not found. | None |
extract_column_level_lineage | boolean | When enabled, extracts column-level lineage from Tableau Datasources. | True |
extract_project_hierarchy | boolean | Whether to extract the entire project hierarchy for nested projects. | True |
extract_usage_stats | boolean | [experimental] Extract usage statistics for dashboards and charts. | None |
ingest_embed_url | boolean | Ingest a URL to render an embedded preview of assets within Tableau. | True |
ingest_owner | boolean | Ingest Owner from source. This will override Owner info entered from the UI. | None |
ingest_tables_external | boolean | Ingest details for tables external to (not embedded in) Tableau as entities. | None |
ingest_tags | boolean | Ingest Tags from source. This will override Tags entered from the UI. | None |
page_size | integer | [advanced] Number of metadata objects (e.g. CustomSQLTable, PublishedDatasource, etc.) to query at a time using the Tableau API. | 10 |
password | string | Tableau password, must be set if authenticating using username/password. | None |
platform_instance | string | The instance of the platform that all assets produced by this recipe belong to. | None |
platform_instance_map | map(str,string) |  | None |
project_path_separator | string | The separator used for the project_pattern field between project names. By default, we use a slash. You can change this if your Tableau projects contain slashes in their names, and you'd like to filter by project. | / |
projects | array(string) |  | None |
site | string | Tableau Site. Always required for Tableau Online. Use an empty string to connect with the Default site on Tableau Server. | None |
ssl_verify | UnionType (See notes for variants) | Whether to verify SSL certificates. If using self-signed certificates, set to false or provide the path to the .pem certificate bundle. | True | One of boolean, string
token_name | string | Tableau token name, must be set if authenticating using a personal access token. | None |
token_value | string | Tableau token value, must be set if authenticating using a personal access token. | None |
username | string | Tableau username, must be set if authenticating using username/password. | None |
workbook_page_size | integer | [advanced] Number of workbooks to query at a time using the Tableau API. | 1 |
env | string | Environment to use in namespace when constructing URNs. | PROD |
lineage_overrides | TableauLineageOverrides | Mappings to change generated dataset urns. Use only if you really know what you are doing. | None |
lineage_overrides.platform_override_map | map(str,string) |  | None |
project_pattern | AllowDenyPattern | Filter for specific Tableau projects. For example, use 'My Project' to ingest a root-level Project with name 'My Project', or 'My Project/Nested Project' to ingest a nested Project with name 'Nested Project'. By default, all Projects nested inside a matching Project will be included in ingestion. You can both allow and deny projects based on their name, using either the name itself or a Regex pattern. Deny patterns always take precedence over allow patterns. By default, all projects will be ingested. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
project_pattern.allow | array(string) |  | None |
project_pattern.deny | array(string) |  | None |
project_pattern.ignoreCase | boolean | Whether to ignore case sensitivity during pattern matching. | True |
stateful_ingestion | StatefulStaleMetadataRemovalConfig | Base specialized config for Stateful Ingestion with stale metadata removal capability. | None |
stateful_ingestion.enabled | boolean | Whether stateful ingestion is enabled. | None |
stateful_ingestion.ignore_new_state | boolean | If set to True, ignores the current checkpoint state. | None |
stateful_ingestion.ignore_old_state | boolean | If set to True, ignores the previous checkpoint state. | None |
stateful_ingestion.remove_stale_metadata | boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. | True |
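
The project_pattern semantics described in the options above (deny takes precedence over allow, case ignored by default) can be sketched as follows. This is an illustration under stated assumptions, not DataHub's AllowDenyPattern implementation: project_allowed is a hypothetical helper, and matching from the start of the string with re.match is an assumption.

```python
import re
from typing import List, Optional


def project_allowed(
    path: str,
    allow: Optional[List[str]] = None,
    deny: Optional[List[str]] = None,
    ignore_case: bool = True,
) -> bool:
    """Return True if `path` matches an allow pattern and no deny pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    allow = [".*"] if allow is None else allow  # default: allow everything
    deny = deny or []
    # Deny patterns are checked first so they always win.
    if any(re.match(pattern, path, flags) for pattern in deny):
        return False
    return any(re.match(pattern, path, flags) for pattern in allow)
```

Under these assumptions, project_allowed("Default", allow=["^default$"]) is true because case is ignored, while a project matched by both an allow and a deny pattern is rejected.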

Code Coordinates

  • Class Name: datahub.ingestion.source.tableau.TableauSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Tableau, feel free to ping us on our Slack.