How to archive several thousands of your Power BI reports in a day? 
November 15, 2023
Over time, user workspaces become cluttered and inefficient. This is a common issue encountered by medium and large organizations utilizing the Power BI Service. Frequently, users create reports that remain unused, others become obsolete, and some are only generated for testing purposes but are not subsequently removed. This surplus of data and unnecessary reports can lead to chaos and hinder effective management of the Power BI platform within the organization. Therefore, it is crucial to take actions aimed at organizing and optimizing workspaces in Power BI. This is the focus of the project I have been working on in recent days. I invite you to read further. 

Michał Jełowicki

Introduction 

Over time, user workspaces become cluttered and inefficient. This is a common issue encountered by medium and large organizations utilizing the Power BI Service. Frequently, users create reports that remain unused, others become obsolete, and some are only generated for testing purposes but are not subsequently removed. This surplus of data and unnecessary reports can lead to chaos and hinder effective management of the Power BI platform within the organization. Therefore, it is crucial to take actions aimed at organizing, ‘cleaning up,’ and getting rid of unused reports, as well as archiving workspaces in Power BI. This is the focus of the project I have been working on in recent days. I invite you to read further.

Objective

The project aimed to remove rarely used reports from the Power BI Portal, archive them on Azure Blob Storage, and create a table with information about which users have access to the archived reports. 

Description

The project is divided into three parts:

  1. Granting access to Workspaces in the Power BI Portal. This step is divided into two scripts: a PowerShell script that grants access to classic workspaces (those that can be shared and edited with other users in the organization) and a Python script that grants access to Personal Workspaces. PowerShell relies on cmdlets, while Python uses web scraping. As of the project’s creation in September 2023, it is not possible to grant access to Personal Workspaces using cmdlets or Power BI API.
  2. Archiving reports. This part is managed by a PowerShell script. Reports are downloaded to the local disk, then uploaded to Blob Storage, and finally, the report’s dataset is deleted (via an API query), which means the removal of the report, dataset, and dashboard. The list of reports to be removed is provided through an XLSX file (an example file is included in the solution).
  3. Creating CSV files containing lists of users and the reports assigned to them. The script takes into account access obtained through Security Groups and applications. This step is managed by the scripts GetAppUsers.ps1 and PowerBIArchiveReports.ps1. The lists are exported to two CSV files: reportUsers.csv – users with access to reports from workspaces, reports, or Security Groups, and appUsers.csv – users with access to reports through applications. 

Requirements 

  • Administrator account on Power BI Portal. 
  • App Registration  with admin Power BI API access (authorization using client_credentials). 
  • App Registration with Power BI API Delete access (authorization using authorization_code).
  • Google profile with access to Power BI Portal (optional, provided that the login script is modified to work with Multi-Factor Authentication). 
  • Blob Storage and a container for archived reports. 
  • ChromeDriver version matching your Chrome browser version (usually the latest version). 

Tools 

  1. PowerShell. 
  2. Python. 
  3. Power BI API. 
  4. Azure Blob Storage. 

Introduction

Let’s begin by considering how to approach the problem. What tools do we have at our disposal, and which ones should we prioritize using initially, and which as a last resort? We certainly need a way to communicate with Power BI Service – we need to find data about reports to delete, assign permissions to workspaces, archive reports, and then remove them from the Portal. The first tools that come to mind are Power BI API and PowerShell cmdlets. It’s worth noting that to execute the described script, we will need very powerful permissions – we require read-write access to all workspaces in the organization, including Personal workspaces. 

API or cmdlets are always the preferred methods, as they are significantly faster! 

1. Granting Access to Personal Workspaces 

Unfortunately, as of the current date in September 2023, neither Power BI API nor cmdlets provide the capability to grant access to Personal Workspaces using these tools. Therefore, we have to implement a workaround – access will be granted “manually” through a web scraping script in Python. This means that the program will navigate to the Admin Portal page in Power BI and automate the process of granting access to thousands of workspaces (Power BI Portal -> Settings -> Admin Portal -> Workspaces -> select a workspace -> Get Access). 

from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver

# chromedriver settings
options = webdriver.ChromeOptions()
options.add_argument("--start-maximized")

# parameters
# example path to chrome data
options.add_argument(r"--user-data-dir=C:\Users\UserName\AppData\Local\Google\Chrome\User Data")
# directory with access to PowerBI Admin Portal (thanks to that we can skip MFA)
options.add_argument(r'--profile-directory=Profile 1')
# path to chromedriver
service = Service(r"C:\Users\folders\chromedriver.exe")

driver = webdriver.Chrome(service=service, options=options)
url = "https://app.powerbi.com/admin-portal/workspaces?experience=power-bi"
driver.get(url)

checkbox_index = 2
# counter of workspaces on the page
element_number = 0
counter = 0
while True:
    checkbox_id = f"mat-checkbox-{checkbox_index}"
    # checkbox
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, checkbox_id))).click()
    counter += 1
    print(counter)
    try:
        # get access
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'button.mat-button span.mat-button-wrapper span[localize="Get_Access"]'))).click()
        checkbox_index += 11
    except:
        print(f"Cannot click checkbox with ID: {checkbox_id}")
        # unclick checkbox
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, checkbox_id))).click()
        checkbox_index += 1

    element_number += 1
    # go to next page
    if  element_number == 10:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'button.mat-paginator-navigation-next[aria-label="Next page"]'))).click()
        element_number = 0

2. Granting Access to Regular Workspaces 

In this case, the process is much simpler – there is a PowerShell cmdlet designed specifically for this purpose. Before using it, we need to obtain a list of workspace IDs to which we want to grant access – this step can be accomplished through a Power BI API query. 

Import-Module -Name Az.Storage
Import-Module -Name MicrosoftPowerBIMgmt
Import-Module -Name ImportExcel


# PowerBI API parameters
$tenantId = ""
$clientId = ""
$clientSecret = "”
$resource = "https://analysis.windows.net/powerbi/api"
$tokenEndpoint = "https://login.microsoftonline.com/$tenantId/oauth2/token"
$adminAccount = "123abc@email.com"

$body = @{
    grant_type = "client_credentials"
    client_id = $clientId
    client_secret = $clientSecret
    resource = $resource
}

# generate access token
$accessTokenResponse = Invoke-RestMethod -Uri $tokenEndpoint -Method Post -Body $body
$accessToken = $accessTokenResponse.access_token

# list all workspaces in the organization
$workspaces = Invoke-RestMethod -Uri 'https://api.powerbi.com/v1.0/myorg/admin/groups?$top=10000' -Method Get -Headers @{
    Authorization = "Bearer $accessToken"
}

# connect to PowerBI Service
Connect-PowerBIServiceAccount
# iterate through every workspace
$workspaces.value | ForEach-Object {
$workspaceType = $_.type
$workspace = $_.id
$workspace
if ($workspaceType -eq "Workspace") {
# grant contributor access to the workspaces
Add-PowerBIWorkspaceUser -Scope Organization -Id $workspace -UserEmailAddress $adminAccount -AccessRight Contributor
}

3. Preparing a List of Reports to Archive 

There are multiple ways to obtain a list of unused reports. In Astral Forest, we have implemented the Power BI Monitoring tool, which allows us to monitor and identify unused reports in real-time. In this case, exporting the data from the tool to a CSV file should be sufficient. 

In case your organization doesn’t have access to the mentioned tool, you can try using alternative, less efficient methods to generate the list. An example of such a solution could be query in Power BI API. However, this solution has limitations – for instance, it only returns unused artifacts from the last 30 days and may also include other objects in the Power BI Portal, I recommend having at least 180 days of usage history of the reports.

4. Archiving Reports and Creating a User-Report Assignment List

Once the necessary permissions have been obtained, we can proceed to the main part of the script. Additionally, we will create a list of users along with the reports assigned to them. This way, if someone wants to retrieve their report after running the script, we will know whether they had permissions for it or not. If you would like to learn more about authorization code or if you encounter issues executing the script, I recommend reading this article

Import-Module -Name Az.Storage
Import-Module -Name MicrosoftPowerBIMgmt
Import-Module -Name ImportExcel

# PowerBI Admin parameters 
$tenantId = ""
$clientId = ""
$clientSecret = ""
$resource = "https://analysis.windows.net/powerbi/api"
$tokenEndpoint = "https://login.microsoftonline.com/$tenantId/oauth2/token"

# PowerBI API parameters
# link for authorization code (past in your browers and copy the part after code=)
# rememer to fill your client_Id!!!
# https://login.microsoftonline.com/common/oauth2/authorize?client_id=xxxxxxxxxxxxxxx8&response_type=code&redirect_uri=https://localhost/redirect/&response_mode=query&scope=openid&state=12345
$code = ''
$clientIdDelete = ""
$clientSecretDelete = ""


# rest of parameters
# path to folder where reports are saved
$baseReportPath = ""
# path to csv with list of reports ids
$filePath = ""
# path to xlsx with reports users
$csvPath = ""


# Azure parameters
$storageAccountName = ""
$storageContainerName = ""
$blobPath = ""
$storageAccountKey = ""
$storageContext = New-AzStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageAccountKey


$bodyAdmin = @{
    grant_type = "client_credentials"
    client_id = $clientId
    client_secret = $clientSecret
    resource = $resource
}

# generate access token for read-only admin API
$accessTokenResponse = Invoke-RestMethod -Uri $tokenEndpoint -Method Post -Body $bodyAdmin
$accessToken = $accessTokenResponse.access_token



$bodyDelete = @{
    grant_type = "authorization_code"
    client_id = "$clientIdDelete"
    client_secret = "$clientSecretDelete"
    resource = $resource   
    code = "$code"
    redirect_uri = 'https://localhost/redirect/'
}

$accessTokenResponse = Invoke-RestMethod -Uri $tokenEndpoint -Method Post -Body $bodyDelete
$accessTokenDelete = $accessTokenResponse.access_token


$reports = Invoke-RestMethod -Uri "https://api.powerbi.com/v1.0/myorg/admin/reports" -Method Get -Headers @{
    Authorization = "Bearer $accessToken"
}


# get datasetId, reportName and workspaceId
$workspaceMap = @{}
$nameMap = @{}
$datasetMap = @{}

$reports.value | ForEach-Object {
    $reportId = $_.id
    $workspaceId = $_.workspaceId
    $name = $_.name
    $datasetId = $_.datasetId
    $workspaceMap[$reportId] = $workspaceId
    $nameMap[$reportId] = $name
    $datasetMap[$reportId] = $datasetId
}


# read xlsx file
$reportsExcel = Import-Excel -Path $filePath -StartRow 2 -NoHeader
$reportsArray = $reportsExcel | ForEach-Object { $_.PSObject.Properties.Value }

$csvData = @()


Connect-PowerBIServiceAccount
foreach ($reportId in $reportsArray) {
    $name = $nameMap["$reportId"]
    $workspaceId = $workspaceMap["$reportId"] 
    $datasetId = $datasetMap["$reportId"]
    $extension = ".pbix"
    
    # Find users which have access to report
    $users = Invoke-RestMethod -Uri "https://api.powerbi.com/v1.0/myorg/admin/reports/$reportId/users" -Method Get -Headers @{
    Authorization = "Bearer $accessToken"
    }
    
    # for each report get all users who have access to it
    $users.value | ForEach-Object {
        $user = $_.identifier
        $userRole = $_.reportUserAccessRight
        $userName = $_.displayName
        $principalType = $_.principalType
        $mail = $_.identifier

        $rowData = [PSCustomObject]@{
                "ReportName" = $nameMap["$reportId"]
                "ReportID" = $reportId
                "UserName" = $userName
                "UserEmail" = $user
                "UserRole" = $userRole
                }
        $csvData += $rowData

        # if the user is a group, write all the group members and assign them to the report
        if ($principalType -eq "Group") {

            $members = Invoke-RestMethod -Uri "https://graph.microsoft.com/v1.0/groups/$user/members" -Method Get -Headers @{
            Authorization = "Bearer $accessToken"
                }

            $members.value | ForEach-Object {
                $groupUserName = $_.displayName
                $groupUserMail = $_.userPrincipalName

                $rowData = @{
                "ReportName" = $nameMap["$reportId"]
                "ReportID" = $reportId
                "UserName" = $groupUserName
                "UserEmail" = $groupUserMail
                "UserRole" = $userRole
                }

                $csvData += New-Object PSObject -Property $rowData

                }
            }
        }
    

    # check wheter the name of the report already exists
    $reportPath = Join-Path -Path $baseReportPath -ChildPath "$name$extension"
    $index = 0
    
    while (Test-Path $reportPath) {
        $index++
        $newName = "${name}_$index"
        $reportPath = Join-Path -Path $baseReportPath -ChildPath "$newName$extension"
    }
    
    # download the report
    Export-PowerBIReport -Id $reportId -OutFile $reportPath

    # add number to the name if report exists 
    if ($index -eq 0) {
        $newName = $name
    } 
    else {
        $newName = "${name}_$index"
    }

    # upload report to the blob storage
    Set-AzStorageBlobContent -Context $storageContext -Container $storageContainerName -Blob $blobPath/$newName -File $reportPath -Force -StandardBlobTier Cool
    

    # remove dataset and report
    Invoke-RestMethod -Uri "https://api.powerbi.com/v1.0/myorg/groups/$workspaceId/datasets/$datasetId" -Method Delete -Headers @{
    Authorization = "Bearer $accessTokenDelete"
    }
}

# save the user list
$csvData | Export-Csv -Path $csvPath -NoTypeInformation -Encoding UTF8

Summary

We’ve successfully navigated through all the steps. The process is quite intricate, so even with the above assistance, implementing the solution may take some time. However, this marks the initial stride towards a well-organized Power BI environment and lays the foundation for further efforts in enhancing Data Governance within your organization. Best of luck! 

0 Comments