I asked myself this question when I wanted to import cost analysis data from Azure. In Azure Cost Management there is an option to schedule a daily export with week-to-date or month-to-date cost analysis data. You can export the files to an Azure storage account.
When you do this, the export process will create a folder per year and month
(“[start date]-[end date]”). And because it’s a to-date export, every file will
contain the cost analysis data from all days in the month until the date of
export.
The problem here is that when you export all files, you will be importing
duplicate cost records for each day in the month. And you will also be importing
and transforming a lot of data that you don’t need, which can take up precious
time and system memory.
Files and folders
So the solution to this is to get the latest file from each folder, which
contains all the data for that month. I can you do this in Power BI?
Just to make it easy for this demo, I transferred all the files from the storage
account to a local folder on my computer.
Get data in Power BI
Let’s start with Power BI. Click Get Data, select Folder and browse to the root folder that contains all month folders. For me this is the folder C:\Demo. Click the OK button to continue.
(Note, if you want to do this using an Azure storage account, choose Azure Blob Storage, fill in the storage account name and select the correct container.)
You will see that the preview screen shows all the csv files that are in all subfolders in the root folder. Click the button Transform Data, because we need to add some steps to this transformation.
Latest file per folder
After Power BI finished loading the data into the Power Query editor, look at the data it has imported. If you look closely, you will see similar data that you will see in the File Explorer. File name, folder path, file extension and file dates. And also the Content column that contains the binary file content.
The first step is to transform this data to get the latest file date per folder. To do this, we are going to use grouping.
- In the Power BI menu ribbon click Transform > Group By
- Using Basic group by, select Folder Path as column
- For the new column name, type LastDateCreated
- Select the operation Max
- Select the column Date created
- Click the OK button
Now we have a table with the latest date created per folder. Rename this table to LastDatePerFolder.
Merge queries
Now repeat the step again the get all the files from the folders to create a new table. Rename this table to LastFilePerFolder.
- Select the LastFilePerFolder table
- In the menu ribbon, click the Merge Queries button
- From the current table, click the Date created column
- Select the table LastDatePerFolder and click the LastDateCreated column
- Select Right Outer as join kind, as we want only to get the files that match the latest date created
- Click the OK button to close the window and apply the merge
Now you see in the table preview we only have the files that we need, which are the latest files in the folder.
Transform files
Because we only need the content column, right click the Content
column and select Remove other columns.
Click the Combine Files button in the column header to combine
the content of all files. When all files have the same column format, this step
will be no problem and your queries pane will look something like this.