16 Nov 2018

Data Lifecycle Management with Azure Storage

Introduction

Azure Storage provides the foundation for a lot of the services available on the Azure Cloud platform. In this post I’m going to walk you through the data lifecycle management features of Azure Storage Accounts.

Components of Data Lifecycle Management

In a nutshell, DLM refers to a policy-driven, automatable approach that takes data through its useful life. But what exactly is the life of data? Imagine data is captured and stored in an Azure storage account. The new data will either be accessed frequently for further processing, reporting, analytics, or some other use, or it will sit there for a long time and eventually become obsolete. Logic and validations may be applied to the data throughout either process, but at some point it will reach the end of its useful life and be archived, purged, or both. This is where automatic data lifecycle management in Azure helps customers optimize the size and cost of their storage accounts in the cloud. Azure Blob Storage lifecycle management (currently in public preview) offers a rich, rule-based policy which you can use on GPv2 and Blob storage accounts to transition your data to the appropriate access tiers or expire it at the end of its lifecycle. A lifecycle management policy helps you:

  • Transition blobs to a cooler storage tier (Hot to Cool, Hot to Archive, or Cool to Archive) to optimize for performance and cost
  • Delete blobs at the end of their lifecycles
  • Define rules to be executed once per day at the storage account level
  • Apply rules to containers or a subset of blobs (using prefixes as filters)

Azure Storage Tiers

But first, let’s have a word about the different tiers within an Azure storage account. To support tiering, you need a Blob Storage or General Purpose v2 (GPv2) storage account. However, if you have a GPv1 account, you can easily convert it to GPv2 using one of the following methods:

Azure CLI

 az storage account update -n miruaccfoeus2 -g rg-stafailovertests-us --set kind=StorageV2

PowerShell

 Set-AzureRmStorageAccount -ResourceGroupName <resource-group> -AccountName <storage-account> -UpgradeToStorageV2
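
To double-check that the conversion worked, you can read back the account’s Kind property; after the upgrade it should report StorageV2 (using the same placeholder names as above):

```powershell
# After the upgrade, Kind should read 'StorageV2'
(Get-AzureRmStorageAccount -ResourceGroupName <resource-group> -Name <storage-account>).Kind
```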

Here’s a short overview of the different tiers.

| Tier | Tier Level | Intended Use | Use Cases | Access Times |
|------|------------|--------------|-----------|--------------|
| Premium Storage (Preview) | Storage Account | Highly frequently accessed data requiring low access latencies | online transactions, video rendering, static web content | Micro- to milliseconds |
| Hot Storage | Storage Account, Blob | Frequently accessed data (default) | standard data processing | Milliseconds |
| Cool Storage | Storage Account, Blob | Infrequently accessed data, stored for at least 30 days | media archive, short-term backups, DR, raw telemetry data | Milliseconds |
| Archive Storage | Blob | Rarely or never accessed data, stored for at least 180 days | archival sets, long-term backup, compliance archive sets | < 15 hours |

Manual Tiering (changing access tier)

Changing the access tier can be performed at the account or blob level, depending on the tier. To change the access tier of a storage account from Hot to Cool:

Azure CLI

az storage account update --access-tier Cool -n miruaccfoeus2 -g rg-stafailovertests-us

PowerShell

Set-AzureRmStorageAccount -Name miruaccfoeus2 -ResourceGroupName rg-stafailovertests-us -AccessTier Cool -Force
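
The tier of an individual blob can be changed as well. As a sketch (the blob name report.pdf is just a placeholder for this example), the Azure CLI provides az storage blob set-tier:

```shell
# Move a single blob to the Archive tier (blob name is a placeholder)
az storage blob set-tier \
    --account-name miruaccfoeus2 \
    --container-name demofiles \
    --name report.pdf \
    --tier Archive
```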

 

To change the access tier of blobs older than 7 days (example):

PowerShell

#Set some constants
$stAccountName = "miruaccfoeus2"
$stContainerName = "demofiles"
$dtDaysRetention = 7
$stDesiredAccessTier = "Cool"

#Get the storage account, one of its keys, and a storage context
$stAccountObj = Get-AzureRmStorageAccount | Where-Object {($_.StorageAccountName -eq $stAccountName) -and ($_.Kind -eq 'StorageV2' -or $_.Kind -eq 'BlobStorage')}
$stAccountKey = (Get-AzureRmStorageAccountKey -ResourceGroupName $stAccountObj.ResourceGroupName -Name $stAccountObj.StorageAccountName).Value[0]
$stAccountContext = New-AzureStorageContext -StorageAccountName $stAccountObj.StorageAccountName -StorageAccountKey $stAccountKey

# Get all blobs of the specific container
$arrBlobs = Get-AzureStorageBlob -Context $stAccountContext -Container $stContainerName

# Date logic: the cutoff is now (UTC) minus the retention period
$dtUTC = (Get-Date).ToUniversalTime()
$dtCutoff = $dtUTC.AddDays(-$dtDaysRetention)

# Get blobs last modified before the cutoff
$arrBlobsMatch = $arrBlobs | Where-Object {$_.LastModified.UtcDateTime -le $dtCutoff}

# Change the tier on the matching blobs
foreach ($iBlob in $arrBlobsMatch)
{
    $iBlob.ICloudBlob.SetStandardBlobTier($stDesiredAccessTier)
}
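
To verify the result, you can list the blobs together with their current access tier. A quick sketch reusing the variables from the script above (and assuming a storage module version that exposes the StandardBlobTier property):

```powershell
# Show each blob's name and its current access tier
Get-AzureStorageBlob -Context $stAccountContext -Container $stContainerName |
    Select-Object Name, @{Name = 'AccessTier'; Expression = { $_.ICloudBlob.Properties.StandardBlobTier }}
```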

Automatic, Policy-based Tiering

Now that we have seen how to change the access tier manually, wouldn’t it be nice to tier data automatically, without a scheduled script? This is where the new feature “Azure Blob Storage Lifecycle Management” comes in (at the time of writing, the feature is in public preview, so do not use it in production).

To try it out during the preview phase, you first have to register the corresponding resource provider feature.

Azure CLI

az feature register --namespace Microsoft.Storage --name DLM

PowerShell

Register-AzureRmProviderFeature -FeatureName DLM -ProviderNamespace Microsoft.Storage

As this will take a little while, you can check the status of the process as well.

Azure CLI

az feature show --namespace Microsoft.Storage --name DLM

PowerShell

Get-AzureRmProviderFeature -FeatureName DLM -ProviderNamespace Microsoft.Storage

Once the feature has been enabled, we can create DLM policies.

But first, we have to install a preview version of the AzureRM.Storage PowerShell module, which makes the new management policy cmdlets available.

With that in place, let’s first check whether any storage management policies are already assigned to the storage account.

Get-AzureRmStorageAccountManagementPolicy -ResourceGroupName rg-stafailovertests-us -StorageAccountName miruaccfoeus2

If this is not the case we get the following error (IMHO, a warning message would be sufficient :-))

Get-AzureRmStorageAccountManagementPolicy : No ManagementPolicy found for account miruaccfoeus2
At line:1 char:1
+ Get-AzureRmStorageAccountManagementPolicy -ResourceGroupName rg-stafa …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : CloseError: (:) [Get-AzureRmStor…anagementPolicy], CloudException
+ FullyQualifiedErrorId : Microsoft.Azure.Commands.Management.Storage.GetAzureStorageAccountManagementPolicyCommand
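
Because the cmdlet throws a terminating error rather than a warning, it can be handy in scripts to wrap the call and treat “no policy” as $null. A small sketch:

```powershell
# Treat a missing management policy as $null instead of a terminating error
$policy = $null
try {
    $policy = Get-AzureRmStorageAccountManagementPolicy -ResourceGroupName rg-stafailovertests-us -StorageAccountName miruaccfoeus2 -ErrorAction Stop
}
catch {
    Write-Warning "No management policy assigned to this account yet."
}
```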

Ok, let’s create and assign a new policy.

#Define the Policy and the Rules
$rules = '{
  "version": "0.5",
  "rules": [ 
    {
      "name": "MIRUDemoDLM", 
      "type": "Lifecycle", 
      "definition": {
        "filters": {
          "blobTypes": [ "blockBlob" ],
          "prefixMatch": [ "demofiles/DLMFiles/docs" ]
        },
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": 14 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 120 },
            "delete": { "daysAfterModificationGreaterThan": 1825 }
          },
          "snapshot": {
            "delete": { "daysAfterCreationGreaterThan": 90 }
          }
        }
      }
    }
  ]
}'

Assign the Policy

Set-AzureRmStorageAccountManagementPolicy -ResourceGroupName rg-stafailovertests-us -StorageAccountName miruaccfoeus2 -Policy $rules


With this, the following rules are now assigned to the path demofiles/DLMFiles/docs:

  • Files modified more than 14 days ago are moved to the Cool tier
  • Files modified more than 120 days ago are moved to the Archive tier
  • Files not modified for more than 5 years (1825 days) are deleted
  • Snapshots older than 90 days are removed as well

For a list of other possible rules see: https://docs.microsoft.com/en-us/azure/storage/common/storage-lifecycle-managment-concepts

Note: Lifecycle policy is executed once a day by the platform. Once a new policy is set, it can take up to 24 hours for actions such as tiering or deletion to be initiated and executed.
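
Since the feature is still in preview, you may want to clean up after testing. The preview module also ships a corresponding Remove cmdlet, sketched here:

```powershell
# Remove the management policy from the storage account again
Remove-AzureRmStorageAccountManagementPolicy -ResourceGroupName rg-stafailovertests-us -StorageAccountName miruaccfoeus2
```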

 

That’s it so far. In a future post I’ll cover legal / WORM capabilities of Azure Storage, so stay tuned!

 

 
