Clean up your Team Project Collection prior to migrating to VSTS

Clean up your Team Project Collection prior to migrating to VSTS

To prepare your TFS Project Collection for migration, you may want to remove (stale) old data to reduce the database size first.

Most actions are already documented here. Queries that can aid in detecting where your space is allocated are also found in this recent support ticket.

Delete old workspaces

Deleting workspaces and shelvesets can reduce your migration and upgrade times considerably. either use the tf commandline or leverage a tool like the TFS SideKicks to identify and delete these.

You can list & clean up workspaces using the following commands:

tf vc workspaces /owner:* /computer:* /format:detailed

Then delete the workspaces from your server with:

tf vc workspace /delete 

The same applies to shelvesets:

List all shelvesets:

tf vc shelvesets /format:detailed

Then delete a shelveset:

tf vc shelve /delete

If you want to script these commands using PowerShell, you can also use /format:xml and parse the results.

Build results

Not just build results, but often overlooked the actual build records can take up a considerable amount of data. Use tfsbuild destroy (XAML) to permanently delete the build records. In the past, I've encountered clients who had 1.8 million "hidden" builds in their database and removing them shaved off quite a considerable amount of data. These records were kept around for the warehouse.

In case you run into very slow execution of tfsbuild, you may need this patched version it has a few fixes that prevent it rom trying to download all the build logs for every build in your system just to get to the build ids.

Old team projects

Of course, destroying old team projects can give back a lot of data. Anything you don't need to send to azure helps. You could also consider splitting the collection and to leave behind the old projects. That will give you the option to detach that collection and store it somewhere, should you ever need that data again.

Redundant files

Deleted branches are a very common hidden size hog. When deleting things in TFVC, they are not actually deleted, they're just hidden. Finding deleted files and especially old development or feature branches can give you back a lot of data. Use tf destroy to get rid of them.

You may also want to look for checked in nuget package folders, those can quickly rack up a lot of space as well.

Code Lens index

Team Foundation Server 2013 introduced server side indexing of TFVC controlled files to allow Visual Studio access to data on who changed which files when directly in the UI. This server-side index can grow quite quickly depending on the size of your code base and churn.

You can control the index through the tfsconfig codeindex command. You can specify up to how long ago you want to index /indexHistoryPeriod:#months, delete the index altogether /destroyCodeIndex or exclude specific problematic files /ignoreList:add $/path.

Code Lens is also referenced to as Code Sense and Code Index internally in the product.

More can be found here.

In case deleting the index times out, you read this post on StackOverflow with additional guidance. Be careful, manually running SQL on the TFS collection DB's isn't supported.

Test Attachments

Ohh yes, especially when you use test attachments, these can grow like crazy, depending on your TFS version either use the built-in test attachment cleanup features or use the Test Attachment Cleaner from the TFS power tools.

XAML Builds

The build definitions themselves won't take a lot of database space, but the build results may. But those have been covered in a previous section.

In the past, I've had to patch tfbuid.exe to handle (very) large amounts of build records, as it tends to try and fetch all build data locally before proceeding with the delete action. You may need to rely on the TFS Client Object Model to achieve a similar result.

Git Repositories

You may have data in your git repositories that are no longer accessible due to force pushes or deleted branches. It's also possible that certain data in Git could be packed more efficiently. To clean your repositories you have to clone them locally, clean them up, delete the remote repo from TFS and push the cleaned copy to a new repository (you can use the same name as the old one). Doing this will break references with existing build definitions and you will have to fix these up. While you're at it, you could also run the BFG repo Cleaner and convert the repositories to enable Git-LFS support to handle large binary files in your repositories more elegantly.

git clone --mirror https://tfs/project/repo
# optionally run BFG repo cleaner at this point
git reflog expire --expire=now --all 
git gc --prune=now --aggressive
git repack -adf
# Delete and recreate the remote repository with the same name
git push origin --all
git push origin --tags

Work item (attachments)

Work items can gather up a considerable amount of data, especially when people start attaching large attachments to them. You can use witadmin destroywi to delete work items with unreasonably large attachments. To retain the work item, but delete its attachments you can delete the attachments from the current work item and then clone it. After cloning, destroy the old work item to allow the attachments to be cleaned up.

Old work items that you no longer need (say the sprint items from 6 years ago) can also be deleted. My colleague René has a nice tool that allows you to bulk-destroy by first creating the appropriate work item query.

Package feeds

The Nuget, npm and especially Universal Packages can take up quite a bit of space. In this seperate post I've shared a SQL stateent to figure out which feeds and packages take up the most space.

Be sure to run the cleanup jobs

TFS often doesn't immediately prune data from the database, in many cases, it just marks stuff as deleted for later processing. To force the cleanup to happen immediately, run the following stored procedures on your Project Collection database:

EXEC prc_CleanupDeletedFileContent 1
# You may have to run the following command multiple times, the last
# parameter is the batch size, if there are more items to prune than the 
# passed in number, you will have to run it multiple times
EXEC prc_DeleteUnusedFiles 1, 0, 100000

That 100000 at the end is the number of marked items to process. If you've deleted a lot of content you may need to run that last procedure a couple of times before everything will be deleted.

Other useful queries

To identify how much data is stored in each section, there are a few useful queries you can run. The actual query depends on your TFS version, but since you're preparing for migration I suspect you're on TFS 2017 or 2018 at the moment.

Find the largest tables:

SELECT TOP 10 
    o.name, 
    SUM(reserved_page_count) * 8.0 / 1024 SizeInMB,
    SUM(
        CASE 
              WHEN p.index_id <= 1 THEN p.row_count
              ELSE 0
        END) Row_Count
FROM sys.dm_db_partition_stats p
JOIN sys.objects o
    ON p.object_id = o.object_id
GROUP BY o.name
ORDER BY SUM(reserved_page_count) DESC

Find the largest content contributors:

SELECT Owner = 
    CASE
        WHEN OwnerId = 0 THEN 'Generic' 
        WHEN OwnerId = 1 THEN 'VersionControl'
        WHEN OwnerId = 2 THEN 'WorkItemTracking'
        WHEN OwnerId = 3 THEN 'TeamBuild'
        WHEN OwnerId = 4 THEN 'TeamTest'
        WHEN OwnerId = 5 THEN 'Servicing'
        WHEN OwnerId = 6 THEN 'UnitTest'
        WHEN OwnerId = 7 THEN 'WebAccess'
        WHEN OwnerId = 8 THEN 'ProcessTemplate'
        WHEN OwnerId = 9 THEN 'StrongBox'
        WHEN OwnerId = 10 THEN 'FileContainer'
        WHEN OwnerId = 11 THEN 'CodeSense'
        WHEN OwnerId = 12 THEN 'Profile'
        WHEN OwnerId = 13 THEN 'Aad'
        WHEN OwnerId = 14 THEN 'Gallery'
        WHEN OwnerId = 15 THEN 'BlobStore'
        WHEN OwnerId = 255 THEN 'PendingDeletion'
    END,
    SUM(CompressedLength) / 1024.0 / 1024.0 AS BlobSizeInMB
FROM tbl_FileReference AS r
JOIN tbl_FileMetadata AS m
    ON r.ResourceId = m.ResourceId
    AND r.PartitionId = m.PartitionId
WHERE r.PartitionId = 1
GROUP BY OwnerId
ORDER BY 2 DESC

If file containers are the issue:

DECLARE @partitionId INT = 1

SELECT  CASE
            WHEN ArtifactUri LIKE 'vstfs:///%'
            THEN SUBSTRING(ArtifactUri, 10, CHARINDEX('/', ArtifactUri, 10) - 10)
            ELSE ArtifactUri
        END,
        SUM(cast(ci.FileLength as decimal(38)))/1024.0/1024.0 AS SizeInMb,
        COUNT(*) AS Records,
        d.DataspaceIdentifier        
FROM    tbl_Container c
JOIN    (
            SELECT  ci.FileId,
                    ci.DataspaceId,
                    MAX(ci.FileLength) AS FileLength,
                    MAX(ci.ContainerId) AS ContainerId
            FROM    tbl_ContainerItem ci
            WHERE   ci.PartitionId = @partitionId
            GROUP BY ci.FileId, ci.DataspaceId
        ) AS ci
ON      ci.ContainerId = c.ContainerId
JOIN    tbl_Dataspace d
ON      d.DataspaceId = ci.DataspaceId
WHERE   c.PartitionId = @partitionId
        AND d.PartitionId = @partitionId
GROUP BY CASE
            WHEN ArtifactUri LIKE 'vstfs:///%'
            THEN SUBSTRING(ArtifactUri, 10, CHARINDEX('/', ArtifactUri, 10) - 10)
            ELSE ArtifactUri
        END,
        d.DataspaceIdentifier
ORDER BY SizeInMb DESC

Are there other ways to clean and prepare your projects prior to upgrade or migration that I may have missed, leave a comment below!

Original