The Data Deduplication feature comes with built-in jobs that will automatically launch and optimize the specified volume(s) on a regular basis. Optimization jobs deduplicate data and compress file chunks on a volume per the policy settings. After the initial optimization is complete, optimization jobs run on the files that are included in the policies, according to the job schedules that you have configured or the default job schedules that ship with the product.
You can trigger an optimization job on demand in Windows PowerShell by using the Start-DedupJob cmdlet. For example:
PS C:\> Start-DedupJob -Volume E: -Type Optimization
This command returns immediately, and the job runs asynchronously. If you want the cmdlet to wait until the job completes before returning, add the -Wait parameter, like this:
PS C:\> Start-DedupJob -Volume E: -Type Optimization -Wait
You can query the progress of a job on the volume by using the Get-DedupJob cmdlet:
PS C:\> Get-DedupJob
The Get-DedupJob command shows current jobs that are running or are queued to run.
You can query key status statistics, including the savings achieved on the volume, by using the Get-DedupStatus cmdlet:
PS C:\> Get-DedupStatus | fl
The Get-DedupStatus command shows the free space, space saved, optimized files, InPolicyFiles (the number of files that fall within the volume deduplication policy, based on the defined file age, size, type, and location criteria), and the associated drive identifier.
Note: You can also view the deduplication savings in Server Manager on the Volumes page. From Server Manager, click File and Storage Services, and then click Volumes. Right-click the column heading to add Deduplication Savings.
Optimization job queuing
Optimization jobs are started in the following order:
- Preemptive (manually run jobs that are not scheduled). Any manual jobs that include the -Preempt option terminate any jobs that are currently running and start immediately. (Note that the -Preempt option is ignored in scheduled jobs.)
- StopWhenSystemBusy parameter. Jobs that contain this parameter stop if resources are not available to run the job without interfering with the server’s workload.
- Priority. Among jobs that have the same StopWhenSystemBusy setting, high priority jobs are queued first, normal priority jobs are queued second, and low priority jobs are queued last.
- Manual or scheduled. Manual jobs are queued before scheduled jobs.
Memory settings are not considered as part of the optimization job queue algorithm.
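The ordering rules above amount to a multi-level sort key. The following sketch illustrates that ordering; it is not product code, the job fields (preempt, stop_when_busy, priority, scheduled) are assumed names that mirror the cmdlet parameters, and the relative order of the two StopWhenSystemBusy groups is an assumption, since the rules above only say jobs are grouped by that setting.

```python
from dataclasses import dataclass

PRIORITY_RANK = {"High": 0, "Normal": 1, "Low": 2}

@dataclass
class DedupJob:
    name: str
    preempt: bool = False         # -Preempt (honored for manual jobs only)
    stop_when_busy: bool = False  # -StopWhenSystemBusy
    priority: str = "Normal"      # -Priority: High, Normal, or Low
    scheduled: bool = False       # scheduled job vs. manually started

def queue_order(jobs):
    """Return jobs in the order the documented queue rules would place them."""
    return sorted(jobs, key=lambda j: (
        not (j.preempt and not j.scheduled),  # preemptive manual jobs first
        j.stop_when_busy,                     # group by StopWhenSystemBusy (assumed order)
        PRIORITY_RANK[j.priority],            # High, then Normal, then Low
        j.scheduled,                          # manual before scheduled
    ))
```

For example, given a preemptive manual job, a high priority manual job, a normal priority manual job, and a low priority scheduled job, queue_order returns them in exactly that order.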
Optimization metadata
Optimization metadata gives you evidence of the savings gained from optimization. Three cmdlets output this metadata: Update-DedupStatus, Get-DedupMetadata, and Measure-DedupFileMetadata. This metadata can help you assess the impact of some optimization configuration options.
Update-DedupStatus returns the following metadata:
| Metadata | What it indicates |
| --- | --- |
| DedupSavedSpace | Difference between the logical size of the optimized files and the logical size of the store (the deduplicated user data plus deduplication metadata). This number changes continually. |
| DedupRate | Ratio of DedupSavedSpace to the logical size of all of the files on the volume, expressed as a percentage. This number changes continually. |
| OptimizedFilesCount | Number of optimized files on the specified volume. Note that this number remains steady (instead of decreasing) as users delete files from or add files to the volume, until you run a garbage collection job. This count is most accurate after a full garbage collection job runs. |
| OptimizedFilesSize | Aggregate size of all optimized files on the specified volume. Note that this number remains steady (instead of decreasing) as users delete files from or add new files to the volume, until you run a garbage collection job. This number is most accurate after a full garbage collection job runs. |
| InPolicyFilesCount | Number of files that currently qualify for optimization. This number stays relatively constant between optimization jobs. |
| InPolicyFilesSize | Aggregate size of all files that currently qualify for optimization. This number stays relatively constant between optimization jobs. |
| LastOptimizationTime | Date and time when an optimization job was last run on the specified volume. This value stays constant between optimization jobs. |
| LastGarbageCollectionTime | Date and time when a garbage collection job was last run on the specified volume. This value stays constant between optimization jobs. |
| LastScrubbingTime | Date and time when a scrubbing job was last run on the specified volume. This value stays constant between optimization jobs. |
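As a worked illustration of how DedupSavedSpace and DedupRate relate, the following restates the formulas from the table above in code. The function names are invented for this sketch; this is not the product's implementation.

```python
def dedup_saved_space(optimized_files_logical_size, store_size):
    # DedupSavedSpace: logical size of the optimized files minus the size
    # of the store (deduplicated user data plus deduplication metadata).
    return optimized_files_logical_size - store_size

def dedup_rate(saved_space, all_files_logical_size):
    # DedupRate: DedupSavedSpace as a percentage of the logical size of
    # all files on the volume.
    return 100.0 * saved_space / all_files_logical_size

# For example, 100 GB of optimized files held in a 40 GB store, on a
# volume whose files total 120 GB logically:
saved = dedup_saved_space(100, 40)  # 60 (GB saved)
rate = dedup_rate(saved, 120)       # 50.0 (percent)
```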
Get-DedupMetadata returns the following metadata:
| Metadata | What it indicates |
| --- | --- |
| DataChunkCount | Number of data chunks on the volume. |
| DataContainerCount | Number of containers in the data store. |
| DataChunkAverageSize | Data store size (not including chunk metadata) divided by the total number of data chunks in the data store. |
| StreamMapCount | Number of data streams on the volume. |
| StreamMapContainerCount | Number of containers in the stream map store. |
| StreamMapAverageChunkCount | Stream map store size divided by the total number of streams in the store. |
| HotspotCount | Number of “hotspot” chunks on the volume. A hotspot is a chunk that is referenced more than 100 times. All hotspot chunks are duplicated on the volume to provide automatic data corruption recovery in the event that corruption occurs on the disk and impacts one of these popular chunks. |
| HotspotContainerCount | Number of hotspot containers. |
| CorruptionLogEntryCount | Number of corrupted items on the volume. |
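The hotspot rule in the table can be sketched as a simple threshold check. This is illustrative only; the function and constant names are assumptions, with the over-100-references criterion taken from the table above.

```python
# A chunk referenced more than 100 times counts as a "hotspot"; the dedup
# store keeps a duplicate copy of such chunks so a single on-disk
# corruption cannot affect every file that shares a popular chunk.
HOTSPOT_THRESHOLD = 100  # from the table: referenced over 100 times

def find_hotspots(reference_counts):
    """reference_counts maps chunk id -> number of streams referencing it."""
    return {chunk for chunk, refs in reference_counts.items()
            if refs > HOTSPOT_THRESHOLD}
```

For example, `find_hotspots({"a": 3, "b": 250, "c": 100})` returns `{"b"}`: chunk "c" has exactly 100 references, which is not over the threshold.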