De-duplication is a feature of many file systems now, including Storage Area Networks. For larger networks, the reliance on storage within the SAN may make server-based de-duplication less attractive, but for smaller networks it is a great feature. In this tutorial we will investigate the data de-duplication feature of the ZFS file system.
De-duplication can be set at the pool or dataset level and optimizes storage space by checking whether an identical block already exists within the pool before new data blocks are stored. The check is made per block, not per complete file, so files that are similar but not identical can still benefit from de-duplication. If you store ISO files for many different operating systems, many of the blocks may be duplicated even though the complete ISOs are not the same. Likewise, if you store virtual machine images, good space savings may be possible with de-duplication.
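To make the block-level idea concrete, the following sketch approximates it with ordinary shell tools: two files that share some content are cut into fixed-size chunks and the chunk hashes are compared, so shared chunks collapse to one unique hash. The 128 KiB chunk size and the file names are illustrative assumptions, not anything ZFS-specific.

```shell
# Illustration only: approximate block-level de-duplication with 128 KiB chunks.
# a.img is 256 KiB of zeros; b.img shares its first 128 KiB with a.img.
head -c 262144 /dev/zero > a.img
{ head -c 131072 /dev/zero; head -c 131072 /dev/urandom; } > b.img

# Split both files into 128 KiB chunks and hash each chunk.
for f in a.img b.img; do
  split -b 131072 "$f" "$f.chunk."
done
total=$(ls a.img.chunk.* b.img.chunk.* | wc -l)
unique=$(sha256sum a.img.chunk.* b.img.chunk.* | awk '{print $1}' | sort -u | wc -l)
echo "blocks: $total, unique: $unique"   # the zero chunk is stored once, not three times

rm -f a.img b.img a.img.chunk.* b.img.chunk.*
```

Only the unique chunks would need to be stored, which is exactly why similar-but-different files (ISOs, VM images) still de-duplicate well block by block.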
Checking to see if a pre-existing pool will benefit
If a pool already exists, you can run a read-only pool diagnostic to see the estimated de-duplication ratio:
zdb -S <poolname>
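For example, against a hypothetical pool named tank (substitute your own pool name), the simulation might be run like this. The command reads the pool without changing anything:

```shell
# Simulate de-duplication on an existing pool; nothing is modified.
# 'tank' is a placeholder pool name.
zdb -S tank
# zdb walks the pool and prints a simulated dedup table (DDT) histogram,
# ending with a summary that includes an estimated "dedup = N.NN" ratio.
# A ratio well above 1.00 suggests enabling dedup would save real space.
```

Because building the simulated table requires scanning the whole pool, this can take a while on large pools.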
Enabling deduplication on a data set
zfs set dedup=on rpool/data
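Once the property is set, you can confirm it on the dataset and then watch the pool-wide ratio as new data is written. The dataset name rpool/data is carried over from the example above:

```shell
# Confirm the property took effect on the dataset.
zfs get dedup rpool/data
# Watch the running de-duplication ratio in the DEDUP column.
zpool list rpool
# Note: dedup applies only to data written after the property is set;
# existing blocks are not rewritten or de-duplicated retroactively.
```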