Deduplication is arguably the biggest advancement in backup technology in the last two decades. It is single-handedly responsible for enabling the shift from tape to disk for the bulk of backup data, and its popularity only increases with each passing day. Understanding the different kinds of deduplication, also known as dedupe, is important for anyone looking at backup technology.

What is data deduplication?

Dedupe is the identification and elimination of duplicate blocks within a dataset. It is similar to compression, which only identifies redundant blocks in a single file. Deduplication can find redundant blocks of data between files from different directories, different data types, even different servers in different locations.

For example, a dedupe system might be able to identify the unique blocks in a spreadsheet and back them up. If you update it and back it up again, it should be able to identify the segments that have changed and only back them up. Then if you email it to a colleague, it should be able to identify the same blocks in your Sent Mail folder, their Inbox and even on their laptop's hard drive if they save it locally. It will not need to back up these additional copies of the same segments; it will only identify their location.

The usual way that dedupe works is that data to be deduped is chopped up into what most call chunks. A chunk is one or more contiguous blocks of data. Where and how the chunks are divided is the subject of many patents, but suffice it to say that each product creates a series of chunks that will then be compared against all previous chunks seen by a given dedupe system.

The way the comparison works is that each chunk is run through a deterministic cryptographic hashing algorithm, such as SHA-1 or SHA-256 (a member of the SHA-2 family), which creates what is called a hash. For example, if one enters "The quick brown fox jumps over the lazy dog" into a SHA-1 hash calculator, you get the following hash value: 2FD4E1C67A2D28FCED849EE1BB76E7391B93EB12. If the hashes of two chunks match, they are considered identical, because even the smallest change causes the hash of a chunk to change. If you create a 160-bit hash for an 8 MB chunk, you save almost 8 MB every time you back up that same chunk. This is why dedupe is such a space saver.

Target dedupe is the most common type of dedupe sold on the market today. The idea is that you buy a target dedupe disk appliance and send your backups to its network share or to virtual tape drives if the product is a virtual tape library (VTL). The chunking and comparison steps are all done on the target; none of it is done on the source. This allows you to get the benefits of dedupe without changing your backup software.

This incremental approach allowed many companies to switch from tape to disk as their primary backup target. Most customers copied the backups to tape for offsite purposes. Some advanced customers with larger budgets used the replication abilities of these target dedupe appliances to replicate their backups offsite. A good dedupe system would reduce the size of a typical file by 99%, and the size of an incremental backup by 90%, making replication of all backups possible. (Not everyone has enough bandwidth to handle this level of replication.)

Source deduplication

Source dedupe happens on the backup client, at the source, hence the name source, or client-side, dedupe. The chunking process happens on the client, which then passes the hash value to the backup server for the lookup process.
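The chunk, hash, and lookup steps described in this article can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: the fixed 8-byte chunk size, the `chunk` helper, and the `DedupeStore` class are all hypothetical, and an in-memory dictionary stands in for the index of previously seen chunks. Real products divide chunks in their own (often patented) ways and use far larger chunks.

```python
from __future__ import annotations

import hashlib

# Assumption: tiny fixed-size chunks, chosen only to keep the example readable.
CHUNK_SIZE = 8  # bytes


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (the last chunk may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupeStore:
    """Hypothetical in-memory chunk store keyed by SHA-1 digest."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def backup(self, data: bytes) -> list[str]:
        """Store only chunks not seen before; return the ordered list of hashes."""
        recipe = []
        for piece in chunk(data):
            digest = hashlib.sha1(piece).hexdigest()
            if digest not in self.chunks:  # the lookup step
                self.chunks[digest] = piece
            recipe.append(digest)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Rebuild the original data from its list of chunk hashes."""
        return b"".join(self.chunks[d] for d in recipe)


store = DedupeStore()
first = store.backup(b"AAAAAAAABBBBBBBBCCCCCCCC")   # three new chunks
second = store.backup(b"AAAAAAAABBBBBBBBDDDDDDDD")  # only the last chunk is new
```

After both backups the store holds four unique chunks rather than six, and either backup can be rebuilt from its list of hashes. In a source dedupe arrangement, the chunking and hashing would run on the client and only the hash would be sent to the server for the lookup; in a target arrangement, all of it would run on the appliance.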