
Introduction to Hashing: A Powerful Tool to Detect Child Sex Abuse Imagery Online

April 12, 2016

Last month, Thorn Digital Defender Del Harvey wrote about Twitter’s use of PhotoDNA, a technology developed by Microsoft that computes hash values of child sexual abuse material (CSAM). The tool assigns each photo a unique fingerprint so that copies of known material can be detected online and then reported to law enforcement for investigation.

The basics of hashing technology

This month, we want to highlight the benefits of hashing technology for industry, law enforcement, nonprofits, and service providers as they work to detect and remove child sexual abuse material online. But let’s start with what hashing is and why it is a useful technology for Thorn and our partners.

A hash function is an algorithm that takes a large volume of data, such as a file, a group of files, or a portion of a file, and reduces it to a small, fixed-size numerical identifier. That identifier is called a hash value, hash code, hash sum, or simply a hash. The function computes the value from the contents of the data itself, so the same input always produces the same hash, and, for cryptographic hashes such as those described below, even a one-byte change to the input produces a completely different value.
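To make these properties concrete, here is a minimal sketch using Python’s standard-library `hashlib` module. It shows that inputs of very different sizes reduce to values of the same fixed length, that hashing is deterministic, and that changing a single character yields an unrelated value. SHA-1 is used only because it is one of the algorithms this article names; `hashlib.md5` behaves the same way but returns 32 hex digits. The sample inputs are made up for illustration.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 hash of `data` as a 40-character hexadecimal string."""
    return hashlib.sha1(data).hexdigest()

small = b"a"
large = b"x" * 1_000_000  # roughly 1 MB of data

# Inputs of very different sizes both reduce to a 160-bit (40 hex digit) value.
print(len(sha1_hex(small)), len(sha1_hex(large)))  # 40 40

# Hashing is deterministic: the same bytes always give the same value.
print(sha1_hex(b"hello") == sha1_hex(b"hello"))  # True

# Changing one character produces an unrelated value.
print(sha1_hex(b"hello"))
print(sha1_hex(b"hellp"))
```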

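Two of the most commonly used algorithms, MD5 and SHA-1, produce 128-bit and 160-bit values respectively. Their outputs are so distinct that the chance of two different files accidentally sharing a hash value is far smaller than one in one billion, although researchers have shown that collisions can be deliberately constructed for both algorithms. Hash functions are also the basis of the hash table, a data structure that software uses to look up data quickly: the hash of a key points directly to where a record is stored, so the program does not have to scan every entry. The same property makes it fast to find duplicate records in a large file or database, because comparing short hash values is much cheaper than comparing the full data.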
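The duplicate-finding idea can be sketched in a few lines of Python. In this minimal example, a `dict` (Python’s built-in hash table) maps each file’s SHA-1 digest to the names of files with that digest, so each file is hashed once and looked up in roughly constant time on average, instead of being compared byte by byte against every other file. The function name `find_duplicates` and the file names and contents are hypothetical stand-ins; a real system would read bytes from disk.

```python
import hashlib
from collections import defaultdict

def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents have identical SHA-1 hashes.

    `files` maps a (hypothetical) file name to its raw bytes.
    """
    by_hash: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        # The short digest serves as the hash-table key for the full contents.
        by_hash[hashlib.sha1(data).hexdigest()].append(name)
    # Any digest shared by two or more names marks a set of exact duplicates.
    return [sorted(names) for names in by_hash.values() if len(names) > 1]

files = {
    "report.txt": b"quarterly numbers",
    "copy_of_report.txt": b"quarterly numbers",
    "notes.txt": b"meeting notes",
}
print(find_duplicates(files))  # [['copy_of_report.txt', 'report.txt']]
```

Because matching hashes make identical content overwhelmingly likely rather than certain, a cautious implementation might confirm each group with a byte-for-byte comparison.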

An important tool to combat child sexual abuse material online

Hashing can be used to identify known illegal photos, such as child sexual abuse material, and in the case of PhotoDNA, even if they have been altered. Before hashing technology, law enforcement agencies and technology companies could not readily distinguish already-known images of child sexual abuse from brand-new ones, which slowed the process of detecting the illegal photos and identifying victims.
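PhotoDNA’s algorithm is proprietary and is not reproduced here. To illustrate why an altered copy can still match, the sketch below swaps in a much simpler, publicly known perceptual technique, an “average hash”: each pixel of a small grayscale image becomes one bit, set when that pixel is brighter than the image’s mean. The 4×4 pixel grids and the helper names are hypothetical stand-ins for real images, which would first be shrunk and converted to grayscale. Brightening every pixel changes the exact SHA-256 hash of the pixel data but leaves the average hash unchanged.

```python
import hashlib

Image = list[list[int]]  # rows of grayscale values, 0 (black) to 255 (white)

def average_hash(pixels: Image) -> int:
    """Simple perceptual hash: one bit per pixel, 1 if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")

def exact_hash(pixels: Image) -> str:
    """Cryptographic hash of the raw pixel bytes; any edit changes it."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()

original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 198, 220],
    [11, 30, 202, 212],
]
# An "altered" copy: every pixel brightened by 20 levels (capped at 255).
brightened = [[min(p + 20, 255) for p in row] for row in original]

print(exact_hash(original) == exact_hash(brightened))  # False
print(hamming(average_hash(original), average_hash(brightened)))  # 0
```

Production perceptual hashes aim to tolerate much larger changes, such as resizing or recompression, and matching typically accepts hashes within a small distance threshold rather than requiring exact equality.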

Creating unique identifying values, or “hashes,” for this imagery streamlines the process of detecting and removing it. Companies can now find and remove known content quickly, and detectives can focus on new images of children who have not yet been identified.
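The workflow just described, separating already-known images from new ones, amounts to a set-membership check. Below is a minimal sketch, assuming a hypothetical `known_hashes` set of hashes of previously identified material and using exact SHA-1 matching for simplicity; a system built on PhotoDNA would instead compare perceptual hashes within a tolerance. The function `triage` and the placeholder uploads are illustrative only.

```python
import hashlib

def triage(files: dict[str, bytes], known_hashes: set[str]) -> tuple[list[str], list[str]]:
    """Split files into (known, new) by looking up each file's SHA-1 in a set.

    `known_hashes` is a hypothetical collection of hashes of previously
    identified material. Set lookup is a hash-table operation, so each check
    takes roughly constant time on average, however large the set grows.
    """
    known, new = [], []
    for name, data in files.items():
        digest = hashlib.sha1(data).hexdigest()
        (known if digest in known_hashes else new).append(name)
    return known, new

# Harmless placeholder bytes stand in for real image files.
known_hashes = {hashlib.sha1(b"previously catalogued file").hexdigest()}
incoming = {
    "upload_1.jpg": b"previously catalogued file",
    "upload_2.jpg": b"never seen before",
}
known, new = triage(incoming, known_hashes)
print(known)  # ['upload_1.jpg'] -> can be removed and reported quickly
print(new)    # ['upload_2.jpg'] -> left for investigators to examine
```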

This is what Microsoft’s PhotoDNA tool is all about. As we detailed in a previous article, PhotoDNA helps agencies collaborate with one another to catch the bad guys and identify victims using hashing technology.

This month, we’ll discuss how a range of organizations use hashing to fight child sexual exploitation. We’ll hear from John Shehan, Vice President of the Exploited Child Division at the National Center for Missing and Exploited Children (NCMEC), who will explain how hashing has become an essential tool for his organization. We’ll get up close and personal with the founders of Project VIC, a tool that uses hashing technology to identify victims of CSAM. Finally, Microsoft Senior Attorney Courtney Gregoire will share how PhotoDNA has helped the tech industry disrupt the distribution of over four million images of child abuse. Their stories demonstrate the different ways that innovative technology like hashing elevates our ability to protect children online.