Safeguarding the Digital Realm: Encryption, Encoding, Hashing

Article contributed by: Jared Justin & Angela Thai Yen Wen

An overview of encryption vs encoding vs hashing

	Encryption	Encoding	Hashing
Definition	Process of transforming data so that only an authorised user holding the right key can retrieve the original data. Anyone else sees only unintelligible ciphertext.	Process of transforming data into a different format using a publicly known scheme, so that another system can use it.	Process that takes data of any size and applies a mathematical function to it, producing a fixed-length string of characters and numbers (the hash) that is, in practice, unique to that input.
Purpose	Keeping data secret from unauthorised people	Keeping data usable across different systems	Verifying the integrity of data; used in digital signatures and password storage
Usage	Used in scenarios such as secure communication channels and protecting sensitive information during data transmission and storage, to maintain data confidentiality	Used in data transmission and data storage to maintain data usability	Used in database indexing, password storage, data compression, search algorithms, cryptography, load balancing, blockchain, image processing, file comparison and fraud detection
Reverse process	Can be reversed using decryption	Can be reversed using decoding	Irreversible by design
Requirement	An encryption key is required	No key is required	No key is used; the original input can only be guessed, e.g. by looking it up in a dictionary of precomputed hash values
Complexity	Involves complex algorithms designed to withstand cryptographic attacks	Typically involves simple algorithms for transforming data from one format to another	The algorithm is fast to compute, but its output is designed so that different inputs are very unlikely to produce the same hash (collision resistance)
Security	Secure, provided the key stays secret	Not secure; anyone can decode it	Most secure in the sense that it cannot be reversed, although guessable inputs such as weak passwords can still be found by brute force
Example	AES, RSA, Blowfish	ASCII, Unicode, URL encoding, Base64	MD5, SHA-1, bcrypt, Whirlpool
Real-life example	Securely sending a password over the internet	Displaying special characters on a web page	Storing passwords as hash values in a database

Table 1

Hashing

Figure 1.1

Hashing passes input data through a hashing algorithm to create an output of a fixed length. The output, called the “hash value” or “digest”, represents the original data without revealing it. Hashing is designed to be practically irreversible, which is why a cryptographic hash function is called a one-way function.
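To make this concrete, here is a minimal sketch using Python's standard `hashlib` module, with SHA-256 chosen as the example algorithm (the helper name `sha256_hex` is our own). Inputs of very different lengths all produce a 64-character hexadecimal digest, the same input always gives the same digest, and there is no function that turns a digest back into its input.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

inputs = ["a", "Chicago", "A much longer input sentence " * 100]
for text in inputs:
    digest = sha256_hex(text)
    # Every digest is 64 hex characters (256 bits), whatever the input length.
    print(len(text), len(digest), digest[:16] + "...")

# Hashing is deterministic: the same input always gives the same digest.
print(sha256_hex("Chicago") == sha256_hex("Chicago"))  # True
```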

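The advantage of hashing is that, because it is a one-way function, no one can compute the original input from the hash value. This is why passwords are stored as hashes: even if a database of hashes leaks, the passwords themselves are not directly exposed. A hash cannot be reverse engineered, but weak passwords can still be cracked by guessing: an attacker hashes likely candidates and compares the results with the stored hash.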
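The guessing attack mentioned above can be sketched in a few lines. This is an illustration under stated assumptions: `dictionary_lookup` is a hypothetical helper, the hashes are unsalted SHA-256, and the four-word list is made up. Note that nothing is "decrypted"; the attacker only re-hashes guesses and compares them, which is why salted, deliberately slow functions such as bcrypt are preferred for passwords.

```python
import hashlib
from typing import Iterable, Optional

def sha256_hex(text: str) -> str:
    """Unsalted SHA-256 of a UTF-8 string (assumed storage scheme for this demo)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def dictionary_lookup(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the one whose digest matches, if any.

    This does not reverse the hash: it only hashes guesses and compares them."""
    for word in candidates:
        if sha256_hex(word) == target_hash:
            return word
    return None

# Hypothetical leaked hash of a weak password and a tiny hypothetical wordlist.
leaked = sha256_hex("password123")
wordlist = ["letmein", "qwerty", "password123", "dragon"]

print(dictionary_lookup(leaked, wordlist))                    # password123
print(dictionary_lookup(sha256_hex("x7#Lq!9vTz"), wordlist))  # None
```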

Because a given hashing algorithm always produces output of the same length, regardless of the size of the input, it is useful for allocating space for the digest in data structures, file formats, and network protocol fields where the required length must be known in advance. It also keeps attackers from learning the size of the original input, since every output stays the same length.

Even the smallest change in the input results in a big change in the hash value, so similar inputs cannot be recognised from their hashes. This is known as the “avalanche effect”, after the way the tiniest shift in the snow built up on a mountainside can trigger an avalanche.

Hash functions are also fast to compute. When users enter the password for their account, they expect to log in without a noticeable delay, which is only possible if creating the hash takes very little time.
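The avalanche effect is easy to observe. In this sketch (SHA-256 again, and `bit_difference` is a helper we define for the demo), “Chicago” and “chicago” differ in a single input bit, because `C` and `c` differ only in bit 5 of their ASCII codes, yet the two digests differ in a large share of their 256 bits, typically around half.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many bits differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

h1 = hashlib.sha256(b"Chicago").digest()
h2 = hashlib.sha256(b"chicago").digest()  # only the case of the first letter changed

print(h1.hex())
print(h2.hex())
# A one-bit change in the input flips many of the 256 output bits.
print(bit_difference(h1, h2), "of", len(h1) * 8, "bits differ")
```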

Figure 1.2

Hashing is also used to ensure data integrity in email and messaging apps. The recipient computes the hash value of the received message and compares it with the hash value sent by the sender to confirm that nothing was modified in transit. In practice the hash is protected with a secret key or a digital signature, because an attacker who can alter the message could otherwise replace the hash as well.
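A minimal sketch of such an integrity check, assuming sender and recipient already share a secret key (the key and the `sign`/`verify` helper names are hypothetical). It uses HMAC-SHA256 from Python's standard `hmac` module, a keyed hash, rather than a bare hash, for the reason given above.

```python
import hashlib
import hmac

def sign(message: bytes, key: bytes) -> str:
    """Sender side: compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(message: bytes, tag: str, key: bytes) -> bool:
    """Recipient side: recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(message, key), tag)

shared_key = b"example-shared-key"   # hypothetical key agreed in advance
original = b"Meet at 10:00"
tag = sign(original, shared_key)     # sent along with the message

print(verify(original, tag, shared_key))          # True: unchanged in transit
print(verify(b"Meet at 11:00", tag, shared_key))  # False: message was altered
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many characters of a forged tag were correct.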

Code signing certificates apply the same idea to files. Before applets, macros, plug-ins, scripts and other executable files are published on the internet, a digital signature is created over the file's hash; this gives the file a verifiable publisher identity and lets users check that it has not been modified.

All hashing algorithms are:

Mathematical. Strict rules underlie the work an algorithm does and those rules can’t be broken or adjusted.
Uniform. Choose one type of hashing algorithm, and data of any character count put through it will emerge at a length predetermined by the algorithm.
Consistent. The same input always produces the same output, and the algorithm does just one thing (compress data into a fixed-length digest) and nothing else.
One way. Once transformed by the algorithm, it is nearly impossible to revert the data to its original state.

Common hashing algorithms include:

MD5. This was one of the first hashing algorithms to gain widespread acceptance. It was designed in 1991 and was thought to be extremely secure at the time. Since then, however, attackers have found ways to break it: collisions (two different inputs with the same hash) can now be generated in seconds. Because of this vulnerability, experts consider it unsafe for security purposes.
Example of hashing: Chicago → 9cfa1e69f507d007a516eb3e9f5074e2

RIPEMD-160. In the mid-1990s, researchers in Belgium developed the RACE Integrity Primitives Evaluation Message Digest (RIPEMD-160). It is still considered secure because no practical attack on the full algorithm has been published.

SHA. The Secure Hash Algorithm (SHA) family is considered more secure than MD5. The first versions were created by the US government, while SHA-3 came out of a public competition run by the US National Institute of Standards and Technology (NIST). The number after “SHA” identifies either the generation (SHA-1, SHA-2, SHA-3) or the output length in bits (SHA-256, SHA-512). SHA-1 is no longer considered secure, since practical collisions have been demonstrated. SHA-3 uses a different internal design (the sponge construction) from earlier versions and was standardised by NIST in 2015.
Example of hashing: Chicago → 0f5d983d203189bbffc5f686d01f6680bc6a83718a515fe42639347efc92478e (SHA-256)

Whirlpool. Designed in 2000, this algorithm is based on a modified version of the Advanced Encryption Standard and produces a 512-bit hash. It is also considered very secure.

Bcrypt. This password-hashing function is specifically designed to be slow and computationally intensive, with an adjustable cost factor, which makes brute-force attacks expensive. It also mixes a random salt into every hash, which makes it resistant to rainbow table attacks.
For example: helloworld → $2a$10$lDwqqLvGlrdBoGAegsRdxuuSEGAKxWu/Zn0tp5nw3vvFkUzXzjy3W
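Python's standard library does not include bcrypt (it is available as the third-party `bcrypt` package), so this sketch swaps in PBKDF2 via `hashlib.pbkdf2_hmac`, another deliberately slow, salted password-hashing function. It is not bcrypt itself; it only shows the same two ideas, a random per-password salt and a tunable work factor. The iteration count is an assumed value and the helper names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 600_000  # assumed work factor: more iterations = slower to hash and to brute-force

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = secrets.token_bytes(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def check_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, stored_key)

salt, stored = hash_password("helloworld")
print(check_password("helloworld", salt, stored))   # True
print(check_password("helloworld!", salt, stored))  # False

# The same password with a different salt gives a different stored value,
# so a precomputed (rainbow) table built for one salt is useless for another.
_, other = hash_password("helloworld")
print(other != stored)  # True
```

Both the salt and the derived key are stored; the salt is not secret, it only has to be unique.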

Encryption

Figure 1.3

Encryption is the process of converting data from a human-readable format (plaintext) into an unreadable format (ciphertext). The data can only be recovered with a “key”. So, what is the key? There are two major methods of encryption: symmetric encryption and asymmetric encryption.

Symmetric encryption uses the same secret key to both encrypt and decrypt the data. Because the sender and the receiver must share this key, it has to be exchanged securely and kept private: anyone who obtains it can decrypt the data. Symmetric encryption is fast, which makes it well suited to encrypting large amounts of data.

Asymmetric encryption uses two different but mathematically related keys: a “public key”, which can be shared with anyone and is used to encrypt data, and a “private key”, which is kept secret and is needed to decrypt it. Only those who hold the private key can access the encrypted data, so no secret has to be shared in advance. Asymmetric encryption is slower, so in practice the two methods are often combined: asymmetric encryption is used to exchange a symmetric key, which then encrypts the actual data. This combination is widely used, for example to protect data sent between a browser and a server.
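The defining property of symmetric encryption, one shared key for both directions, can be shown with a toy one-time pad. This is an illustration only, not what real systems use (they use ciphers such as AES): a one-time pad is secure only if the key is truly random, as long as the message, kept secret, and never reused. The message and helper name are hypothetical.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching byte of key."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must be as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"Transfer 500 to account 42"
key = secrets.token_bytes(len(message))  # the single shared secret key

ciphertext = xor_bytes(message, key)     # sender encrypts with the key
recovered = xor_bytes(ciphertext, key)   # receiver decrypts with the SAME key

print(ciphertext.hex())
print(recovered)  # b'Transfer 500 to account 42'
```

Because XOR undoes itself, the same function and the same key perform both encryption and decryption, which is exactly why the key must never leak.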

The main advantage of encryption is that it maintains data privacy. Encrypted data is protected from unauthorised access, so only authorised individuals can view or manipulate sensitive information. This in turn helps prevent data breaches and the significant financial and legal consequences that come with them.

Encryption also supports data integrity. Threats to data integrity may be less obvious than highly publicised data breaches, but they can still cause significant damage, especially in data-driven industries and organisations. AI/ML models, for example, are built on valuable datasets that are highly susceptible to tampering, manipulation and data poisoning. These datasets, which may cost upwards of a million dollars to create, must remain unaltered to be effective. By making it more difficult for cybercriminals to tamper with data, encryption tools help ensure that critical datasets remain accurate and unaltered.

Finally, strict data protection regulations are becoming more common across industries and geographic jurisdictions. As part of a robust data protection strategy, encryption can help an organisation comply with these regulations and avoid hefty fines for non-compliance.

Figure 1.4

Types of encryption:

Triple DES	Symmetric block cipher; an advanced version of DES that applies DES three times using a 168-bit key (three 56-bit keys). Encryption works in three phases: 1. encrypt, 2. decrypt, 3. encrypt again. Decryption reverses them: 1. decrypt, 2. encrypt, 3. decrypt again. Much slower than other algorithms, and its short 64-bit block length raises the risk of data theft.
AES	Advanced Encryption Standard (AES). Symmetric block cipher based on the Rijndael algorithm; encrypts one fixed-size 128-bit block at a time. Supports 128-, 192- and 256-bit key lengths, each with a different number of rounds: 1. 128-bit (10 rounds), 2. 192-bit (12 rounds), 3. 256-bit (14 rounds). Standardised by the US National Institute of Standards and Technology (NIST) and considered one of the best and most secure encryption algorithms.
RSA	Rivest-Shamir-Adleman (RSA). Asymmetric cipher that works with two keys: 1. public key (encryption), 2. private key (decryption). Commonly used with 2048-bit keys or larger, as 1024-bit keys are now considered too weak; larger keys make encryption slower. One of the strongest encryption types and considered a standard for data sharing; hard to break because of the length of the keys it works with.
Blowfish	Symmetric block cipher designed to replace DES. Supports variable key lengths from 32 to 448 bits and splits the message into fixed 64-bit blocks when encrypting and decrypting. Designed to be fast.
Twofish	Symmetric block cipher; the successor to Blowfish. Uses a 128-bit block size and keys of up to 256 bits, breaking data into fixed-length blocks and always running 16 rounds. Flexible (implementations can trade key-setup time against encryption speed), licence-free and fast.
Elliptic Curve Cryptography (ECC)	Asymmetric cryptography based on the mathematics of elliptic curves, used for example in Elliptic-curve Diffie-Hellman (ECDH) key agreement. The underlying computation is easy to perform but extremely difficult to reverse. ECC needs much smaller keys than RSA for the same strength: an ECC key of around 512 bits is comparable to a 15,360-bit RSA key.

Table 2
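To see how RSA's public and private keys relate, here is textbook RSA with toy numbers (primes 61 and 53, public exponent 17). This is strictly an illustration: with such tiny primes anyone can factor the modulus, and real RSA uses keys of 2048 bits or more plus a padding scheme, none of which is shown here. The function names are our own.

```python
import math

def make_keys(p: int, q: int, e: int):
    """Textbook RSA key generation from two primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if math.gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p-1)(q-1)")
    d = pow(e, -1, phi)      # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)    # (public key, private key)

def encrypt(m: int, public_key) -> int:
    e, n = public_key
    return pow(m, e, n)      # c = m^e mod n

def decrypt(c: int, private_key) -> int:
    d, n = private_key
    return pow(c, d, n)      # m = c^d mod n

public, private = make_keys(61, 53, 17)  # toy primes: hopelessly insecure
c = encrypt(65, public)

print(public, private)      # (17, 3233) (2753, 3233)
print(c)                    # 2790
print(decrypt(c, private))  # 65
```

Anyone can encrypt with `(17, 3233)`, but decrypting requires `d = 2753`, which can only be computed by someone who knows the factors of 3233.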

Encoding

Figure 1.5

Encoding is the process of converting data into a specialised format needed for a variety of information processing tasks, such as program compilation, data transmission and storage. As a simple example, reducing the size of an audio or video file, or converting it to another format, is a form of encoding. The main goal of encoding is to make data safely and effectively consumable by different users on various systems, so that it is readable and accessible to all potential end users. Unlike hashing and encryption, encoding does not ensure data security and is not considered a security measure in terms of cybersecurity and preventing attacks. It acts more like a communication medium between humans and machines.
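A quick way to see why encoding is not a security measure: Base64, using Python's standard `base64` module, turns text into an unreadable-looking string, but anyone can decode it because no key is involved. The “secret” text is a made-up example.

```python
import base64

secret = "my password: hunter2"  # hypothetical sensitive text
encoded = base64.b64encode(secret.encode("utf-8")).decode("ascii")
print(encoded)

# Anyone can reverse it: there is no key, so this provides no confidentiality.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded == secret)  # True
```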

Examining the key advantages of data encoding shows why it is a cornerstone of computer science. Data encoding goes beyond basic binary conversion to cover integral aspects such as data transfer, file storage and system-level communication. When data is encoded in a format that both the sender and the receiver understand, the possibility of misinterpretation is significantly reduced. Encoding or re-encoding data in a universally accepted format ensures uniformity, making data more portable and easily interpretable by different systems. Some encodings also compress data, reducing repetition and making data transfer more bandwidth- and time-efficient. Encoding also helps maintain data integrity by anticipating and handling transmission errors, adding redundancy together with error detection and correction mechanisms.

Beyond data communication, encoding plays an equally crucial role in data and file storage. Encoding techniques facilitate data compression, which greatly reduces the amount of storage required, and specific encoding methods can also improve retrieval, making information available right when it is required.
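Two of these advantages, compression and error detection, can be sketched with Python's standard `zlib` module. DEFLATE compression and the CRC-32 checksum are just convenient examples here, not the only encodings that provide these properties; the data and the flipped bit are made up for the demo.

```python
import zlib

data = b"ABABABABAB" * 100            # highly repetitive data (1000 bytes)
compressed = zlib.compress(data)
print(len(data), "->", len(compressed), "bytes")
print(zlib.decompress(compressed) == data)  # True: lossless, original fully restored

# Error detection: the sender transmits a CRC-32 checksum alongside the data.
checksum = zlib.crc32(data)
corrupted = bytearray(data)
corrupted[10] ^= 0x01                  # simulate a single bit flipped in transit
print(zlib.crc32(bytes(corrupted)) == checksum)  # False: the error is detected
```

A CRC only detects accidental errors; unlike the keyed hash used for messaging apps, it offers no protection against deliberate tampering.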

Figure 1.6

Types of encoding:

HTML Encoding	Used so that an HTML page is displayed in the proper format. The page declares which character set the browser should use, and characters with special meaning in HTML (e.g. <, >) are replaced by character entities such as &lt; and &gt;.
URL Encoding	URL (Uniform Resource Locator) encoding, also known as percent-encoding, converts characters into a format that can be transmitted over the internet. URLs can only be sent using the ASCII character set, so unsafe and non-ASCII characters are replaced by % followed by two hexadecimal digits for each byte.
Unicode Encoding	Standard for a universal character set. Allows text in most of the world's languages and writing systems to be encoded, represented and handled, by assigning a code point (number) to each character. Defines Unicode Transformation Formats (UTF) that encode code points using 8-, 16- or 32-bit units: 1. UTF-8 – defined by the Unicode standard and widely used in electronic communication; capable of encoding all 1,112,064 valid code points using one to four one-byte (8-bit) code units. 2. UTF-16 – represents code points using one or two 16-bit code units. 3. UTF-32 – represents each code point as a single 32-bit integer.
Base64 Encoding	Encodes binary data as ASCII characters. Used in mail systems (e.g. SMTP), which only accept ASCII text and could corrupt raw binary data, as well as in simple HTTP authentication and to transfer binary data in cookies and other parameters. Processes data in 24-bit groups, divides each group into four 6-bit chunks and converts each chunk to its equivalent Base64 character.
ASCII Encoding	American Standard Code for Information Interchange (ASCII). One of the first character encoding standards (released in 1963). Represents English characters as numbers using a single byte, of which only the bottom 7 bits are used: each alphabetic, numeric or special character is represented by a 7-bit binary number.

Table 3
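The encodings in Table 3 can all be tried with Python's standard library. This sketch uses `html.escape` for HTML encoding, `urllib.parse.quote`/`unquote` for URL encoding, `str.encode` to compare UTF-8, UTF-16 and UTF-32 byte lengths, and `ord` for ASCII codes; the sample strings are arbitrary.

```python
import html
from urllib.parse import quote, unquote

# HTML encoding: characters with special meaning become entities.
print(html.escape("<b>Fish & Chips</b>"))  # &lt;b&gt;Fish &amp; Chips&lt;/b&gt;

# URL (percent) encoding: each unsafe or non-ASCII byte becomes %XX.
print(quote("café menu"))           # caf%C3%A9%20menu
print(unquote("caf%C3%A9%20menu"))  # café menu

# Unicode: one code point, different byte lengths per UTF scheme.
# ("-le" variants are used so that no byte-order mark is added.)
for ch in ["A", "é", "€", "😀"]:
    print(ch, hex(ord(ch)),
          len(ch.encode("utf-8")),
          len(ch.encode("utf-16-le")),
          len(ch.encode("utf-32-le")))

# ASCII: 'A' is 65, which fits in 7 bits.
print(ord("A"), format(ord("A"), "07b"))  # 65 1000001
```

Note how UTF-8 uses one byte for `A` but four for the emoji, while UTF-32 always uses four.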

The importance of encoding, encryption and hashing

Mindmap 1.1

