
Zero to Hero: Crafting Rules for Cyber Resilience!

 Zero to hero YARA rules



This is a follow-up to my previous blog on threat hunting with Veeam & YARA. In this post I want to go into detail on how to create, maintain & test YARA rules.

Check out my previous post here: Threat Hunting with Veeam : Leveraging Yara for Incident Response (mritsurgeon.co.za)

Introduction to YARA:

Understanding YARA: YARA is a versatile and indispensable tool in the field of malware analysis, and a staple in most cybersecurity professionals' toolboxes. YARA rules are customizable patterns used for identifying specific malware, targeted attacks, and security threats tailored to your unique environment.

Antivirus vs YARA:

The YARA scanner and rules function similarly to an antivirus scanner and its signatures, but with a key distinction. YARA is a tool designed for crafting rules to detect malware, whereas antivirus relies on predefined rulesets for identifying malicious software.

In the context of a zero-day virus or malware, which is entirely unknown before discovery, traditional antivirus signatures may not exist yet. This is particularly crucial when dealing with polymorphic viruses featuring encrypted payloads and mutation engines. The encryption conceals the harmful payload from standard scanners and threat detection software, which depend on recognizing the virus through its decryption process. Once the virus infiltrates a target system, its payload is decrypted, triggering the infection. The mutation engine further complicates detection by generating new decryption routines randomly, making it harder for the virus to be identified as it spreads to new targets.
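To make the signature problem concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption: a harmless marker string stands in for the payload, and single-byte XOR stands in for the encryption a real mutation engine would apply. It only shows why a fixed byte signature stops matching once each copy is encrypted differently.

```python
# Toy illustration only: a harmless byte string stands in for a payload, and
# single-byte XOR stands in for the encryption a polymorphic engine applies.
PAYLOAD = b"EXAMPLE-PAYLOAD-MARKER"  # hypothetical static signature


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with the same one-byte key (the operation is its own inverse)."""
    return bytes(b ^ key for b in data)


def signature_found(sample: bytes, signature: bytes = PAYLOAD) -> bool:
    """A naive static scanner: look for the exact signature bytes."""
    return signature in sample


copy_a = xor_bytes(PAYLOAD, 0x5A)  # one "generation"
copy_b = xor_bytes(PAYLOAD, 0xC3)  # the next one, encrypted with a new key

print(signature_found(PAYLOAD))  # the plain payload is found
print(signature_found(copy_a))   # the encrypted copy is not
print(copy_a == copy_b)          # and the two copies do not even resemble each other
```

This is why the rules later in this post look for things that survive such changes, or for strings you have pulled out of a sample yourself.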

Now, how can we defend against malware or viruses that manage to deceive even the most robust antivirus products?

YARA provides part of the solution. Much like an antivirus relies on defined signatures for recognized malware, in the case of a zero-day threat where traditional antivirus definitions fall short, we can create our own rules using YARA. This allows us to proactively establish detection rules for previously unknown malware, filling the gap left by traditional antivirus solutions.

Let's start by deconstructing a YARA rule:

·        Rule Name

This is a user-defined name that provides a clear and concise identifier for the rule. It helps distinguish one rule from another. For example:

---------------------------------------------------------------

rule XYZMalwareRule {

---------------------------------------------------------------

Metadata

Metadata in YARA rules contains additional information about the rule. It typically includes details like the author, description, and any other relevant information. It provides context for the rule. For example:

---------------------------------------------------------------

 meta:
  author = "Your Name"
  description = "Detects a specific malware variant XYZ"

---------------------------------------------------------------

Strings

Strings in YARA rules are the patterns or sequences of characters that the rule searches for in the target files. These can be simple text strings, hex strings with wildcards, or regular expressions. You can leverage tools like PE Studio, PE Viewer and HxD (a hex editor); these are valuable for analyzing Portable Executable (PE) files, such as Windows executables (.exe) and dynamic link libraries (.dll). These tools help security researchers, analysts, and malware experts inspect and understand the internal structure of PE files.

Once you identify relevant strings using these tools, you can use that information to create YARA rules for detecting similar patterns in other files. For example, if you find a unique string associated with a particular malware variant, you can incorporate that string into a YARA rule to detect instances of the malware across different files or backups.

o   Text Strings:

 These are straightforward character sequences enclosed in double quotes. For example:

---------------------------------------------------------------

 rule ExampleRule {

   strings:

          $text_string = "malware123"

   condition:

          $text_string

 }

---------------------------------------------------------------

PE Studio of a Text String 

Here we can see that, on analyzing an EXE using PE Studio, we find a Unicode text string with the value

"Hit any key to exit..."


So a rule here to find the Text String would look like:

---------------------------------------------------------------

rule ExitStringRule {

  strings:
    $exit_string = "Hit any key to exit..." wide

  condition:
    $exit_string
}
---------------------------------------------------------------

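The wide modifier tells YARA to search for the string encoded with two bytes per character, which is how Unicode (UTF-16) strings are typically stored in Windows executables.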

Here is a Match on the Unicode Text Rule for the EXE we examined.


o   Hex Strings

Hex strings allow you to specify byte sequences using hexadecimal notation. This is useful for identifying binary patterns. For example:

---------------------------------------------------------------

rule ExampleRule {

  strings:
       $hex_string = { 4D 5A 90 00 }

  condition:
       $hex_string
}
---------------------------------------------------------------

Using the HxD hex editor to find the same Unicode text string as a hex string:


Here I’ve identified the hex bytes of the previous example value "Hit any key to exit..."

Let’s create a YARA rule specific to this hex value:

---------------------------------------------------------------

rule HexUnicodeStringRule {

  strings:

    $hex_unicode_string = { 48 00 69 00 74 00 20 00 61 00 6E 00 79 00 20 00 6B 00 65 00 79 00 20 00 74 00 6F 00 20 00 65 00 78 00 69 00 74 00 2E 00 }

  condition:

    $hex_unicode_string

}

---------------------------------------------------------------
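If you don't want to copy the bytes out of a hex editor by hand, a few lines of Python can produce them. This is only a sketch: the helper name to_yara_hex is my own, and it assumes the string is plain ASCII text stored as UTF-16LE, which is what the wide modifier targets.

```python
def to_yara_hex(text: str, wide: bool = True) -> str:
    """Render text as a YARA hex string body, e.g. '{ 48 00 69 00 }'."""
    # wide=True mirrors YARA's wide modifier: two bytes per character,
    # which for plain ASCII text is the UTF-16LE encoding.
    raw = text.encode("utf-16-le" if wide else "ascii")
    return "{ " + " ".join(f"{b:02X}" for b in raw) + " }"


print(to_yara_hex("Hit"))              # { 48 00 69 00 74 00 }
print(to_yara_hex("Hit", wide=False))  # { 48 69 74 }
```

Running it on "Hit any key to exit." (with a single period) reproduces the hex string in the rule above. That hex string stops after the first period, so it is a prefix of the full "Hit any key to exit..." value, which is why it still matches the same file.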

Here is a match on the hex string rule for the EXE we examined.


·        Condition

The condition is the logical expression that must be true for the rule to trigger. It combines the strings defined in the rule (metadata is descriptive only and cannot be referenced here) to determine whether the rule matches a given file. You can use operators such as and, or, and not.

Let's put this all together in an example rule:

---------------------------------------------------------------

rule HexUnicode_TextStringRule {

meta:
  author = "Ian Engelbrecht"
  description = "Detects a specific Unicode text string & the hex value for that string"

  strings:
    $hex_unicode_string = { 48 00 69 00 74 00 20 00 61 00 6E 00 79 00 20 00 6B 00 65 00 79 00 20 00 74 00 6F 00 20 00 65 00 78 00 69 00 74 00 2E 00 }
    $exit_string = "Hit any key to exit..." wide
 
  condition:
    $hex_unicode_string or $exit_string
}

---------------------------------------------------------------

So let's explain the final rule:


meta section:

Provides metadata about the rule, including the author and a brief description.

strings section:

Defines two strings to be searched in the analyzed files:

$hex_unicode_string: A hexadecimal sequence representing a Unicode string.

$exit_string: The text string "Hit any key to exit..." with the wide modifier, so it is matched as a two-byte-per-character (Unicode) string.

condition section:

Specifies the condition for the rule to trigger:

$hex_unicode_string or $exit_string: The rule triggers if either the hexadecimal byte sequence or the wide "Hit any key to exit..." string is found in the analyzed file.

Here is a match on the combined rule, matching both strings:


What about data classification with YARA?

YARA isn't just about hunting threats; it's also a versatile tool for data classification. With it you can pinpoint where sensitive data lives and categorize it, helping to keep your information secure.

I’m going to create 2 rules based on the YARA rule structure we just went through above.

First I saved some fictitious credit card data into a document. I got the test card details here:

Test Credit Card Account Numbers (paypalobjects.com)

Here is a screenshot of my Document :


For this we will use regex text strings to identify credit card number lengths & types.

Regex, short for regular expression, is a powerful tool for matching patterns in text. It's a sequence of characters that forms a search pattern.

There are many existing patterns that you can use to identify different types of card number sequences, so don’t be overwhelmed by the strings.

---------------------------------------------------------------

rule TestCreditCardNumbers {
  meta:
    author = "Ian Engelbrecht"
    description = "Detects test credit card account numbers"
 
  strings:
        $amex = /\b37\d{13}\b/
        $mastercard = /\b5[1-5]\d{14}\b/
        $visa = /\b(4\d{12}(\d{3}))\b/
        $dinersclub = /\b(3(0[0-5]|[68][0-9])\d{11})\b/
        $discover = /\b((6011\d{12}|65\d{14}))\b/
        $jcb = /\b((35\d{14}|2131\d{11}|1800\d{11}))\b/
  condition:
        1 of them
}

---------------------------------------------------------------

Let's break down this rule:

Rule Name:

TestCreditCardNumbers: The name of the YARA rule.

Metadata Section:

meta: Contains metadata about the rule.

author = "Ian Engelbrecht": Specifies the author of the rule.

description = "Detects test credit card account numbers": Provides a brief description of the rule's purpose.

Strings Section:

Defines several regular expressions (regex) that represent different patterns for credit card numbers. Each pattern corresponds to a specific credit card type:

$amex: American Express

$mastercard: MasterCard

$visa: Visa

$dinersclub: Diners Club

$discover: Discover

$jcb: JCB

Condition Section:

condition: Specifies the conditions that must be met for the rule to trigger.

1 of them: The rule triggers if at least one of the credit card patterns defined in the strings section is found in the analyzed file.
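Before running the rule against real files, it can help to sanity-check the patterns on their own. Here is a rough Python sketch that does so with the standard re module. The patterns are copied from the rule; the sample numbers are commonly published test values that I'm assuming are similar to the ones in my document, and Python's regex engine is not identical to YARA's, so treat this as a quick check rather than proof.

```python
import re

# Patterns copied from the strings section of TestCreditCardNumbers.
CARD_PATTERNS = {
    "amex": r"\b37\d{13}\b",
    "mastercard": r"\b5[1-5]\d{14}\b",
    "visa": r"\b(4\d{12}(\d{3}))\b",
    "dinersclub": r"\b(3(0[0-5]|[68][0-9])\d{11})\b",
    "discover": r"\b((6011\d{12}|65\d{14}))\b",
    "jcb": r"\b((35\d{14}|2131\d{11}|1800\d{11}))\b",
}


def card_types(text: str) -> list[str]:
    """Return the names of every pattern found in text."""
    return [name for name, pattern in CARD_PATTERNS.items() if re.search(pattern, text)]


def rule_matches(text: str) -> bool:
    """Mirror the '1 of them' condition: at least one pattern must be present."""
    return len(card_types(text)) >= 1


print(card_types("Card on file: 4111111111111111"))  # ['visa']
print(card_types("Card on file: 378282246310005"))   # ['amex']
print(rule_matches("No card numbers in this text"))  # False
```

One thing a check like this surfaces: as written, the Visa pattern requires 16 digits, so a 13-digit Visa test number would not match unless the last group is made optional with (\d{3})?. Likewise the American Express pattern only covers numbers starting with 37.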


Here is a match based on this rule:


Let's do another. I have a user document with user information.

I generated this fake data with the site below (a scary site, by the way):

Generate a Random Name - Male, American, United States - Fake Name Generator

Here is a screenshot of my document:


OK, so let's write a YARA rule for this data. As far as PII (Personally Identifiable Information) is concerned, I only really want to find 3 things here: email, phone number & Social Security number.

We will use regex for all 3:

---------------------------------------------------------------

rule GenericUserDataPatterns {

  meta:

    author = "Ian Engelbrecht"

    description = "Detects generic patterns for email, phone number, and SSN"

  strings:

    $email_pattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/

    $phone_pattern = /\b\d{3}-\d{3}-\d{4}\b/

    $ssn_pattern = /\b\d{3}-\d{2}-\d{4}\b/

  condition:

    all of ($email_pattern, $phone_pattern, $ssn_pattern)

}

---------------------------------------------------------------

I want my condition here to be all, meaning a document must contain a phone number + email + SSN for it to be flagged.

Let's look at the rule:

Rule Name:

GenericUserDataPatterns: The name of the YARA rule.

Metadata Section:

meta: Contains metadata about the rule.

author = "Ian Engelbrecht": Specifies the author of the rule.

description = "Detects generic patterns for email, phone number, and SSN": Provides a brief description of the rule's purpose.

Strings Section:

Defines three regular expressions:

$email_pattern: Matches a generic pattern for email addresses.

$phone_pattern: Matches a generic pattern for phone numbers in the format ###-###-####.

$ssn_pattern: Matches a generic pattern for Social Security Numbers (SSN) in the format ###-##-####.

Condition Section:

condition: Specifies the conditions that must be met for the rule to trigger.

all of ($email_pattern, $phone_pattern, $ssn_pattern): The rule triggers if all three patterns are found in the analyzed files.
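The difference between "all of" here and "1 of them" in the credit card rule is easy to see in a small Python sketch. The patterns are the ones from the rule; the sample values are made up for illustration, and Python's re module only approximates YARA's regex engine.

```python
import re

# Patterns copied from the strings section of GenericUserDataPatterns.
PII_PATTERNS = {
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
    "phone": r"\b\d{3}-\d{3}-\d{4}\b",
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
}


def found(text: str) -> set[str]:
    """Return the names of the patterns that occur in text."""
    return {name for name, pattern in PII_PATTERNS.items() if re.search(pattern, text)}


def all_of(text: str) -> bool:
    """Mirror 'all of (...)': every pattern must be present."""
    return found(text) == set(PII_PATTERNS)


def any_of(text: str) -> bool:
    """Mirror '1 of them': a single pattern is enough."""
    return bool(found(text))


full = "jane.doe@example.com, 555-010-1234, 000-12-3456"  # made-up values
partial = "jane.doe@example.com, 555-010-1234"            # no SSN

print(all_of(full), any_of(full))        # True True
print(all_of(partial), any_of(partial))  # False True
```

Requiring all three keeps the rule from flagging every file that merely contains an email address.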

Here is a match on all strings under the same rule:


How do we handle multiple rules?

As you begin writing rules, you might end up with a lot of different rules that do various things, and running each rule individually can be time consuming. Here you have some options.

Firstly, you can combine rules into one YARA file, e.g.:

---------------------------------------------------------------

rule GenericUserDataPatterns {

  meta:

    author = "Ian Engelbrecht"

    description = "Detects generic patterns for email, phone number, and SSN"

  strings:

    $email_pattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/

    $phone_pattern = /\b\d{3}-\d{3}-\d{4}\b/

    $ssn_pattern = /\b\d{3}-\d{2}-\d{4}\b/

  condition:

    all of ($email_pattern, $phone_pattern, $ssn_pattern)

}

rule TestCreditCardNumbers {

  meta:

    author = "Ian Engelbrecht"

    description = "Detects test credit card account numbers"

 

       strings:

             $amex = /\b37\d{13}\b/

             $mastercard = /\b5[1-5]\d{14}\b/

             $visa = /\b(4\d{12}(\d{3}))\b/

             $dinersclub = /\b(3(0[0-5]|[68][0-9])\d{11})\b/

             $discover = /\b((6011\d{12}|65\d{14}))\b/

             $jcb = /\b((35\d{14}|2131\d{11}|1800\d{11}))\b/

       condition:

             1 of them

}

---------------------------------------------------------------

This single YARA rule file includes both the GenericUserDataPatterns rule for detecting generic patterns in user data and the TestCreditCardNumbers rule for detecting specific test credit card numbers. You can use this file for analyzing files and identifying patterns related to both scenarios under the same scan.
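If you keep each rule in its own file while writing it, you can build the single file with a small script instead of copy and paste. The sketch below is one possible way to do that in Python; the function names are my own, and the duplicate-name check is a rough regex scan, not a real YARA parser.

```python
import re
from collections import Counter
from pathlib import Path

# Rough match for rule declarations, with optional private/global modifiers.
RULE_NAME = re.compile(r"^\s*(?:(?:private|global)\s+)*rule\s+(\w+)", re.MULTILINE)


def duplicate_rule_names(source: str) -> list[str]:
    """Return rule names declared more than once (YARA rejects duplicates)."""
    counts = Counter(RULE_NAME.findall(source))
    return sorted(name for name, count in counts.items() if count > 1)


def combine_rule_files(paths, output) -> str:
    """Concatenate rule files into one, marking each source with a // comment."""
    parts = []
    for path in map(Path, paths):
        body = path.read_text(encoding="utf-8").strip()
        parts.append(f"// ---- from {path.name} ----\n{body}\n")
    combined = "\n".join(parts)
    duplicates = duplicate_rule_names(combined)
    if duplicates:
        raise ValueError(f"duplicate rule names: {duplicates}")
    Path(output).write_text(combined, encoding="utf-8")
    return combined
```

For example, calling combine_rule_files with the paths of the two rule files and an output name of your choice writes one file containing both rules, in the order given.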

See the match here:


The Second Option

The problem with the first approach is that your YARA file can become somewhat lengthy.

Here we use include, almost like nesting both rules into a master rule, and then require the conditions of both rules to be met together:

---------------------------------------------------------------

include "C:\Yara\PVTGenericUserDataPatterns.yar"

include "C:\Yara\PVTTestCreditCardNumbers.yar"


rule CombinedRules {

  meta:

    description = "Master Rule Combining GenericUserDataPatterns and TestCreditCardNumbers"

  condition:

    GenericUserDataPatterns and TestCreditCardNumbers

}

---------------------------------------------------------------

To accomplish this, both the GenericUserDataPatterns rule and the TestCreditCardNumbers rule must be defined as private rules. The CombinedRules condition already requires both to match, but without the private keyword each included rule is also reported on its own, so a file containing only one kind of data would still show a match. Marking them private hides their individual results, leaving only the combined match in the output.

Here is an example of what I mean by a private definition:

---------------------------------------------------------------

private rule GenericUserDataPatterns {

  meta:

    author = "Ian Engelbrecht"

    description = "Detects generic patterns for email, phone number, and SSN"

  strings:

    $email_pattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/

    $phone_pattern = /\b\d{3}-\d{3}-\d{4}\b/

    $ssn_pattern = /\b\d{3}-\d{2}-\d{4}\b/

  condition:

    all of ($email_pattern, $phone_pattern, $ssn_pattern)

}

---------------------------------------------------------------

From the YARA documentation here:

Writing YARA rules — yara 4.4.0 documentation

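All strings in YARA can be marked as private which means they will never be included in the output of YARA. They are treated as normal strings everywhere else, so you can still use them as you wish in the condition

That passage describes private strings; private rules work the same way at rule level: YARA does not report them when they match, but other rules can still reference them.

So we hide their output and create a top-level rule that will appear in the output, which is: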

---------------------------------------------------------------

rule CombinedRules {

  meta:
    description = "Master Rule Combining GenericUserDataPatterns and TestCreditCardNumbers"

  condition:
    GenericUserDataPatterns and TestCreditCardNumbers
}

---------------------------------------------------------------

Both rules, GenericUserDataPatterns & TestCreditCardNumbers, are then referenced by name in the condition of the CombinedRules rule, much like strings are.
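To show what private changes (and what it doesn't), here is a toy Python model of the three rules. It is not how YARA is implemented; the Rule class and scan function are hypothetical, and simple substring checks for "PII" and "CARD" stand in for the real regex conditions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    # The condition sees the data plus the results of rules evaluated earlier.
    condition: Callable[[str, dict[str, bool]], bool]
    private: bool = False


def scan(rules: list[Rule], data: str) -> list[str]:
    """Evaluate rules in order; report only matching rules that are not private."""
    results: dict[str, bool] = {}
    for rule in rules:
        results[rule.name] = rule.condition(data, results)
    return [rule.name for rule in rules if results[rule.name] and not rule.private]


def make_rules(private: bool) -> list[Rule]:
    return [
        Rule("GenericUserDataPatterns", lambda data, _: "PII" in data, private),
        Rule("TestCreditCardNumbers", lambda data, _: "CARD" in data, private),
        Rule(
            "CombinedRules",
            lambda _, seen: seen["GenericUserDataPatterns"] and seen["TestCreditCardNumbers"],
        ),
    ]


print(scan(make_rules(private=False), "PII only"))     # ['GenericUserDataPatterns']
print(scan(make_rules(private=True), "PII only"))      # []
print(scan(make_rules(private=True), "PII and CARD"))  # ['CombinedRules']
```

In both cases CombinedRules only matches when both underlying rules match; private simply keeps the helper rules out of the report.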

Here is the match output :


This RTF file has both a type of credit card number and PII ( email , phone number etc )

Automated Testing with GitHub and YARA-CI:

Now, let's talk automation. GitHub workflows and YARA-CI bring efficiency to rule testing and help ensure your rules are battle-ready when you need them.

What do I mean by battle-ready? Let's make sure we are not getting false positives and that our rule structures have no errors.

The rules I created above are in a folder on my PC & I'm going to push the .yar files into my GitHub repository, where I already have YARA-CI installed.

Installation | YARA-CI (virustotal.com)


Let’s push the YAR rules using git on my local machine into my repository.


In Github , I can see all my YARA rules have been pushed:


Notice the error icon; this indicates an automated task that failed.


Mostly, the scan has checked my rules and indicated that the regexes I’m using to find PII & credit card data could slow down scanning.

See the screenshot here.


Looking at the further checks, we can see my rules were run via VirusTotal's YARA-CI against a data set from the National Software Reference Library (NSRL), and we can then see whether a rule needs refinement due to false positives or false negatives.

We can see some false positives detected via VirusTotal YARA-CI:


This is expected, as I was only matching against a single example string that could exist in other executables; it was in no way unique.

When we follow the file signature we can see it’s a DLL that I matched with the hex & Unicode strings. Remember?

---------------------------------------------------------------

    $hex_unicode_string = { 48 00 69 00 74 00 20 00 61 00 6E 00 79 00 20 00 6B 00 65 00 79 00 20 00 74 00 6F 00 20 00 65 00 78 00 69 00 74 00 2E 00 }
    $exit_string = "Hit any key to exit..." wide

---------------------------------------------------------------

VirusTotal also deems this file clean according to other security products. Again, this is expected: we did this purely to demonstrate using PE & hex tools to find strings that you can then identify with YARA.


So this tool helps where theory meets reality: you get hands-on with the technicalities of testing and configuring YARA rules against real-world data sets from VirusTotal, which helps validate your rules before you rely on them.

The National Software Reference Library (NSRL) helps because it consists of known, generally legitimate software files, so if our rule matches against it, it generally means we have a false positive. This is a great way to retro hunt with your rule before you actually hunt.
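You can run a much smaller version of the same idea locally before pushing anything: sweep a folder of software you trust and see whether your candidate string turns up. The Python sketch below assumes the wide "Hit any key to exit..." string from earlier; the function name is my own, and it reads each file fully into memory, so it only suits modest folders.

```python
from pathlib import Path

# The same bytes the wide string in the rule targets (UTF-16LE).
NEEDLE = "Hit any key to exit...".encode("utf-16-le")


def files_containing(root, needle: bytes = NEEDLE) -> list[Path]:
    """Return the files under root whose contents include needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            if needle in path.read_bytes():
                hits.append(path)
        except OSError:
            continue  # unreadable file: skip it rather than abort the sweep
    return hits
```

Any hit in a folder of known-good software is a hint that the string is too generic to use on its own, which is exactly what YARA-CI told me about this example.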

So let’s recap:

We explored the art of crafting YARA rules for cyber resilience. YARA is a powerful tool in the arsenal of cybersecurity professionals, offering customizable patterns to identify malware, targeted attacks, and security threats tailored to specific environments. We compared YARA to traditional antivirus tools, emphasizing its advantage in detecting zero-day threats where traditional signatures may fall short.

Deconstructing a YARA rule involves defining key elements:

Rule Name: A clear identifier for the rule.

Metadata: Additional information about the rule, such as author and description.

Strings: Patterns or sequences of characters to search for in target files.

Text Strings: Simple character sequences.

Hex Strings: Byte sequences in hexadecimal notation.

The condition, a logical expression, determines when the rule triggers. We demonstrated rule creation using examples, showcasing the use of text and hex strings.

Beyond threat hunting, YARA proves versatile for data classification. We created rules to identify credit card numbers and generic user data patterns using regular expressions.

We discussed handling multiple rules efficiently: combining rules into a single YARA file, or using "include" statements with a master rule, are both strategies for managing multiple rule sets.

Finally, we focused on automated testing using GitHub workflows and YARA-CI. We pushed YARA rules to a GitHub repository, and automated testing flagged potential issues, allowing for refinement and validation against real-world data sets from VirusTotal and the National Software Reference Library.

Conclusion:

YARA rules are a potent weapon in the cybersecurity arsenal, enabling proactive detection of threats and robust data classification. Crafting effective rules requires a deep understanding of the rule components, and automated testing with tools like YARA-CI ensures the rules are battle-ready in the dynamic landscape of cybersecurity. As the digital landscape evolves, the continuous refinement and testing of YARA rules against real-world scenarios become critical for maintaining a resilient defense against emerging threats. 

---------------------------------------------------------------

Thank you for reading if you got this far, please leave a comment or share. 

---------------------------------------------------------------
