Shannon Entropy in Splunk
You may have heard the term entropy in thermodynamics, where it is, roughly speaking, a measure of the disorder in a system. Today, though, we will look at information entropy, the kind of entropy used in computer science.
What is Entropy?
In simple terms, entropy is a “measure of the randomness within a variable”. In most cases, the entropy of a string or variable is calculated using the “Shannon Entropy Formula”, introduced by Claude Shannon in 1948.
The entropy of a string or URL is nothing but a measurement of its randomness. The calculation gives us an entropy score for that string, and the score rises with randomness: the more random a string is, the higher its entropy score.
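For reference, the Shannon Entropy Formula mentioned above is usually written as follows when applied to a string. This is the standard textbook form, assuming the score is computed over individual characters with a base-2 logarithm; the symbols are chosen here for illustration.

$$
H = -\sum_{i=1}^{n} p_i \log_2 p_i
$$

Here $n$ is the number of distinct characters in the string and $p_i$ is the share of the string made up by the $i$-th distinct character. As a small worked example, the string "google" has six characters, with "g" and "o" appearing twice each and "l" and "e" once each:

$$
H = -\left(2 \cdot \tfrac{2}{6}\log_2\tfrac{2}{6} + 2 \cdot \tfrac{1}{6}\log_2\tfrac{1}{6}\right) \approx 1.918
$$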
Why do we need to calculate entropy?
These days, a lot of web exploits and malicious activity happen through URLs. More importantly, many of these domains or sub-domains are created by a DGA, or domain generation algorithm. A DGA is a technique that generates random-looking domain names for malicious activity. That’s why we use entropy: by measuring the randomness within a URL, we can spot and block domains that may be harmful to our network.
How to calculate Entropy?
Well, if you think you need to work through the Shannon formula by hand, it would take a whole day to calculate the entropy of a single URL. Don’t worry about the formula; it probably means as much to you as it did to me.
For this, we have an add-on named “URL Toolbox”, which is available on Splunkbase. Please download the add-on from the following link.
https://splunkbase.splunk.com/app/2734/
Next, log in to your Splunk instance with your credentials.
After that, click the gear icon to open the Manage Apps page, click “Install app from file”, and upload the add-on package you downloaded.
Example:
We have a list of the top one million most-viewed websites in our index, so we will calculate the randomness, or entropy score, of those URLs using the add-on installed above.
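You don’t need to do anything special to calculate the entropy score; just run this query against your own data. Note that sort 0 is used so that the sort command does not cut the results off at its default limit of 10,000 rows.
index="sample_index" sourcetype="top_url"
| table rank url
| sort 0 rank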
| `ut_shannon(url)`
| table rank url ut_shannon
Result:
Explanation:
Here, ut_shannon is the field that shows the entropy score of each URL. If you want to know how it is calculated, go through the Python script that ships with the “URL Toolbox” app, in the following path: $SPLUNK_HOME/etc/apps/utbox/bin.
The macro “ut_shannon(1)” that we use here comes with the add-on automatically. In place of the macro’s argument, pass the field that contains the URL in your index. Besides this macro, the add-on provides several other lookups and macros, which you can use depending on your requirements.
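To give a feel for what an entropy score like this involves, here is a minimal Python sketch of a character-level Shannon entropy calculation. It is not the script shipped with URL Toolbox; the function name shannon_entropy and the sample strings are made up for illustration, and it assumes the score is computed over single characters with a base-2 logarithm.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the character-level Shannon entropy of text, in bits.

    Hypothetical helper for illustration; not part of URL Toolbox.
    """
    if not text:
        return 0.0  # an empty string carries no information
    total = len(text)
    counts = Counter(text)  # how often each distinct character occurs
    # Sum p * log2(1/p) over the distinct characters, where p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    # Sample strings are illustrative only.
    for sample in ["google", "aaaaaaaa", "x7f9q2kzt1vb"]:
        print(f"{sample}: {shannon_entropy(sample):.3f}")
```

A string made of one repeated character scores 0, while a string in which every character is different gets the highest score possible for its length, which is why randomly generated names tend to score higher than ordinary ones.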
Summary:
Regular URLs typically have an entropy score between 2 and 4. A score above 4 means the URL is unusually random.
Out of our one million URLs, we found only 5 with an entropy score above 4. These URLs could be harmful to our network and deserve a closer look.
We hope you have enjoyed this blog, “Shannon Entropy in Splunk”. We will be back with new Splunk topics. Until then, goodbye, and stay safe and strong.
Happy Splunking!!