INDEX TIME FIELD EXTRACTION USING WRITE_META
In this post we cover a very common but slightly tricky Splunk configuration: implementing index-time field extraction.
That said, it is almost always better to prefer “search-time extraction” over “index-time extraction”.
To learn more about “index-time” and “search-time” extraction, please click here.
To implement index-time extraction, follow the tried and tested steps below.
On the Indexer:
Step-1: We created an index “test” to store the data that we are going to use for testing.
If you are on a non-clustered indexer, you can simply create the index via the Splunk GUI.
On the Heavy Forwarder:
Step-2: We created a file named sample_logs.txt under the /tmp directory, the contents of which you can see in the screenshot below.
Step-3: We created an inputs.conf (under $SPLUNK_HOME/etc/system/local) to monitor the “sample_logs.txt” file.
[monitor:///tmp/sample_logs.txt]
index = test
sourcetype = test_file
The above stanza tells the Splunk input processor to monitor the file “sample_logs.txt” located under the /tmp directory. The attributes “index” and “sourcetype” assign values to the default fields “index” and “sourcetype” that Splunk requires.
Step-4: We created a props.conf (under $SPLUNK_HOME/etc/system/local). This is where you declare which transforms.conf stanza(s) should be applied to the sourcetype.
[test_file]
The stanza name is the sourcetype of the logs to be ingested.
SHOULD_LINEMERGE = False
Setting this attribute to False prevents multiple lines from the logs from being merged into a single event.
TRANSFORMS-demo = my_extraction
The above attribute is required for index-time operation(s).
It follows the format: TRANSFORMS-<class_name> = <transformation_name>
In our case this “transformation_name” is “my_extraction”, which we are going to define in transforms.conf.
You can list multiple transformation names, separated by commas.
Step-5: Let’s create the transforms.conf (under $SPLUNK_HOME/etc/system/local). This is where we set the index-time extraction rules.
[my_extraction]
This stanza name must match the transformation name declared in the props.conf file.
REGEX = ………
Set your regular expression based on the events coming into Splunk. The above attribute sets the regular expression that drives the index-time extraction(s).
For example, in our case the events look like this:
May 28 16:04:10 server2 passwd[30246]: password for 'avahi' changed by 'root'
We wanted to extract the operation (passwd) and the user name (avahi) from all these events.
So, we can use a regex like the one below:
(\w+)\[\d+\]\:.*?\'(\w+)\'
The regular expression has two capture groups (highlighted in yellow in the screenshot below).
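Before putting the expression into transforms.conf, it can help to try it against a sample event outside Splunk. The sketch below does that with Python's re module, assuming the event contains plain straight single quotes, as syslog would normally emit them. Splunk uses PCRE rather than Python's engine, so treat this only as a sanity check; the names SAMPLE_EVENT, EXTRACTION_REGEX and extract_fields are made up for illustration and are not part of any Splunk API.

```python
import re

# Sample event from this post, written with straight single quotes (assumption).
SAMPLE_EVENT = "May 28 16:04:10 server2 passwd[30246]: password for 'avahi' changed by 'root'"

# Same pattern as in the post; group 1 is the operation, group 2 is the user name.
EXTRACTION_REGEX = re.compile(r"(\w+)\[\d+\]\:.*?\'(\w+)\'")


def extract_fields(event):
    """Return (operation, user) for an event, or None if the pattern does not match."""
    match = EXTRACTION_REGEX.search(event)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(extract_fields(SAMPLE_EVENT))  # ('passwd', 'avahi')
```

Note that the lazy `.*?` stops at the first quoted word, which is why the second group picks up “avahi” and not “root”.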
WRITE_META = true
This attribute tells Splunk to write the fields extracted by this transforms stanza to the index-time metadata; it is only valid for index-time field extractions.
NOTE: You can use either “WRITE_META = true” or “DEST_KEY = _meta”. The former adds the extracted fields to the metadata automatically, while with the latter you must explicitly name “_meta” as the destination for the fields to get added to the metadata.
FORMAT = operation::$1 user::$2
This specifies the output field format. In this case, whatever is captured by group 1 (indicated by $1) is put under the field “operation”, and similarly anything captured by group 2 (indicated by $2) is put under the field “user”.
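To see what the FORMAT line would produce for our sample event, here is a rough Python sketch that substitutes $1 and $2 with the matching capture groups. It is a simplified model written for this post, not how Splunk implements the substitution internally, and the helper name expand_format is hypothetical. It again assumes straight single quotes in the event.

```python
import re

# Sample event with straight single quotes (assumption), plus the settings from transforms.conf.
EVENT = "May 28 16:04:10 server2 passwd[30246]: password for 'avahi' changed by 'root'"
REGEX = r"(\w+)\[\d+\]\:.*?\'(\w+)\'"
FORMAT = "operation::$1 user::$2"


def expand_format(event, regex, fmt):
    """Replace $1, $2, ... in fmt with the corresponding capture groups.

    Simplified model for illustration only; not Splunk's actual implementation.
    """
    match = re.search(regex, event)
    if match is None:
        return None  # no match, so this model produces nothing
    # Only $1 and above are handled here; $0 is deliberately left untouched.
    return re.sub(r"\$([1-9]\d*)", lambda m: match.group(int(m.group(1))), fmt)


if __name__ == "__main__":
    print(expand_format(EVENT, REGEX, FORMAT))  # operation::passwd user::avahi
```

The resulting key::value pairs are the form in which the fields “operation” and “user” end up alongside the event.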
Step-6: Let’s look for these extractions on the search head.
Step-7: Wait!! Let’s make sure that these fields got added to the index file (tsidx file).
The index file or the tsidx file holds all the metadata fields, so we should be able to query on our fields “operation” and “user” using the ‘tstats’ command.
That’s all for this post. We hope you now understand how to configure Splunk to do an index-time extraction on your data.
Happy Splunking!!