Index Time vs. Search Time Processing
The Splunk Enterprise terms “index time” and “search time” distinguish between the processing that occurs while data is being indexed and the processing that occurs while search operations run.
Index time: The period from when Splunk receives new data to when the data is written to a Splunk index. During this period, the data is parsed into segments and events, default fields and timestamps are extracted, and transforms are applied.
Search time: This occurs when you search through data. When you run a search, Splunk performs several operations to derive knowledge objects, such as extracted fields and calculated fields, and applies them to the events the search returns. Splunk creates these fields while compiling search results and does not store them in the index.
Understanding these terms becomes important when administering Splunk Enterprise. For instance, if you plan to use custom metadata such as host and source type, you should define that metadata before indexing begins, so that the indexing process can tag the related events with it. Once data has been indexed, you cannot change its metadata assignments.
If you wish to apply custom metadata to data that has already been indexed, you have two options: re-index the data so the custom metadata applies to the existing data as well as to new data, or handle the issue at search time by tagging the events with alternate values.
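As a sketch of the search-time alternative, events can be tagged through eventtypes.conf and tags.conf on the search head. The event type name, search string, and tag below are hypothetical examples, not values from this environment:

```ini
# eventtypes.conf - define an event type matching the affected events
# (hypothetical search string)
[legacy_web_events]
search = source="/var/log/old_app.log"

# tags.conf - tag events that match the event type with an alternate value
[eventtype=legacy_web_events]
webserver = enabled
```

Searching for `tag=webserver` then returns those events as if the metadata had been assigned at index time.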
Index-Time and Search-Time Extraction
When Splunk indexes data, it parses the data stream into a series of events. As part of this processing, it also adds a number of fields to the event data. These fields comprise the default fields that Splunk adds automatically and any custom fields that you have specified.
The process of adding fields to events is known as field extraction. There are two types of field extraction:
- Index-time field extraction: These fields are stored in the index and become part of the event data.
- Search-time field extraction: This takes place when you search through your indexed data. Splunk creates these fields when compiling search results and does not store them in the index.
There are two types of indexed fields:
- Default fields: Fields that Splunk automatically adds to each event.
- Custom fields: Fields that you have specified.
NOTE: When working with fields, keep in mind that most machine data is either unstructured or has a structure that changes constantly. For this type of data, use search-time field extraction for maximum flexibility and reliability; search-time field extractions can easily be modified even after they have been defined. As a general rule, Splunk recommends performing most knowledge-building activities, such as field extraction, at search time. Index-time custom field extractions can cost performance at both index time and search time: each field added to the set extracted during indexing slows the indexing process, and search operations on that index also become slower because of the additional fields. To avoid such performance issues, rely on search-time field extractions whenever possible.
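For completeness, a minimal index-time custom field extraction requires coordinated entries in props.conf, transforms.conf, and fields.conf. The stanza names, regex, and the `username` field below are hypothetical placeholders:

```ini
# props.conf - attach the transform to a source type at index time
[my_sourcetype]
TRANSFORMS-addmeta = my_indexed_field

# transforms.conf - extract the value and write it into the index metadata
[my_indexed_field]
REGEX = user=(\w+)
FORMAT = username::$1
WRITE_META = true

# fields.conf - mark the field as indexed so searches handle it correctly
[username]
INDEXED = true
```

The three-file requirement is part of why index-time extractions are harder to maintain than search-time ones.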
Types of field extraction
Splunk offers three field extraction types: inline, transform, and automatic key-value.
| Extraction type | Configuration location |
| --- | --- |
| Inline extractions | `EXTRACT-<name>` configurations in props.conf stanzas. |
| Transform extractions | `REPORT-<name>` configurations in props.conf stanzas, which must reference field transform stanzas in transforms.conf. |
| Automatic key-value extractions | props.conf stanzas where KV_MODE is set to a valid value other than none. |
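The three types map to props.conf settings roughly as follows. The source type, field names, and regexes are illustrative placeholders, not values from this environment:

```ini
# props.conf - all three extraction types shown on one source type
[my_sourcetype]
# Inline extraction: a regex with named capture groups
EXTRACT-useraction = user=(?<user>\w+)\s+action=(?<action>\w+)
# Transform extraction: references a stanza in transforms.conf
REPORT-sessions = session_extraction
# Automatic key-value extraction, here for JSON-formatted events
KV_MODE = json

# transforms.conf - the field transform referenced by REPORT-sessions
[session_extraction]
REGEX = session_id=(\w+)
FORMAT = session_id::$1
```

All three of these run at search time; only `TRANSFORMS-` class settings apply at index time.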
Below are the processes that Splunk carries out at index time and at search time.
Index-time Processes
These processes are executed between the point when the data is consumed from the source and the point when it is written to disk (on the indexer).
Following are the processes that occur during Index time:
>>Meta-data/default field extraction (such as host, source, source type,
and timestamp)
>>Static or dynamic host assignment for specific inputs
>>Default host assignment overrides
>>Source type customization
>>Custom index-time field extraction
>>Structured data field extraction
>>Event timestamping
>>Event line-breaking
>>Event segmentation (also occurs at search time)
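For instance, static host assignment and source type customization for a specific input are typically configured in inputs.conf. The monitored path and values below are hypothetical:

```ini
# inputs.conf - assign a fixed host and source type to one monitored file
[monitor:///var/log/app/app.log]
host = webserver01
sourcetype = app_log
```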
Search-time Processes
Search-time processing occurs while a search operation runs, as events are gathered by the search.
Following are the processes that occur at search time:
>>Event segmentation (also occurs at index time)
>>Event type matching
>>Search-time field extraction (automatic and custom field extractions,
including multivalue fields and calculated fields)
>>Field aliasing
>>Addition of fields from lookups
>>Source type renaming
>>Tagging
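Several of these search-time steps are driven by props.conf on the search head. As a sketch, field aliasing and lookup-based field addition look roughly like the following; the source type, field names, and the `ip_to_location` lookup (which would need a matching definition in transforms.conf) are illustrative:

```ini
# props.conf - alias an extracted field to a second name at search time
[my_sourcetype]
FIELDALIAS-ip = src AS source_ip

# props.conf - add fields from a lookup table at search time
LOOKUP-location = ip_to_location source_ip OUTPUT city country
```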
We have ~4 indexes under the app ‘test_data’, with their inputs.conf and props.conf located at:
/opt/splunk/etc/slave-apps/test_data/local/ – On Indexers
/opt/splunk/etc/master-apps/test_data/local/ – On Cluster Master
We are planning to disable JSON extraction at the indexer level by setting INDEXED_EXTRACTIONS=none in props.conf for [source_type] under the above locations.
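A sketch of the indexer-side change described in the question ([source_type] is the placeholder stanza name from the question, not a real source type):

```ini
# props.conf under /opt/splunk/etc/slave-apps/test_data/local/ on the indexers
# (pushed from master-apps on the cluster master)
[source_type]
INDEXED_EXTRACTIONS = none
```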
Now, to enable extraction at search time via KV_MODE, which we will have to set in props.conf on the search head machine:
Should we explicitly create props.conf under $SPLUNK_HOME/etc/system/local and give it a stanza like the following?
[source_type]
KV_MODE=json
Could you please advise on the above?