How To Backfill In Summary Index ( How To Manage Summary Index Gaps In Splunk )
Hello guys !!
Hope you are enjoying these blog posts. Today we have come up with a new and interesting Splunk topic. Before getting to it, here is a brief introduction to summary indexing. A summary index is a special type of index that stores the results of a scheduled report. Because the scheduled report keeps writing its precomputed results into the summary index for later use, queries over a large data set run much faster against the summary index than against the raw data.
Since the data in the summary index is populated by a scheduled report, there may be a data gap if the scheduled report skips a run. Another cause of data gaps is Splunk downtime: if Splunk goes down for some time and a scheduled run of the report falls within that downtime, no data is written for that run.
So today we will let you know how to fill up these gaps in the summary index.
Step 1:
Let’s create a report using data of a base search. Write a query and run the query. After that click on Save As. Now click on Report to create a report.
Step 2:
Give a report name. We have given the report name as Back_Fill_Summary_Index. Click on Save to save the report.
Step 3:
Click on Settings and then click on Searches, reports, and alerts.
Step 4:
Find your report.
Click on Edit and then click on Edit Schedule to schedule the report. Tick the Schedule Report option. Here we have given the cron expression as
25,30,35,40 5 * * *
So it will run every morning at 5:25, 5:30, 5:35 and 5:40. After that, click on Save to save the changes.
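To double-check which run times a cron expression like the one above produces, here is a minimal Python sketch. It only handles plain numbers and comma-separated lists in the minute and hour fields (no ranges, steps, or wildcards), and the helper name expand_cron_times is our own, not something Splunk provides.

```python
def expand_cron_times(expr):
    """Return HH:MM run times for a cron expression whose minute and hour
    fields are plain numbers or comma-separated lists (illustrative only)."""
    minute_field, hour_field = expr.split()[:2]
    minutes = [int(m) for m in minute_field.split(",")]
    hours = [int(h) for h in hour_field.split(",")]
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


run_times = expand_cron_times("25,30,35,40 5 * * *")
print(run_times)  # ['05:25', '05:30', '05:35', '05:40']
```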
Step 5:
Now, to enable summary indexing, click on Edit again and then click on Edit Summary Indexing.
Step 6:
Tick the Enable Summary Indexing option and choose the summary index where you want to store the data of this scheduled report. Here we have selected an index called backfill_summary, which had already been created. Click on Save to save the changes. The results of the scheduled report will now be saved into the specified summary index.
Step 7:
When we checked the data at around 8 am, we were not getting data for one particular time. Probably Splunk was down at that time or something else went wrong. This is called a gap in the summary index.
In the data, the oldest event has an event time of 2019-06-08 05:20:00, even though we had scheduled the report to run at 5:25. This happens because we have given earliest as -5m@m, so every run covers the last five minutes of data and the start of that window becomes _time (the event time). As you can see, there is no data for the run scheduled at 5:30, while the runs scheduled at 5:35 and 5:40 did produce data. Something happened at around 5:30, which is why the search was skipped for that time. Now we will show you how to fill this gap.
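The reasoning in this step can be sketched in Python: each scheduled run should leave summary events stamped five minutes before the run time, so a run with no such event is a gap. The observed timestamps other than 05:20 are assumptions inferred from the description above, and find_gaps is a hypothetical helper, not part of Splunk.

```python
from datetime import datetime, timedelta

# Mirrors earliest=-5m@m: each run summarises the previous five minutes.
WINDOW = timedelta(minutes=5)


def find_gaps(scheduled_runs, summary_event_times):
    """Return the scheduled runs whose window start has no summary event."""
    seen = set(summary_event_times)
    return [run for run in scheduled_runs if run - WINDOW not in seen]


day = datetime(2019, 6, 8)
scheduled = [day.replace(hour=5, minute=m) for m in (25, 30, 35, 40)]
# Assumed _time values found in the summary index (05:20 is from the post,
# 05:30 and 05:35 are inferred from the runs that did produce data).
observed = [day.replace(hour=5, minute=m) for m in (20, 30, 35)]

missing = find_gaps(scheduled, observed)
print([t.strftime("%H:%M") for t in missing])  # ['05:30']
```

The only run reported as missing is the 5:30 one, which is the time range we need to backfill.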
Step 8:
To fill this gap you need access to the CLI of this search head. Open the CLI and go to the bin directory of Splunk.
#cd $SPLUNK_HOME/bin
See the list of all files and folders inside the bin directory.
#ls
You can see a script called fill_summary_index.py. We will use this Python script to fill the summary index gap.
Step 9:
Now run the below command to fill the gap in the summary index.
#./splunk cmd python fill_summary_index.py -app <app_name> -name "<schedule_search_name>" -et <relative_time> -lt <relative_time> -index <index_name> -showprogress true -j <number> -dedup true -owner <owner_of_schedule_search> -auth <user_name>:<password>
Here we have given the app name as search and the scheduled search name as Back_Fill_Summary_Index. For -et (earliest) and -lt (latest), give relative times that cover the range in which data is missing, with a buffer of at most one hour on either side. We have specified the index name as backfill_summary, which is where we want to put the data; by default, the data goes to the index called summary. We have set showprogress to true to see the progress of the task. The -j option sets how many searches run concurrently. Try not to set it higher than 10, otherwise it may slow down other processes running on the search head. Set dedup to true so that the script checks whether data is already present in the summary index and does not fill those time ranges again. You also have to specify the owner who created the scheduled search, and then the user name and password used to run the command.
You can watch the progress of the job. Once it is done, check the summary index; the gap should now be filled.
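If you run backfills often, you could assemble the command from a small Python helper so that the options stay consistent. This is only a sketch: build_backfill_command is our own hypothetical function, the time range, owner and credentials below are placeholders, and the flags are exactly the ones shown in the command above. It prints the command rather than running it.

```python
import shlex


def build_backfill_command(app, name, earliest, latest, index, owner, auth, jobs=4):
    """Build the fill_summary_index.py argument list (does not execute it)."""
    if jobs > 10:
        # Follows the advice above about keeping concurrency at 10 or below.
        raise ValueError("keep -j at 10 or below")
    return [
        "./splunk", "cmd", "python", "fill_summary_index.py",
        "-app", app,
        "-name", name,
        "-et", earliest,
        "-lt", latest,
        "-index", index,
        "-showprogress", "true",
        "-j", str(jobs),
        "-dedup", "true",
        "-owner", owner,
        "-auth", auth,
    ]


cmd = build_backfill_command(
    app="search",
    name="Back_Fill_Summary_Index",
    earliest="-4h@h",        # placeholder relative time
    latest="now",            # placeholder relative time
    index="backfill_summary",
    owner="admin",           # placeholder owner
    auth="admin:changeme",   # placeholder credentials
)
print(shlex.join(cmd))  # run the printed command from $SPLUNK_HOME/bin
```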
Step 10:
The script has filled the gap in the summary index, and the data that was missing previously is now present. You can also fill the gap using the collect command, but you may get duplicate events in the summary index. So the better approach is to backfill the gap from the CLI.
We hope this has helped you achieve the requirement below:
How To Backfill In Summary Index ( How To Manage Summary Index Gaps In Splunk )
Happy Splunking !!