Simple overview: this Tines/Splunk project receives alerts from a Splunk instance running on my server, which crawls through access logs, user event logs, and nginx logs and triggers alerts based on pre-set thresholds or events.
Although Splunk provides adequate alerting actions out of the box, I decided to integrate its webhook alert function with Tines, not only to be able to parse, format, and control the data, but also to route it to multiple other services as needed. After getting Splunk running, the first step was to set up the server logs I wanted to monitor, then define which events should trigger alerts and the thresholds/verbosity for alerting.
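To make the data flow concrete, here is a minimal Python sketch of the kind of JSON payload Splunk's webhook alert action POSTs to the Tines webhook and the handful of fields I end up forwarding. The payload shape, hostnames, and values are illustrative assumptions rather than my exact configuration, and in practice the parsing happens inside Tines actions rather than Python.

```python
import json

# Example of the kind of JSON a Splunk webhook alert might POST to Tines.
# Field names and values here are illustrative; the exact payload depends on
# the Splunk version and how the alert is configured.
sample_payload = """
{
  "search_name": "SSH Successful Login",
  "results_link": "https://splunk.example.com/app/search/@go?sid=example",
  "result": {
    "_raw": "Jan 01 12:00:00 myserver sshd[123]: Accepted publickey for alice from 203.0.113.7 port 51234 ssh2: ED25519 SHA256:AbC123ExampleFingerprint",
    "source": "/var/log/auth.log",
    "host": "myserver"
  }
}
"""

def summarize_alert(payload_text: str) -> dict:
    """Pull out the fields that get forwarded on to the chat message."""
    payload = json.loads(payload_text)
    result = payload.get("result", {})
    return {
        "title": payload.get("search_name", "Unknown alert"),
        "raw_log": result.get("_raw", ""),
        "source": result.get("source", ""),
        "splunk_link": payload.get("results_link", ""),
    }

if __name__ == "__main__":
    print(json.dumps(summarize_alert(sample_payload), indent=2))
```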
For now, I have decided to monitor actions such as SSH users connecting, sudo/su usage, IP addresses being banned by the Fail2Ban service, nginx access/traffic/errors, various other server-side user events, and suspicious activity logged in auth.log.
These seemed like some of the more basic, easy-to-set-up logs to alert on while still providing real-world use. On top of that, I can customize the output in Tines (including regex-replacing content) for a more readable experience than raw log/alert output.
For example, one of the alerts triggers whenever a user successfully connects to the server via SSH. Through additional parsing/formatting/customization within Tines:
I have the alert provide a specific title, the raw log output (including the server log timestamp, users, IPs, methods, etc.), the source of the log, and a full link to view the event within the Splunk UI. Specifically for this alert, the log provides the client IP address, port, and SHA256 key fingerprint. Just to get some practice, I wanted to regex-replace this potentially sensitive content with placeholders (as can be seen in the parsing image above).
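As a rough illustration of what that regex replacement does (in reality it is configured in a Tines transform action, not Python, and the patterns and placeholder names below are my own illustrative choices):

```python
import re

# Illustrative sshd log line; the values are made up.
raw = ("Jan 01 12:00:00 myserver sshd[123]: Accepted publickey for alice "
       "from 203.0.113.7 port 51234 ssh2: ED25519 SHA256:AbC123xyz+Example/Fingerprint0000")

def redact(line: str) -> str:
    """Replace potentially sensitive values with placeholders."""
    # IPv4 addresses -> [REDACTED_IP]
    line = re.sub(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", "[REDACTED_IP]", line)
    # Port numbers following the word "port" -> [REDACTED_PORT]
    line = re.sub(r"\bport \d+\b", "port [REDACTED_PORT]", line)
    # SHA256 key fingerprints -> [REDACTED_FINGERPRINT]
    line = re.sub(r"SHA256:[A-Za-z0-9+/=]+", "SHA256:[REDACTED_FINGERPRINT]", line)
    return line

print(redact(raw))
```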
This is what the output of that action in Tines looks like (sent as a message to a private social channel where I log other server/service events):
Note that the triggered timestamp looks extremely delayed relative to when the message was actually sent. This is because I re-emitted the previous event in Tines while testing the regex replacement, so that I would not have to wait for or manually force a new Splunk alert; that is why the timestamp appears to be 25 minutes earlier than when the message was sent. In reality, I've usually noticed less than a 5-second delay, and rarely more than 10 seconds, from the moment I trigger an event that should alert me.
Here is also a screenshot of a ban alert:
I don’t have the customization/formatting exactly where I want it yet in terms of being pleasant and quick to read, but as you can see, there is definitely adequate content to determine whether any action needs to be taken, without being obnoxious to read. If someone needed to see the full Splunk event, all they would have to do is click the supplied hyperlink (given account access to my Splunk dashboard), which takes them to the relevant alerted event in Splunk, where the search/log can be dug into further and the logs leading up to and following the alert can be investigated.
The only current negatives I have with this project are that I need to determine the optimal thresholds to prevent over-notification of meaningless events, particularly during times when I am heavily modifying the server (such as using sudo a lot or logging in and out frequently). One alert I expect to fire too frequently is the ban log, since I currently receive a notice for every IP banned under Fail2Ban’s banning/logging thresholds. There is also a workaround I could implement to temporarily suppress alerts for my own events while working on the server or sites; I just have not gotten around to it.
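One way that suppression could work (hypothetical, not something I have built yet; the names, addresses, and thresholds below are made up) is to drop events coming from my own activity and rate-limit repeats of the same alert before they reach the chat channel. A minimal Python sketch of that logic:

```python
import time
from collections import defaultdict

# Assumed values for illustration; a real filter would live in Tines
# (e.g. a trigger action), not in standalone Python.
MY_IPS = {"198.51.100.10"}       # addresses whose events should be muted
WINDOW_SECONDS = 600             # collapse repeats within 10 minutes
_last_seen = defaultdict(float)  # alert key -> last time it was forwarded

def should_notify(alert_name, source_ip, now=None):
    """Return True if this alert should be forwarded to the chat channel."""
    now = time.time() if now is None else now
    # Drop events generated by my own activity on the server.
    if source_ip in MY_IPS:
        return False
    # Rate-limit repeats of the same alert (e.g. Fail2Ban ban notices).
    key = f"{alert_name}:{source_ip}"
    if now - _last_seen[key] < WINDOW_SECONDS:
        return False
    _last_seen[key] = now
    return True

# Example: the first ban notice goes through, an immediate repeat is suppressed.
print(should_notify("Fail2Ban IP banned", "203.0.113.7"))   # True
print(should_notify("Fail2Ban IP banned", "203.0.113.7"))   # False
```

In Tines this would most likely map to a trigger action plus some deduplication on the incoming events rather than code like the above.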
Concepts that were used in this project: