Insights - The realities of the problems faced when blending Systems & Platform Engineering with Cyber Security Teams

... Also known as "DevSecOps" and "Security Engineering".

After enough Splunk gigs, you get to the point where experience tells you on day one: "there are going to be major issues uncovered here".

The first sign of (let's put it bluntly) "bad practice" that needs undoing (redemption?) comes when you go to log in to the customer's platform and the first thing visible is an SSL certificate warning in the browser. This is a sure sign of worse things to come. And that is just from observation: once you are told something like "oh, there are a couple of on-boarding tasks that need [finishing]", issues start to appear quickly.

Why is the impending on-boarding of data treated as a subject that should be minimised?

This happened some time ago, but the echoes of these bells are ringing all too often.

Speaking personally, we would say that for every obvious issue there is another, just as bad, that nobody seems to have discovered. Where an issue is known about, it has often been hidden: the team has simply adapted to the lower standards it finds itself living with. It is acceptance of bad things, sometimes with a workaround and sometimes with none available, and it often needs expert attention.

Analysts are "analysts", managers are "managers" and engineers are "engineers".  In environments where analysts and managers are performing engineering tasks, there is never a great outcome.  However, in some instances, we've seen that a star analyst has taken the SIEM product on, made something great out of it, then moved on, leaving major technical debt behind.

 

"The Guy Left the Business!"

This is usually the primary reason!  On many of the journeys we embark upon, we are wearing a dead man's boots.

Some of the problems we have got our customers out of because of this exact issue:

Problem Description
No replication

This turned out to be what we termed a "counterfeit cluster".

Data was distributed over 4 nodes with no replicated copies, each node holding a quarter of it (a 1:4 ratio).  Losing 1 node would result in 25% of the data going missing!  Nobody had a clue about this risk.

It was a huge mess, which we resolved by converting to a multisite cluster; manual bucket migrations were required.
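
The 25% figure above can be checked with a little arithmetic. The sketch below is a simplified model of our own, not Splunk's actual bucket placement: it assumes every bucket has `copies` copies placed on distinct nodes chosen uniformly at random, and the function name `expected_loss` is hypothetical.

```python
from fractions import Fraction
from math import comb


def expected_loss(nodes: int, copies: int, failed: int) -> Fraction:
    """Expected fraction of buckets lost when `failed` nodes go down.

    Assumption (ours, for illustration): each bucket keeps `copies`
    copies on distinct nodes picked uniformly at random. A bucket is
    lost only if every one of its copies sits on a failed node.
    """
    if not 1 <= copies <= nodes:
        raise ValueError("copies must be between 1 and nodes")
    if not 0 <= failed <= nodes:
        raise ValueError("failed must be between 0 and nodes")
    # comb(failed, copies) is 0 when there are more copies than failed nodes.
    return Fraction(comb(failed, copies), comb(nodes, copies))


if __name__ == "__main__":
    # The "counterfeit cluster": 4 nodes, a single copy, 1 node lost.
    print(expected_loss(nodes=4, copies=1, failed=1))  # 1/4
    # The same cluster with two copies of everything survives one failure.
    print(expected_loss(nodes=4, copies=2, failed=1))  # 0
```

Under this model a second copy takes the single-node-failure loss from a quarter of the data to nothing, which is the protection the customer believed they already had.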

Sysmon crashing

A Sysmon configuration had been downloaded from the internet, fully tested and working.

Later it was found that the configuration had been modified incorrectly (by managers & analysts).

Inappropriate Puppet usage

Instead of using a cluster manager, Puppet was being used to deploy bundles to the indexers, with a hard stack restart applied, resulting in irrecoverable bucket corruption.

Cloud Syslog

Delinea cloud has a syslog transmitter that works over the internet.  Someone had configured it to use UDP and aimed it at a local 10. address, a private range that cannot be reached from the internet.
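
A misconfiguration like this one can be caught before anything is switched on. The sketch below is a minimal pre-flight check of our own using Python's standard `ipaddress` module; the function name `review_syslog_target` and the example address are hypothetical, and nothing here is part of Delinea's product.

```python
import ipaddress


def review_syslog_target(target: str, transport: str) -> list[str]:
    """Return findings for a syslog destination used by a cloud sender.

    `target` is the destination IP address as a string and `transport`
    is "udp", "tcp" or "tls". Both parameters are illustrative.
    """
    findings = []
    address = ipaddress.ip_address(target)
    if not address.is_global:
        # Covers RFC 1918 ranges such as 10.0.0.0/8, plus loopback etc.
        findings.append(f"{address} is not routable from the internet")
    if transport.lower() == "udp":
        findings.append("UDP is unencrypted and drops messages silently")
    return findings


if __name__ == "__main__":
    # Hypothetical values resembling the case described above.
    for finding in review_syslog_target("10.20.30.40", "udp"):
        print("FINDING:", finding)
```

An empty list means neither of these two checks fired; it does not prove the feed is healthy.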

THP (Linux)

Transparent Huge Pages had been overlooked and left unconfigured, which results in lower performance and system stability issues.
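
Checking the THP state takes seconds. The sketch below assumes the usual sysfs location, `/sys/kernel/mm/transparent_hugepage/enabled` (some older distributions use a different path), and relies on the kernel marking the active mode with square brackets. The function names are our own, and the target state of `never` reflects the common guidance to disable THP on Splunk hosts.

```python
from pathlib import Path

# Usual location on modern kernels; treat it as an assumption per distro.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(raw: str) -> str:
    """Extract the active mode from text such as 'always madvise [never]'."""
    for token in raw.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode marked in {raw!r}")


def thp_is_disabled(path: Path = THP_ENABLED) -> bool:
    """Return True when the host reports THP as 'never'."""
    return active_thp_mode(path.read_text()) == "never"


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print("THP disabled:", thp_is_disabled())
    else:
        print("THP control file not found at", THP_ENABLED)
```

The parsing is kept separate from the file read so the check can be exercised against captured output from hosts you cannot log in to directly.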

Wrong Use Case Event Code

The event codes for account creation and deletion were set incorrectly in the SPL, so the alerts were firing on unrelated events.

Amazingly, analysts were closing notables as "false positives".
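
A cheap guard against this is to keep the expected event codes for each use case somewhere reviewable and compare the search against them. The sketch below assumes the use cases were built on Windows Security events, where 4720 is "a user account was created" and 4726 is "a user account was deleted"; the source environment may have differed, and the use case names and helper function are hypothetical.

```python
# Assumption: Windows Security event IDs. Adjust for other log sources.
EXPECTED_EVENT_CODES = {
    "account_created": {4720},
    "account_deleted": {4726},
}


def check_use_case(use_case: str, codes_in_search: set[int]) -> dict[str, set[int]]:
    """Compare the codes a search filters on with the expected set.

    Returns the codes that should not be there ("unexpected") and the
    codes that should be there but are not ("missing").
    """
    expected = EXPECTED_EVENT_CODES[use_case]
    return {
        "unexpected": codes_in_search - expected,
        "missing": expected - codes_in_search,
    }


if __name__ == "__main__":
    # A search meant for account creation that filters on the wrong code.
    print(check_use_case("account_created", {4726}))
```

When both sets come back empty, the search filters on exactly the codes the use case was designed around; anything else deserves a look before a notable is closed as a "false positive".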

Fake Syslog over TLS

A customer had paid for professional services to encrypt the syslog feed from their Check Point firewalls using Log Exporter and TLS.

The "engineer" who took on this work did not complete it, was paid, then left.

 

We are extremely transparent, yet have to remain discreet, as these sorts of issues can cause big trouble when uncovered.  The main goal, of course, is to ensure the job gets done and done properly!

Do you have problems with your suppliers, your staff (or lack of) or your MSSP?  Contact Us.
