Pulling The Spunk Out Of Splunk And Taking The L
A surface-level look at Splunk Enterprise, The Search Processing Language, and Ponies
I hope that this is the start of many, many Splunk posts…because Splunk is a pretty big thing. I couldn’t just clone a git repository, run some out-of-the-box Google examples, and briefly explain what was going on like I had in my gRPC post.
When I wrote my introductory post on Java, I thought maybe we’d follow up by doing an ongoing Minecraft mod; when I wrote my introductory post on gRPC, I thought maybe we’d follow up by building a full stack application with grpc-web, then compare it to a more conventional stack like MEVN. So, since my track record isn’t great so far, consider this “Generic Splunk Post, Part 1/?.” I will keep ongoing project ideas going in a queue.
The Short Version
Splunk is a company that produces software for searching, monitoring, and analyzing machine-generated data via a Web-style interface (source). The core product, Splunk Enterprise, is not open source and is also fairly expensive. If you want a taste, you can get a free trial from here and some sample data from here for a made-up game they called ButterCupGames (*groans*). They actually made it a game, but it’s just a blatant ripoff of Flappy Bird. Why didn’t they just make a game where users inputted a picture of themselves, five unique passwords, and as much information as possible about their personalities? Now THAT is a game I would play.
This is the start of their official YouTube tutorial for Splunk Enterprise. This is the TutorialsPoint reference for their Search Processing Language, which interestingly comes up before their official guide, suggesting that Google users would rather read a coherent tutorial than a glossary.
Look at it. It’s beautiful. Also note the localhost URL…it’s running on this computer.
What will follow in this post is a pretty scattered foray into Splunk — we’ll revisit a Splunk app we made in 2015, try to break down a simple search query, then very briefly have a look at an open source Splunk cybersecurity tool. Finally, we’ll make a lot of puns and try to convince the company to send us a free t-shirt.
Facing My Dark Past And Demons
I knew this day would come…we are going to face my demons, on some other day and not today. That old app from 2015 is apparently no longer supported, but write that down in the Project Idea Queue: Resurrect TwitterDiseaseTracker.
TwitterDiseaseTracker was a proof-of-concept Splunk application based on Maria’s idea to replicate a Johns Hopkins study, grab a massive stream of Twitter data, and see if we could reliably model disease trends simply by using information people were Tweeting. The underlying idea was fantastic, but what this app actually was, was…“a good start,” as the reviewers of the app contest diplomatically put it. I don’t think XML is a particularly smart way to share a project (where is the parser for it?), but I do see it had this:
<query>sourcetype=$diseasesourcetype$ coordinates.type=Point | rename coordinates.coordinates{} as lnglat | eval lat=mvindex(lnglat,1) | eval lng=mvindex(lnglat,0) | geostats latfield=lat longfield=lng count</query>
— I refuse to make a Gist for a single query tag
That kind of looks like the Splunk Search Processing Language embedded in a tag. Would it be hard to download this thing from the archives and bring it back to life? Considering it’s now deprecated, would it even be possible without re-making the thing? The app was basically just mapping symptom words to diseases, then mapping said diseases based on Tweet origin. There was no complex algorithm, no machine learning, no comparison study in which we took our work and showed Splunk is a reliable way to achieve similar results. In fact, how would we do that? Even a good faith attempt to do something like that would make for a good continued discussion — but it’s easier said than done.
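For what it's worth, the "symptom words to diseases" step is small enough to sketch in Python. This is a minimal illustration, not the app's actual code: the lookup table, the function names, and the sample Tweets are all made up for the example, and a real version would need a far better vocabulary than four words.

```python
import re
from collections import Counter

# Hypothetical lookup table, invented for this sketch.
# The real app's word list is not reproduced here.
SYMPTOM_TO_DISEASE = {
    "fever": "flu",
    "chills": "flu",
    "cough": "cold",
    "sneezing": "cold",
}


def diseases_in(text):
    """Return the set of diseases whose symptom words appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        disease
        for symptom, disease in SYMPTOM_TO_DISEASE.items()
        if symptom in words
    }


def tally(tweets):
    """Count each disease at most once per Tweet."""
    counts = Counter()
    for tweet in tweets:
        counts.update(diseases_in(tweet))
    return counts


if __name__ == "__main__":
    # Made-up sample Tweets, purely for illustration.
    sample = ["Fever and chills all night", "cough cough", "nothing to see here"]
    print(tally(sample))
```

Plain keyword matching like this has no idea whether "fever" means influenza or Saturday night, which is part of why a comparison study would have been the hard part.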
And using Splunk on Twitter sounds like a pretty good use case to me. It’s almost like the two were made for each other: A search application for massive amounts of data, and a seemingly infinite source of data that is as random, unpredictable, and nonsensical as humanity itself.
Okay, I looked it up.
Sourcetype (https://docs.splunk.com/Splexicon:Sourcetype): Identifies the data structure of the event
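Rename: Does what you would expect. Here it gives the coordinates.coordinates{} field the friendlier name lnglat
Eval (https://docs.splunk.com/Documentation/SCS/current/SearchReference/EvalCommandExamples): Evaluates an expression and stores the result in a field. Here mvindex picks one value out of the multivalue lnglat field: index 1 for latitude, index 0 for longitude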
Geostats (https://docs.splunk.com/Documentation/Splunk/9.0.0/SearchReference/Geostats): Now we’re cooking with oil. This is the command that makes it possible to generate statistics about geographic data.
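Read together, those four commands seem to amount to: keep the events that carry a Point, split the coordinate pair into lat and lng, and count events per map region. Here is a rough Python sketch of that same pipeline, for comparison. It rests on assumptions: the event shape is inferred from the field names in the query, the function names and sample events are invented, and the fixed 10-degree grid is a crude stand-in for whatever binning geostats actually does.

```python
from collections import Counter
from math import floor


def extract_point(event):
    """Return (lat, lng) if the event's coordinates.type is Point, else None."""
    coords = event.get("coordinates") or {}
    if coords.get("type") != "Point":
        return None
    lnglat = coords.get("coordinates") or []
    if len(lnglat) < 2:
        return None
    # The query's own naming (lnglat) puts longitude first,
    # so index 1 is latitude and index 0 is longitude.
    return lnglat[1], lnglat[0]


def bin_counts(events, cell_degrees=10.0):
    """Count events per grid cell, keyed by the cell's south-west corner."""
    counts = Counter()
    for event in events:
        point = extract_point(event)
        if point is None:
            continue
        lat, lng = point
        cell = (
            floor(lat / cell_degrees) * cell_degrees,
            floor(lng / cell_degrees) * cell_degrees,
        )
        counts[cell] += 1
    return counts


if __name__ == "__main__":
    # Made-up events shaped the way the query's field names suggest.
    sample = [
        {"coordinates": {"type": "Point", "coordinates": [-76.6, 39.3]}},
        {"coordinates": {"type": "Point", "coordinates": [-77.0, 38.9]}},
        {"coordinates": None},
        {"coordinates": {"type": "Point", "coordinates": [2.35, 48.85]}},
    ]
    for cell, count in bin_counts(sample).items():
        print(cell, count)
```

The counting itself is easy. What a script like this does not give you is ingestion, indexing, and a map that updates as events stream in.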
I am still not sure if this maps 1:1 to the Search Processing Language, but it sure looks like it. There has to be more going on, though…what was the data source? How was it charting symptoms? What was it achieving that we could not also achieve with some rudimentary Python script and ArcGIS? I recall that with Splunk, we were pulling in thousands of results a second, in real time. What kind of speeds would we get without Splunk?
Personal Project Queue: Do TwitterDiseaseTracker without Splunk, but for COVID-19. I just messaged Rena, my old teammate: Dude, I’m looking at Twitter Disease Tracker, our old friend. What if it had worked, and we stopped COVID-19? Think about that.
Sorry, that’s not a good thing to joke about. But they say laughter is the best medicine. I think.
Why Does Everyone Keep Talking About Splunk?
I have Security+ certification, so I am something of a cybersecurity expert.
That’s odd…I just sensed a disturbance in the force. It is as if 12,000 people on r/netsec laughed at the sentence I just wrote.
Splunk came up in the Security+ bootcamp as “the instructor’s favorite SIEM.” What is a SIEM? I passed the test and cannot remember. From the pony’s mouth:
Security information and event management (SIEM) is a single security management system that offers full visibility into activity within your network — which empowers you to respond to threats in real time
— Source
Okay, I don’t want to go too deep into this part, because it looks like a huge area. That would probably make for an interesting project, too, because cybersecurity is infinitely more interesting than Twitter.
Here is an open source cybersecurity application Splunk has built — it could be a great window into what Splunk code is actually like, and how the software engineers think. Personal Project Queue: Get attack_range to work and blog about it. I hope it’s as easy to work with as the log4j exploit PoC, but it’s probably apples and oranges.
Closing Thoughts
Splunk is an extremely popular, relatively expensive tool, like…I don’t know…a private jet. We want to learn to fly, but we don’t have a few million dollars lying around to buy one. What we CAN get, if we want to understand the experience, is a flight simulator. That little Splunk instance running on toy data at localhost could be our flight simulator.
And their open source work could be another hook.
They also have a free YouTube tutorial series starting with the above, a technical writing team I am eager to blog about, and apparently their own podcast hosted by Hal, who may or may not be the evil Halr9000 hellbent on completing his mission regardless of what it takes to do so.
If I get deep enough into this, maybe I will also come up with a better name than “Pulling The Spunk Out Of Splunk And Taking The L.” The working title was “Splunking The Caves Of Splunk,” but spelunk is actually what their name is a parody of. I used to have a joke: SplunkDocs — We make sure spelunk is spelled incorrectly.
I also really wanted them to make a T-shirt that said: Splunkterns: Putting The Spunk In Splunk. Maybe if I say it enough times, eventually I will be motivated to produce my own T-shirt, and I will get sued by them. Then I will truly be sunk by Splunk.
Sunk by Splunk…Unsunk by Splunk. Okay, that’s another title idea.