Kafka Community Spotlight #4

1. Personal

Please tell us about yourself, and where you are from.

Viktor

I grew up in Orosháza which is a small town in the southern part of Hungary. I currently live in Budapest, Hungary with my wife.

How do you spend your free time? What are your hobbies?

I do many things. I play guitar, lift some weights or play on PlayStation (FIFA, No Man’s Sky, Baldur’s Gate or Elden Ring recently). I’m also interested in finance and macroeconomics, so I often listen to podcasts or just dig around the internet for new trade ideas. Oh, and I run our own smart home system with Home Assistant (on my own Ubuntu server), so lately I’ve been neck-deep in smart home protocols and tech.

What’s a trade idea you have on your mind recently?

I have very recently rebalanced my portfolio actually:

  • 50% gold
  • 50% diversified basket of AI tech stocks that I panic-sold and then bought back higher

I think this perfectly captures a balanced, all-weather portfolio of a rock that nobody knows the actual value of and a basket of stocks with questionable economic utility to maximize disappointment “alpha”.

Any Social Media channels of yours we should be aware of?

  • LinkedIn - I mostly don’t post anything there, but one of my New Year’s resolutions is to post more.
  • X (iamwattstrades) - occasionally repost fintwit memes there.
  • X (iamviktorwatts) - this was my main Twitter account, although I haven’t posted on it for a good while now.

What does your ideal weekend look like?

Depends on the weather. Bad weather: cave in, play PS5 or guitar. Good weather: go out with my wife for a walk or hike somewhere.

Last book you read? Or a book you want to recommend to readers?

  • Professional: Building an Event-Driven Data Mesh by Adam Bellemare. If you want to learn how to build resilient and modern data architectures, this is your book.
  • Finance: any book by Brent Donnelly (Art of Currency Trading or Alpha Trader). He is a brilliant writer and his books give you the experience of a real professional trader who worked at top financial firms.
  • Other: well, honestly I need to read more fiction, as lately I’ve been slacking on that. I like both sci-fi and fantasy, and the next one on my list is The Expanse book series.

Best type of music, best song?

I like a wide variety of genres, but deep down I’m a synthwave head. I like the vibe, the sound of the arpeggiator and the ’80s guitar (be it clean or distorted). Daft Punk, Kavinsky, John Carpenter, Dance with the Dead or Power Glove are what I listen to almost every day. I also like British-style rock like Muse or Arctic Monkeys.

Favorite food? Best cuisine you’d recommend?

Well, I’m a vegetarian so anything without meat I’d recommend 🙂. My favourite cuisine though is probably Italian. I like their simple, yet delicious recipes, the wine and the cheese.

What is the best advice you ever got?

“Don’t be stupid.” - it’s incredibly powerful 🙂.

“You’re a programmer, you can do everything” - and indeed.

I studied at the University of Szeged, with a software design specialization for my BSc and machine learning for my MSc. I think it absolutely helped, mainly through how mathematical analysis and the machine learning subjects shaped my thinking.


2. Kafka

How did you get into Kafka?

I applied for a job at Cloudera in 2017. We were a small team of 2 maintaining the Cloudera fork of Apache Kafka.

What version of Kafka did you start with?

I believe it was 0.11.

When do you think one ought to use Kafka?

It’s hard to say, but given how many good services are available today from different vendors, I think people should start thinking about using Kafka as soon as their existing queueing technology feels restrictive and a need arises for replayable event streams.

Do you think Kafka has a high entry barrier?

If I were discovering Kafka today and came from a background other than data engineering, probably yes. Software like Kafka is usually complicated: there is replication, partitions, fault tolerance, Raft, tiered storage, just to mention the tip of the iceberg. If someone already has a background in distributed systems, I think that makes it much easier. Also, there are many great providers out there who can flatten the learning curve with managed services.

What’s the most annoying thing in Kafka you can think of?

It’s easily the command line tools. Once I actually wrote an interactive tool but I think it got lost over the years.

If you had a magic wand and could instantly and frictionlessly contribute/fix one thing to Kafka, what would it be?

Well, an interactive command line tool. Also, I think it is sometimes to Kafka’s detriment that it distances itself from plugins and tools. Certain connectors, tiered storage plugins or Cruise Control could somehow be part of the ecosystem.

How has Kafka changed over the years from your point of view?

I think it matured a lot. 4-5 years ago we had a lot of issues with the purgatory, for instance, and with delayed requests running into weird concurrency issues, and we had to go on customer calls many times because of them. I think it has become much more stable since then, and I haven’t really seen these kinds of operational issues anymore.

What is the hardest thing you built in or around Kafka?

Well, I was always an all-around fixer, touching many parts of the core. Of these, I think the most challenging was working with inter-broker requests or with transactions. In those parts it’s easy for bugs to slip in, mainly concurrency bugs.

How would you suggest somebody become a Kafka committer?

Do thorough code reviews. At the beginning I often downloaded the patch, tried it out myself and ran the tests. This way I could make meaningful comments, which I believe made a good impression on people. I also think it’s important to be active on the mailing lists and to go to conferences and meetups, so people can attach a face to the strange reviewer.

What’s the coolest part about being a committer?

Definitely the community. I like that we’re so enthusiastic about data streaming and the cool ideas that community members raise.

You had a talk about Cruise Control in 2023. What is your experience with it?

Everyone who self-hosts Kafka should use Cruise Control from the start. It may add to the learning curve, but it’ll pay you back in pages that never happen. If someone has even 5 brokers, I think Cruise Control adds a lot of value in extra hours not spent working on incidents.

What made you design the Multi-Tenant Kafka KIP?

It was actually on our idea board for a long time, and at one hackathon we implemented a very restricted version of it. Then we decided to at least make a KIP out of it to see if it interests the community.

What are your most contrarian opinions on stream processing? What are the most mainstream ones?

Contrarian: not sure if this is that contrarian, because I think many people have a similar opinion, but exactly-once usually isn’t needed and makes things much more complicated than they need to be. It’s very hard to verify that a message has been produced exactly once, and most systems built on Kafka would handle duplicates anyway.

Mainstream: it is good that Kafka unifies so many different use-cases in the streaming space. It makes big data architectures simpler and more robust.
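
As a rough illustration of the contrarian point above (a downstream system tolerating duplicates instead of relying on exactly-once delivery), here is a minimal Python sketch of an idempotent consumer. It does not use any Kafka client. The `Message` and `Account` types, the producer-assigned `message_id`, and the in-memory `seen` set are all hypothetical and exist only for this example; a real system would have to persist the seen ids together with its state.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    # Hypothetical message shape: a unique id chosen by the producer plus a payload.
    message_id: str
    amount: int


@dataclass
class Account:
    balance: int = 0
    # Ids that were already applied. Kept in memory only for this sketch.
    seen: set = field(default_factory=set)

    def apply(self, msg: Message) -> bool:
        """Apply msg at most once; return False if it was a duplicate."""
        if msg.message_id in self.seen:
            return False
        self.seen.add(msg.message_id)
        self.balance += msg.amount
        return True


def process(messages) -> int:
    """Consume messages in order and return the resulting balance."""
    account = Account()
    for msg in messages:
        account.apply(msg)
    return account.balance


# At-least-once delivery: "m2" arrives twice after a simulated retry.
delivered = [
    Message("m1", 10),
    Message("m2", 5),
    Message("m2", 5),
    Message("m3", -3),
]
print(process(delivered))  # duplicates are skipped, so "m2" counts only once
```

The design choice here is that correctness comes from the consumer being idempotent, so a redelivered message is harmless regardless of what the transport guarantees.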


3. General/Parting

How many Kafka Summits have you been to? How has the conference changed over the years?

Well, I’ve been to one every year for the past 8 years, and I think one year I went to both London and San Francisco. I definitely felt that it changed over the years. One thing is that it became much bigger, but Confluent also grew from a startup into a large company.

Unfortunately, in the past 2 years, in London at least, it was also a little bit empty, to be honest. I don’t know if it was a lack of interest, macroeconomic reasons (as many companies tightened their belts in the past 2 years) or that Confluent didn’t put much effort into marketing it.

What do you think about queues?

I don’t like standing in a queue 🙂.

From Kafka’s point of view, I haven’t been following this KIP too closely, but overall I think it was a needed addition to Kafka. Of course it comes with a Kafkaesque spin on it, as messages won’t be consumed as in a traditional queue. Still, I think adding this feature was crucial: it supports users who currently use a queue and would like to move to Kafka, and users who have been using Kafka as a queue-like system can now use it as an actual queue.

How do you see the future of Kafka usage and development, 5 years out?

I think that cloud readiness is a big focus point for the community right now. Kafka’s big problem is that it was designed for fixed hardware at a time when cloud environments weren’t yet so widespread.

Therefore you had to overprovision every cluster to account for traffic spikes. This makes it much more expensive than it has to be. Initiatives like KIP-1150 would make it much cheaper to operate, for self-hosted users and providers alike. With “lazy-log”-like behavior on the producer side it could be fast too.

Then I hope I can see KIP-1134 (multi-tenancy) or some form of it implemented. At every conference I hear at least 2-3 presentations from companies that implemented their own multi-tenant system over Kafka to be able to provide a modern self-service solution to their internal customers. With KIP-1134 this would be much easier for them.

Finally, I hope that Kafka becomes much more schema-aware and integrates much better with tools like Iceberg. KIP-1150 would open the gate, but Kafka still lacks crucial things like Parquet support to integrate better with data lakes. In my opinion the biggest problem of the classic Kafka log format from the storage perspective is that it’s terrible for data lakes, as it is row-oriented, whereas most data lake technologies and warehouses use column-oriented formats.
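
To make the row-oriented versus column-oriented distinction above concrete, here is a small Python sketch. It is only an illustration of the two layouts, not of the Kafka log format or of Parquet itself. The sample records and the `to_columns` helper are hypothetical, and the sketch assumes every record has the same fields.

```python
# Hypothetical records, standing in for events appended to a log one at a time.
rows = [
    {"user": "a", "amount": 10, "country": "HU"},
    {"user": "b", "amount": 25, "country": "DE"},
    {"user": "c", "amount": 7, "country": "HU"},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per field (a column-oriented layout)."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def total_from_rows(records):
    # Row-oriented: every whole record is visited even though one field is needed.
    return sum(record["amount"] for record in records)


def total_from_columns(columns):
    # Column-oriented: only the "amount" column is read.
    return sum(columns["amount"])


columns = to_columns(rows)
print(columns["amount"])
print(total_from_rows(rows) == total_from_columns(columns))
```

Analytical queries typically read a few fields across many records, which is why the second layout suits data lakes and warehouses, while appending one event at a time is naturally row-oriented.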

Do you think we’ve innovated in the messaging space in the last 10 years? How have you seen the space change?

Definitely. I actually started working on big data technologies about 10 years ago, but back then Kafka and the whole streaming space were much smaller and more diversified. RabbitMQ, ActiveMQ, NiFi, Flume and even Sqoop (both of which have since been discontinued) dominated the space, and distributed messaging itself wasn’t at the forefront. Although NiFi, RabbitMQ and ActiveMQ are still at the forefront of streaming and queueing technologies, Kafka gained a much larger audience because it handled messages in a distributed, replayable manner while incorporating many features of the previously mentioned competitors.

Over the years, new technologies like Pulsar and Redpanda grew up on the back of Kafka, which caused more competition and more rapid feature development.

Now, in more recent years, there’s been a rejuvenation of different ideas and innovations around Kafka (WarpStream, AutoMQ) that aim to learn from its shortcomings (cloud readiness, schema handling, better compatibility with data stores) and that I think will bring more efficient, cheaper, better software to data engineers.

What other tech besides messaging do you have interest in?

Definitely Iceberg. I think it’s a very interesting tech and I would like to get to know the other side of the table, so to speak (pun intended). Some of my ex-colleagues are working on adding custom file format support to it, and I’m curious whether I can effectively onboard the Kafka log format. I know that it’s like Frankenstein’s monster, but it would be a fun experiment.

Anything else you’d like to add?

It was really fun answering these questions.