A few years ago, like many others around the world, I read Yuval Harari’s book Sapiens. It is a fascinating look at human history from the perspective of anthropology, sociology and economics. Also, like many others in the computer science field, I have been thinking about and working with what Andrej Karpathy dubbed “Software 2.0”, a bold and pragmatic view of how machine learning is transforming our industry. Drawing on ideas from these writings and my own experience, I decided to write about Knowledge 4.0, an idea that has been on my mind for a while.

Intelligence has always been somewhat…


“If I have seen further,” Isaac Newton wrote in a 1675 letter to fellow scientist Robert Hooke, “it is by standing on the shoulders of giants”, referring to the work of Descartes and Hooke which he built upon.

Why Bother

Writing software is a very expensive process, and most systems we interact with today, as technologists or even as regular consumers, are the product of thousands of person-years of work by different individuals all across the world.

For example, the Linux kernel alone is estimated to have cost USD 1.4B. Developing a full Linux distro is said to cost 60,000 person-years, or USD 10.8B. This is more money than the combined venture funding ever received by Latin American tech companies. …


About a year and a half ago, Loggi's backend codebase mostly consisted of a large Django application and several "nanoservices" with a wide range of stacks that never got traction. Rust, Elixir, C++, Node.js, you name it.

We felt that our Django stack served a large class of applications very well, but the company was starting to write stuff that did not fit that preferred stack's paradigm well, and we needed a new, different and well-maintained stack in our toolbelt.

So, as with all our decisions, whether small, medium or large, we started by writing a design doc, which I am now sharing in…


Since I joined Loggi, we have been exploring some techniques for a more data-driven decision-making process. In this document I describe some of those, from the perspective of what they bring to the table and their limitations.

The Hierarchy of Evidence is a well-established classification of knowledge-gathering mechanisms that bring different levels of confidence to the decision-making process. As you go up the pyramid, the cost of generating data increases, as does the confidence in the information. …


If you studied machine learning in college or through tutorials on the internet, you probably have the feeling that the problems it solves are all similar in some sense: detecting people in photos, classifying sentiment in a text. It always feels like some task that an unhappy employee is doing in a faraway country.

But when the theory books start to discuss how some machine learning technique captures any function, this often feels a bit disconnected from reality. Those tasks are not that close to what we computer scientists usually describe as functions. There are some nice…


I had some fun, and some challenges, with the unusual choice of Spark ML as my primary building block in a Kaggle competition. Read on to learn more.

The competition problem statement was to detect duplicate Quora questions, which involves a lot of text analysis, an area we at WorldSense like a lot. The solution I attempted leverages only algorithms readily available in Spark ML, and is a simple pipeline with data cleaning, TF-IDF, LDA, and a linear classifier.
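To make the first stages of such a pipeline concrete, here is a minimal sketch in plain Python rather than Spark ML. It covers only the data cleaning and TF-IDF steps and compares questions by cosine similarity; the LDA and linear classifier stages are left out. The function names, the smoothed IDF formula and the sample questions are all assumptions made for illustration, not the code used in the competition.

```python
import math
import re
from collections import Counter


def clean(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            # Raw term count times a smoothed IDF (an assumed variant).
            term: count * math.log((n_docs + 1) / (doc_freq[term] + 1))
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample questions, not taken from the competition data.
questions = [
    "How do I learn Python?",
    "How can I learn Python quickly?",
    "What is the capital of France?",
]
vectors = tfidf_vectors([clean(q) for q in questions])
print(cosine(vectors[0], vectors[1]))  # similar wording
print(cosine(vectors[0], vectors[2]))  # no shared terms
```

In a real pipeline a similarity score like this would be just one feature among many fed to the classifier, not a duplicate detector on its own.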

The tools of the trade in these competitions are invariably from the Python ecosystem. The completeness of the machine learning…


Web-scale computing has never been so simple

I work at WorldSense, where we build predictors for the best links you could add to your content by creating large language models from the World Wide Web. In the open source world, no tool is better suited to that kind of mass (hyper)text analysis than Apache Spark, and I wanted to share how we set it up and run it in the cloud, so you can give it a try.

Spark is a distributed system, and like any similar system, it has a somewhat demanding configuration. There is a plethora of ways of running Spark, and in this post…

Davi de Castro Reis
