Simo Virokannas

Writings and ramblings

The Information Problem

There’s a lot of information available for software developers.

One frequently updated list of free programming books on GitHub contains several thousand books covering different programming languages and subjects, written in almost 50 human languages. And those are just the free books, on one list. Amazon returns over 40,000 results for the search “programming book”.

Reading a book every now and then is beneficial, and I find reading a paper book much better than reading an electronic copy. There’s something about the tactile experience and having to turn the pages that seems to help the brain remember what was read.

But as a programmer, you sometimes end up spending more than half of your time reading and browsing through code and documentation rather than writing something yourself.

What’s the problem, then? It looks like this is another three-parter, but I’m calling the whole thing “the information problem”.

Problem 1: Everyone is right

There is no “one correct way to write software”.

The term “reinventing the wheel” is overused. It is often applied to writing a piece of code that someone else has already written in some other project, sometimes even in the same project, or in an open-source application or framework. Some people avoid reinventing the wheel to the extent that almost all of their code deals with transforming data between different frameworks.

A quick scan of a React application may show that just by including the base frameworks, your project now has over 4000 indirect dependencies.

For each simple action, there can be over a hundred wheels, not all of them round, and all equally correct, optimized and readable.

If you always look for a pre-made wheel, you’ll almost always find one.

In pip, the Python package manager, some precompiled packages are actually called wheels.

However, there’s nothing wrong with writing your own function to add padding to the beginning of a string of text. There may be times when that’s a good call.

Those last two points deserve some context.

In 2016, there was a widely documented incident with npm, the package manager for Node.js, in which a software engineer decided to remove a left-pad function he had written from the public package repository. This led to widespread breakage on many online platforms, as it turned out that a great many developers had searched for the first left-pad function they could find and made it a dependency of their own code.
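To give a sense of scale, here is a minimal sketch of what a hand-written left-pad can look like in Python. This is not the code from the npm package; the name `left_pad` and its parameters are made up for illustration.

```python
def left_pad(text: str, width: int, fill: str = " ") -> str:
    """Pad text on the left with fill until it is at least width characters long.

    Hypothetical helper for illustration, not the npm package's implementation.
    """
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    missing = width - len(text)
    if missing <= 0:
        # Already wide enough: return the text unchanged, never truncate.
        return text
    return fill * missing + text


print(left_pad("42", 5, "0"))  # prints 00042
```

Python happens to ship the same behavior as `str.rjust`, which is its own small reminder that looking for an existing wheel and writing your own can both be reasonable.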

In summary, everyone has the right to an opinion, and everyone’s software does exactly what it is written to do. Everyone is right. Right?

Problem 2: Not everyone is right

…depending on who’s asking.

Somewhere between those hundreds of opinions, some may be simply wrong for your use case even though they’re correct in their own right.

A Lego wheel will work just fine on a Lego car, but if you mount it on a monster truck, it will fail when gravity does its thing.

This problem is much more elusive when programming. A simple sorting algorithm may be blazing fast when run against a thousand entries, but cave in under its own weight when given a hundred thousand. It can even surprise the developer.
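As a rough illustration, take insertion sort as a stand-in for the “simple sorting algorithm” (that choice is my assumption, picked only because it is easy to read). The sketch below counts comparisons instead of measuring time, so the growth is visible without a stopwatch.

```python
def insertion_sort(items):
    """Return a sorted copy of items and the number of comparisons made."""
    result = list(items)
    comparisons = 0
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one step right until current fits.
        while j >= 0:
            comparisons += 1
            if result[j] <= current:
                break
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result, comparisons


# Worst case: input in reverse order.
for n in (10, 100, 1000):
    _, count = insertion_sort(range(n, 0, -1))
    print(n, count)
```

On reversed input the count is n(n-1)/2, so a thousand entries need 499,500 comparisons while a hundred thousand need almost five billion. A hundred times more data means roughly ten thousand times more work, which is exactly the kind of cliff that never shows up in a small test.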

There was a popular photo organizing application that handled up to tens of thousands of photos really well, providing a smooth user experience. But if you crossed a somewhat arbitrary threshold somewhere between a hundred and two hundred thousand photos, an internal process in the application started to fail. Each time the program was opened, it wrote an XML file with the metadata for all the photos. This XML file started to approach 100 MB in size, which is a lot for just a text file, or at least for the method they were using to write that file. After reaching a certain number of photos, the file could no longer be written and the application would crash on launch.

None of the photos were accessible through the user interface anymore. All their metadata (when and where the pictures were taken) was in a separate database file, and the photos themselves were scattered inside a folder structure that the application could understand but that was not made for humans.

The solution was to write a separate program that interpreted the data in the database file, matched it up with the photos and rescued them to a separate folder.

This means you need to take into account not only how something is written, but also what the author’s particular goal was when writing it.

Not everyone’s solution is right for what you need.

Problem 3: Everyone has a voice

The Internet is great, but it also gives everyone a voice. Imagine a classroom without a teacher, where everyone has access to the whiteboard. In just a few minutes, the board is full of information, some of it from informed students, some not. How do you find out what’s relevant?

You could take that whiteboard and do this:

  1. Let each student go and pick a few items and rate them according to what they know
  2. Let students observe and rate each other as to how well they rate
  3. Choose the top-rating students, hand them different colored markers

Congratulations, you’ve created Stack Overflow.

This leads to two outcomes on a larger scale:

  1. If you’re asking about a well-established, understood process and language, there will be enough informed answers and readers to produce a good response with a high rating.
  2. If you’re asking about a new, little-understood or obscure thing, chances are there are no answers – or the good and bad answers have no difference in rating.

This means it’s hard to separate the knowledgeable answers from the misinformed ones by using any other metric than “are there a lot of answers”.

The solution to the Information Problem

There’s really no true solution to this problem. It’s a built-in characteristic of the age of information. But over time, you can learn to avoid the different problems.

  • Learn objective reading – even when you find a great-looking solution, keep looking. Find the differences between the approaches and the reasons they were taken. When there’s only one answer to a problem, question it. If anyone provided a second one, even a wrong one, each would represent 50% of the answers.
  • Don’t make copy-pasting code a part of your learning routine. The last thing you want is to let someone else break your software while you don’t know why. Even when a solution is simple, write the letters and symbols yourself. This can help you to catch issues that might otherwise remain hidden in your particular use case.
  • Don’t be afraid of doing something different. If you’re doing a normal thing in a new way, that doesn’t mean you’re wrong. You might just be the first one to improve it in a long while.
