Since the dawn of modern computing, files have been an integral concept in organizing and storing data. Today each device internally keeps track of hundreds of thousands, or even millions, of files. This sheer quantity suggests that files may no longer be an ideal way to either organize or store information. But is there anything to take their place, and is a transition even possible?
Begin of File [BOF]
The concept and term “file” was born long before modern computing. When large amounts of data needed to be recorded in offices, they were usually either typed or handwritten onto sheets of paper that were stored in a file, with a label sticking out at the top or side, and organized in hanging folders inside pull-out drawers. A smaller form factor, the “card file”, could sit on an office desk for quick flip-through, often alphabetized for faster lookup.
Early computing
The physical file lent its shape to data storage on even the earliest computers. Any storage had a beginning and a length, so the typical solution was to provide an index at the beginning of storage, with at least two pieces of information per file: a name and a starting position.
Occasionally this was optional, for example on a C64 cassette tape. You could either hand-write the starting tape counter position of each program into a list, then seek the tape to that position and start loading – or you could write them as a program listing at the beginning of the tape and load it first to see where each program was stored, though this required rewinding the tape each time you wanted to see the list again.
On disks and hard drives a common practice was to store a file allocation table (FAT) at the beginning of the drive. The operating system would then know where to find the list of files in the single extremely long byte array of storage. The following decades saw many iterations of how that table was stored, and of whether it was split into smaller sub-tables that might or might not sit adjacent to the main one. And instead of storing only the name and starting position, more information was gradually added to the mix: file length in bytes, creation date, latest modification date, user ownership, extended attributes, and so on.
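To make that concrete, here is a minimal sketch of what such a table boils down to. The field names and the Python representation are mine for illustration; no real FAT variant looks exactly like this:

```python
from dataclasses import dataclass

# A toy directory entry, loosely in the spirit of early file tables.
# Field names and types are illustrative, not any real on-disk layout.
@dataclass
class DirEntry:
    name: str        # file name, originally fixed-width (e.g. 8.3)
    start: int       # first cluster/position of the file's data
    length: int      # later addition: file length in bytes
    modified: float  # later addition: modification timestamp

# The "index at the beginning of storage": a flat list of entries.
table = [
    DirEntry("LETTER.TXT", start=42, length=1310, modified=0.0),
    DirEntry("GAME.PRG",   start=97, length=8192, modified=0.0),
]

def find_file(name: str) -> DirEntry | None:
    # Lookup is a linear scan by exact name, just as described above.
    return next((e for e in table if e.name == name), None)
```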
One essential piece of information was added quite early on in some systems, but reached most consumer-facing file systems only in the 80s: the directory structure. A file could now be contained in a container, whether that container was called a directory, a folder or a drawer.
Modern computing
In 2025, we’re still at that point. Where did the past 40 years go?
A few things did happen:
- File storage grew from holding megabytes to holding petabytes of data. At the same time, file systems became able to address file offsets of up to exbibytes (1 exbibyte = 1024^6 bytes = 1,152,921,504,606,846,976 bytes).
- The internet. This did absolutely nothing for files.
- Cloud computing and cloud storage. This added an abstraction layer on top of existing file systems, but the concepts mostly remained. More below on why only “mostly”.
Chances are you are using a computer or a device that has a built-in file browser that lets you go through the contents of your storage(s) and see most of the attributes mentioned above: file names, sizes and modification dates. On more advanced operating systems you may also have tags, labels or other extended attributes on your files, and see a visual representation of those attributes in the file browser.
Your operating system may also build a searchable index of the file system contents. In some cases this index may even include not only the file names but the contents of the files themselves.
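Conceptually, such a content index is just an inverted mapping from words to file paths. Here is a toy sketch; real indexers like Spotlight or Windows Search are vastly more sophisticated, and the root folder is whatever you point it at:

```python
import os
from collections import defaultdict

# Build a toy inverted index: word -> set of file paths containing it.
# This only shows the concept of indexing contents, not just names.
def build_index(root: str) -> dict[str, set[str]]:
    index: dict[str, set[str]] = defaultdict(set)
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            path = os.path.join(dirpath, fname)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    for word in f.read().lower().split():
                        index[word].add(path)
            except OSError:
                continue  # unreadable file, skip it
    return index

# index = build_index(os.path.expanduser("~/Documents"))
# print(index.get("raccoon", set()))
```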
But they’re all still files: they have a name and a size, they point at a location in storage, and the way to get to one is to know which folder it is in and what its exact name is.
It might seem like 40 years of no change falls under the “if it ain’t broke, don’t fix it” principle. But consider these scenarios.
Problem 1: File not found
In the 80s, you could find a file.
Even if you had to look through five floppy disks, reading whatever was hand-written on their labels, or even feed them into a drive one by one, within a few minutes you would have located your file.
Today, let’s say you created something on a whim a month ago, then had to restart your computer while 20 other unsaved files were open. In that moment you simply hit “Save” and typed rushed filenames like “new”, “new-1”, “other”, “something” and “asvzdjnkb” to avoid overwriting anything, since your OS had just told you it would restart in 15 seconds, with no regard for your feelings or unsaved work. Chances are that now, a month later, those files are, from your point of view, simply gone forever.
Problem 2: Unknown file
Another problem is the lack of standardization, something humanity actually had before moving from paper to computers.
There was a range of standard file formats – Letter, Legal, A4, A5 and so on. Files were all on rectangular pieces of paper. If the information didn’t fit on one sheet, there was a standard visual method of sorting the sheets into a linear array: page numbers. Additional attributes could be attached using a little metallic paper clip. Anyone who grew up in a post-industrial-revolution society knew how to decipher this file format.
This beautiful uniformity has been replaced with a mix of a few “usual” file extensions, a whole lot of obscure, lesser-known ones, and only a handful that can contain structured data in a standardized way and still be universally readable.
Problem 3: Too many files
There are just too many of them! As an example, I ran a quick “find .” in my home folder and stopped it after five seconds. In that time, the list had grown to 250,000 items. I’m in no way insinuating that the information inside these files is unnecessary; on the contrary, the operating system and many applications rely on finding these files in their respective locations. However, they should no longer be considered anything human-facing.
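If you want to reproduce the experiment without letting find run away, a rough equivalent with a built-in five-second cutoff might look like this (the counts will of course vary wildly per machine):

```python
import os
import time

# Walk the home folder for at most `seconds`, counting every
# directory and file entry encountered along the way.
def count_entries(root: str, seconds: float = 5.0) -> int:
    deadline = time.monotonic() + seconds
    count = 0
    for _dirpath, dirs, files in os.walk(root):
        count += len(dirs) + len(files)
        if time.monotonic() > deadline:
            break
    return count

print(count_entries(os.path.expanduser("~")))
```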
We crossed the too-many-files line somewhere between the floppy disk and the CD-ROM. A single CD-ROM could theoretically hold hundreds of thousands of files*, but in practice you’d still be looking at thousands, if not tens of thousands, of individual files.
* Even for one-byte files, the minimum allocated size on a CD-ROM was around 2 kilobytes, which means the theoretical maximum number of files on a 700 MB disc lands at 358,400.
End of File [EOF]
Can we finally get rid of the filing problem?
There have been attempts at this. One example: when Apple originally released the iPhone, it did not have a file system browser. Instead, after a few major OS updates, applications could send files, and references to files, to each other, each app relying on its own custom interface to display them in an organized fashion.
This introduced a lot of data duplication, and after a fair amount of user feedback the Files app was introduced, exposing both the cloud containers the different applications use and the root iCloud folder.
However, this did make smartphone users a lot more content-oriented and a lot less concerned with where their information was stored, as long as they had access to it through an easy interface.
The new generation of computer users grew up after the file and disparity explosion, but also with smartphones in hand, so the mindset is already there: ready to let go of the endless file names, extensions and folders. It is the perfect time to move on to something new.
I’m personally hoping this is where a personal, locally-trained machine learning model could step in. With hundreds of thousands of pieces of information, what would be better suited to sift through them than a machine? That would get us closer to an idealistic vision presented so many times in science fiction: you simply ask for something, and there it is.
Imagine telling your device “open the raccoon thing I was drawing the other night”, or “give me the proceeds spreadsheet of the last quarter”.
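As a toy stand-in for that idea, here is what ranking files against a natural-language description could look like using plain TF-IDF similarity (via scikit-learn). The file names and contents are hypothetical, and a real solution would lean on an actual local model rather than word overlap:

```python
# Toy "ask and there it is": rank files against a natural-language
# query by TF-IDF cosine similarity over their (hypothetical) contents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

files = {
    "asvzdjnkb": "sketch of a raccoon wearing a party hat, drawn at 2am",
    "new-1":     "quarterly proceeds, revenue per region, Q3 totals",
    "other":     "shopping list: milk, eggs, coffee",
}

def ask(query: str) -> str:
    names = list(files)
    vec = TfidfVectorizer().fit(list(files.values()) + [query])
    docs = vec.transform(files.values())
    q = vec.transform([query])
    scores = cosine_similarity(q, docs)[0]
    return names[scores.argmax()]  # best-matching file name

print(ask("the raccoon thing I was drawing the other night"))  # -> "asvzdjnkb"
```

Swap the toy corpus for your actual documents, and the word overlap for a model that understands what “the other night” means, and the science-fiction version starts to look a lot less far-fetched.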