Attack of the Case Sensitive Filesystem

Case-sensitive file systems should be retired.

They allow us to do stupid things, mostly in error, but sometimes because we are too smart for our own good. There is no good reason for their (continued) existence.

I imagine that in the ancient days of computing it came about something like this:

“Long file names will take up too much space”

“I know, we can make the file system case sensitive.”

“Great, now instead of ‘work-notes.txt’ and ‘notes-about-homebuilding.txt’, I’ll just name the files ‘notes’ and ‘Notes’.”

Software developers are known for our passion for backward compatibility, and so today the popular file systems of Linux and Unix (except Mac) are still case sensitive.

Gross!

What has gotten me worked up about this (again)?

The cause this time: a customer checked into Subversion two copies of a file whose names differed only in case. From the commit log, it seems likely that they meant to rename the file.

When I tried to update (or check out) the repository on Mac OS X (10.5.6), whose file system is case-preserving but case-insensitive (doing the right thing), it failed with a cryptic message:

svn: In directory 'images/author_header'
svn: Can't copy 'images/author_header/.svn/tmp/text-base/belief.jpg.svn-base' to 'images/author_header/.svn/tmp/belief.jpg.tmp.tmp': No such file or directory

This was with the pre-packaged svn 1.4.4 (r25188, built Nov 25 2007). Out of interest, I used MacPorts to upgrade to 1.5.5 (r34862), and the error is different but equally cryptic:

svn: In directory 'images/author_header'
svn: Can't open file 'images/author_header/.svn/tmp/text-base/belief.jpg.svn-base': No such file or directory

At this point, I had not identified the cause of the problem, so I was quite frustrated. Thankfully, I had a Linux box to check out the repository on and from there the issue looked obvious.
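For the record, a quick way to tell which kind of file system a directory lives on is to create a file and look it up again with its case flipped. The sketch below is a minimal illustration, assuming Python 3; `is_case_insensitive` is a hypothetical helper (not part of svn or any OS tool), and it needs write access to the directory being probed.

```python
import os
import tempfile


def is_case_insensitive(directory: str) -> bool:
    """Return True if file names in `directory` are matched case-insensitively.

    Hypothetical helper: creates a uniquely named probe file, then checks
    whether the same name with its letters' case swapped resolves to that
    very file. The probe is always removed afterwards.
    """
    fd, probe = tempfile.mkstemp(prefix="CaseProbe", dir=directory)
    os.close(fd)
    try:
        head, tail = os.path.split(probe)
        # Flip the case of the file name only, not of the parent directories.
        swapped = os.path.join(head, tail.swapcase())
        # samefile() guards against an unrelated file that happens to
        # exist under the swapped name already.
        return os.path.exists(swapped) and os.path.samefile(probe, swapped)
    finally:
        os.remove(probe)
```

Run in a working copy, `is_case_insensitive(os.getcwd())` should return `True` on a default HFS+ volume and `False` on a typical Linux ext3 box.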

Solutions?

Save us from ourselves.

When developing the next great file system, make it so amazing that you can slip in case-preserving case insensitivity, like Mac OS X’s HFS+ (Mac OS Extended).
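To make “case-preserving, case-insensitive” concrete, here is a toy model of a single directory, assuming Python 3.9+. The class name is hypothetical, and it folds names with Python’s `str.casefold()`, which is an assumption: HFS+ uses its own Unicode-based case-folding tables, so edge cases will differ.

```python
class CasePreservingDirectory:
    """Toy model of one case-preserving, case-insensitive directory.

    Lookups ignore case, but each entry keeps the spelling it was created
    with. str.casefold() stands in for the real file system's folding rules.
    """

    def __init__(self) -> None:
        # Maps folded name -> (name as first created, file contents).
        self._entries: dict[str, tuple[str, bytes]] = {}

    def write(self, name: str, data: bytes) -> None:
        key = name.casefold()
        if key in self._entries:
            # Same file under another spelling: keep the original
            # spelling, replace the contents.
            original, _ = self._entries[key]
            self._entries[key] = (original, data)
        else:
            self._entries[key] = (name, data)

    def read(self, name: str) -> bytes:
        # Any capitalization finds the file; raises KeyError if absent.
        return self._entries[name.casefold()][1]

    def listdir(self) -> list[str]:
        # Names are reported with their preserved spelling.
        return sorted(original for original, _ in self._entries.values())
```

Writing “Notes” and then “notes” leaves a single entry, still spelled “Notes”, holding the second write’s contents. That is the trap a repository walks into when it holds two names that differ only in case: on this kind of file system they can only ever be one file.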

When developing tools like the next great revision control system, don’t allow files whose names differ only in case (in the default configuration, anyway). Failing that, have good error messages.[1]
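A tool could enforce that default with a check like the one below, run over a list of repository paths before accepting a commit (the list could come from, say, `svn list -R`). This is a minimal sketch assuming relative, `/`-separated paths; `case_collisions` is a hypothetical name, and it again uses `str.casefold()` as the comparison rule. It checks every directory prefix, not just full paths, because directories whose names differ only in case collide too.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def case_collisions(paths):
    """Group repository paths whose spellings differ only in case.

    Every prefix of every path is checked, so 'img/A/x.png' and
    'img/a/y.png' are reported as a collision between 'img/A' and 'img/a'.
    Returns a sorted list of groups, each a sorted list of distinct spellings.
    """
    spellings = defaultdict(set)
    for path in paths:
        parts = PurePosixPath(path).parts
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            # Bucket each prefix by its case-folded form.
            spellings[prefix.casefold()].add(prefix)
    return sorted(sorted(group) for group in spellings.values() if len(group) > 1)
```

For example, `case_collisions(["img/A/x.png", "img/a/y.png"])` returns `[["img/A", "img/a"]]`, and an empty list means the paths can be checked out safely on a case-insensitive volume.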

It’s not too late for Subversion, either, to fix the long-rotten issue 667, “handle file name case sensitivity edge cases” (see also issue 2010, “case sensitivity problem with checkout”).

Update April 2, 2011: two years later, with newer versions of Mac OS X and SVN, I encountered the same problem:

$ svn up
svn: In directory 'macleans3/images/maps'
svn: Can't open file 'macleans3/images/maps/.svn/tmp/textbase/BQ_old.png.svn-base': No such file or directory

lloyd-imac:macleans3 lloyd$ svn cleanup
svn: In directory 'images/maps'
svn: Error processing command 'modify-wcprop' in 'images/maps'
svn: 'images/maps/bq_old.png' is not under version control

Again, after looking around for an easy solution on the Mac, I just logged into a Linux box, checked out the repository there, removed the duplicate file names, and committed.

  1. If developing a programming language, don’t support variable names that differ only in case at all.

18 thoughts on “Attack of the Case Sensitive Filesystem”

  1. The problem with going fully case-insensitive is that you then run foul of different rules for different locales.

    There is not one single rule for converting a character from lower-case to upper-case that works across all Latin alphabets let alone finding ones which work for all the other script types.

    The real issue is that Operating System vendors have approached the issue from a Latin alphabet perspective and have something which works for 80% of the use-cases. When it fails, it fails bad.

    It also leads to other portability issues: portable code written on a system with a case-insensitive file system will nearly always need some rework to get it compiling on a case-sensitive system, because case-insensitivity bugs will have crept into the code.

    What we really need is everyone to just go Unicode and case-sensitive, it would make the whole thing much simpler.

    Case sensitivity in old Unix filing systems is not about avoiding long file names; it’s about making dealing with file names really simple.

  2. westi, awesome insights!

    Though it seems that issue of Latin alphabet collisions must already be solved on Mac and Windows?

    simple: yeah, I could not find anything to confirm that any flavor of Unix ever limited the length of file names (like DOS, Windows, Mac), but I thought I would try it on.

    Still, I strongly feel case sensitivity is too smart for its own good.

    • A bit late to the party, but I just wanted to let you know that many old Unix operating systems used to limit file name length to 14 characters (System V, most notably).

  3. *sigh*

    I ran into this on wp.com. I wanted to upload some photos. For whatever reason, the file extension on them was .JPEG. Of course, the uploader looks for .jpeg. JPEG!=jpeg in Linux (ext3 on this machine). I had to go and rename those couple files.

    All in good fun, though. I wouldn’t go back to Windows at this point in time. OS X, now there’s something I’d love to try. The last time I used a Mac, it was a blueberry G3 tower running Mac OS 8 (I think; it’s been so long).

  4. Kathryn, that is very interesting. I tried to reproduce it on Mac and have not been able to so far. I’m a little surprised that Linux would be different. Is that a recent problem?

  5. It was quite a while ago that I ran into the problem. I think my old digital camera saves stuff as JPEG but my new one uses jpeg.

    I’ll see if I can reproduce it. Maybe it’s something that got fixed…

  6. Kathryn, if you only see it on Linux, I would not bother worrying about it, because Linux does not have a great Flash implementation, so it might be caused by that.

  7. It actually seems as though the issue is that your file system is case-insensitive. If it were case-sensitive, like most Unix and Linux file systems are, you would have had no issues at all. That’s why it worked on Linux. What you really should complain about is that many Mac applications won’t work on case-sensitive file systems and that Windows is generally case-insensitive. If everyone went case-sensitive, the world would be better.

    • Brian, case sensitivity in file systems or programming is a usability nightmare, inviting “user error” at every turn.

  8. Just about the only valid use for case-sensitivity on a computer is for password tests (if they’re plaintext). There should probably never be another instance of the use of stricmp or strcmpi in any part of the OS or any application.

  9. Case preservation is good and we do that all the time. Case sensitivity is a nightmare, especially for file systems but also in programming. Do you think having ThisFile, Thisfile and thisFile in the same folder has any benefits? Of course not; it will just cause you to wonder which is which. Humans made lowercase/uppercase characters for aesthetics, making text look nice with Names and regular words. Software should preserve case but respect the meaning, not the case, though that could have more to do with search tools than with file systems; searching a case-insensitive file system is just a bit slower, but the benefits are too important. Also, in programming, having a function ThisFunction and a variable thisVariable is not really that useful and produces more confusion in the end, ignoring why lowercase and uppercase characters were created. Because of case sensitivity, many people started writing everything in lowercase, and that’s just not right.

  10. I’ve been a Unix sysadmin for 10+ years and I haven’t come across a good argument FOR case sensitivity yet, other than “it’s faster to sort case-sensitively”. It’s a pain in the behind and another nail in the Linux/Unix GUI user experience (besides inconsistent command-line switches: -v -V --version --list -l --help -help -h) ;-)

  11. Pingback: Reveal Files on Mac OS X | A Fool’s Wisdom

  12. I recently reinstalled my Mac with a case-sensitive file system. I did this because I needed to have a Subversion working copy of files from a case-sensitive repository on my local system. Reinstalling your OS is a good idea once in a while anyway; it gets rid of the junk. But I have found that I can’t install Adobe CS3, a package I paid almost $1,000 for. I also recently bought a printer, and its drivers will not install because they assume a case-insensitive file system. At least Adobe tells you up front during installation why. The printer installation took a significant amount of digging to find the problem. It doesn’t matter to me either way. I just wish the computer world would make up its mind about a great many things.

  13. In my experience as a coder, I quite like case sensitivity. ‘The’ and ‘the’ are not represented the same way; they look different, therefore they are different. A lowercase letter at the start of a sentence is incorrect while an uppercase letter is correct; if they were the same thing, both should be correct or both should be wrong. I guess the fundamental problem is that as humans we generally have a ‘looser’ attitude to equality, whereas a machine runs on things not being the same if there is any difference. I would also like months to be consistent and to be done with daylight saving time while we are at it, although we would have to change our measures of time to achieve this… dreams are free :)

  14. I feel the need to comment on your post because it is one of the top results on Google (so lots of people will read it), and you really present case-sensitive file systems as an “evil” thing and case preservation as the solution, which I don’t believe is true. You based your opinion on the fact that for humans the names “file”, “File” and “FiLe” all mean the same thing, so a file system should treat them as the same to help humans avoid confusion. I have the following four objections:

    The first one is from a technical point of view. Computers are built in layers for a good reason. Each layer is responsible for implementing a specific functionality and for providing it in an abstract way to the layers above it. The (human) user of a computer does not use the file system directly, but uses higher layers, which in turn use the file system. The main reason the file system exists is to organize the data on your hard disk and provide a way to access it. Anything that might compromise that should not be implemented there. From the machine’s point of view, lowercase and uppercase letters have different binary representations, so “F” is as different from “f” as it is from “A”. Trying to implement language-specific logic in this layer is beyond its functionality and will only lead to future problems (for example, some applications designed on case-insensitive file systems failing on case-sensitive ones and vice versa). If an operating system wants to provide case-insensitive file names, it should do so at a higher level. It is similar with file extensions: to me, it sounds as bad an idea as a file system that will not allow saving the file “text.png” if it does not actually contain an image.

    Second, I must say that getting rid of case sensitivity does not really fix the problem of confusing names (which was the reason for doing it in the first place). It is true that for humans “File One” and “file one” mean exactly the same thing, but this is also true of “FileOne” and “File 1”. A system that really prevented people from giving files confusing names (names they could not later tell apart) would have to take all these cases into consideration too; otherwise it is incomplete and does not fulfill its purpose. Actually implementing a layer that does this is very complicated, as there are not only Latin characters. For example, to humans the Latin capital “A” looks the same as the Greek capital “Α” (alpha), but the lowercase letters are different (Latin “a” and Greek “α”). How is a (complete) case-insensitive file system going to treat this case?

    My third objection is about your statement that case preservation is the solution (and that it is a good thing). This, though, just gives the user a false impression that case is preserved, and it can lead them to make mistakes far worse than having two files named “file” and “FILE”, like accidentally overwriting their data. Computers should not assume that users know the details of how they are implemented. I am sure there are thousands (at least) of Mac users who have no idea that the Mac has switched to case preservation instead of a case-sensitive file system (and they shouldn’t need to). But it can actually affect their work.

    Finally, my biggest objection is to your opinion that case-sensitive file names are just confusing and that it is OK to forbid them. This might be true if your files are named in English (like “My Photo album” and “Jokes.txt”), but when you name your files with anything else, not related to a language, case sensitivity IS very important and must be present. For example, in physics, lowercase Greek letters have a different meaning than uppercase ones. If I write a program that calculates lowercase phi (φ) and uppercase Phi (Φ), the most obvious thing is to save the results in the files “phi.dat” and “Phi.dat”. There is no confusion about the names, and any physicist knows exactly what they are, as this is the common way to distinguish them. The fact that some users do not need (or find confusing) case-sensitive file names doesn’t mean that nobody needs them! (And there are lots of people who do.)

    In closing, I want to note that just because Apple moved in a specific direction, that doesn’t automatically make it a good move. The truth is that in this case (in my opinion, based on the reasons explained above and my personal experience) they took a huge step backwards and nicely hid it from their users by using case preservation. Actually, I have a question, if you know the answer. If someone made a backup of a Mac (using Time Machine) when it was still using a case-sensitive file system and restored it on a newer Mac that uses a case-insensitive file system, what happens if there are case conflicts? Will it give some warning, will it fail, or will it quietly overwrite one of the two files? I am very interested because my girlfriend did that 6 months ago and she most probably has lots of such files (she is an astronomer, and they always use very bad names for their data files).

    I’m sorry for the long reply.
