The Hell of Calculating The Size of Strings in PHP
If you think that there’s one function to rule them all, you should read this!
For some reason, PHP offers many ways to count the characters in a string. It seems like such a basic task that having (at least) four ways of doing it is curious. Worse, it can confuse a PHP neophyte. How do you choose the best way to count characters? Why isn’t there just one way to do it so we can call it a day?
Before anything else, let’s talk a bit about IT history (super quickly, I promise). Every character has its own numeric encoding inside the computer. At first, only English letters, digits and a few special symbols were needed, and the first big character set, named ASCII, was enough. It defines 128 different characters, so each character fits in 1 byte of memory (8-bit extensions of ASCII later pushed the count to 256). Simple.
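To make the "one character, one byte" idea concrete, here is a minimal sketch. The variable names are only illustrative, and it assumes the string contains nothing but ASCII characters; `strlen()`, `ord()` and `str_split()` are standard PHP functions.

```php
<?php
// Every character in this string is plain ASCII,
// so each one occupies exactly one byte.
$word = 'Hello';

// strlen() returns the number of bytes in the string.
$byteCount = strlen($word);

// ord() returns the numeric value of a single byte.
// For ASCII characters that value never exceeds 127.
$codes = array_map('ord', str_split($word));

echo $byteCount, PHP_EOL;            // 5
echo implode(' ', $codes), PHP_EOL;  // 72 101 108 108 111
```

For an ASCII-only string like this one, the byte count and the character count are the same number, which is why counting looked so simple at first.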
Then computers became wildly popular, and of course a single byte per character was no longer enough. Everybody wanted their entire alphabet to be available. As you can imagine, one byte does not leave room for all the characters we know: Chinese, Japanese, French, Hindi, Arabic and every other writing system add up to thousands, even tens of thousands, of different symbols. We had to find a solution, which is called Unicode, along with its UTF encodings such as UTF-8. To simplify, it was decided that characters could be encoded with more…