
Swift Tip: Decomposing Emoji

Now that emoji are common everywhere, we need to be aware of Unicode, even without an international user base. For example, the emoji 👨‍👩‍👧‍👦 (the family-of-four glyph) has a very different length across string implementations:

    "👨‍👩‍👧‍👦".count // 1
    ("👨‍👩‍👧‍👦" as NSString).length // 11

In JavaScript, the length is also 11. In Ruby, "👨‍👩‍👧‍👦".length evaluates to 7, and in Python 2, len("👨‍👩‍👧‍👦") evaluates to 25 (depending on your settings). One string, four different lengths.

Perhaps even more surprising: none of these implementations is wrong. They're all counting different things. In Swift, we get 1 as the answer because Swift counts characters (extended grapheme clusters), and 👨‍👩‍👧‍👦 is a single character. The NSString variant and JavaScript evaluate to 11 because they're counting the number of UTF-16 code units. We can replicate this in Swift:

    "👨‍👩‍👧‍👦".utf16.count // 11

We can also see how Python 2 gets to 25. In this case, it counts the UTF-8 code units:

    "👨‍👩‍👧‍👦".utf8.count // 25

And finally, Ruby and Python 3 evaluate to 7 because they count the Unicode scalars, and 👨‍👩‍👧‍👦 consists of the following seven scalars: 👨 + zero-width joiner + 👩 + zero-width joiner + 👧 + zero-width joiner + 👦.

    "👨‍👩‍👧‍👦".unicodeScalars.count // 7
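To see where 11 and 25 come from, we can walk over the scalars and add up how many code units each one needs. This is only a small sketch: the names `family`, `utf16Total`, and `utf8Total` are ours, and the emoji is written with explicit `\u{...}` escapes so the seven scalars are visible in the source.

```swift
// The family emoji spelled out scalar by scalar:
// man, ZWJ, woman, ZWJ, girl, ZWJ, boy.
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

var utf16Total = 0
var utf8Total = 0

for scalar in family.unicodeScalars {
    // Encode each scalar on its own and count the code units it needs.
    let utf16Width = String(scalar).utf16.count
    let utf8Width = String(scalar).utf8.count
    utf16Total += utf16Width
    utf8Total += utf8Width

    // Print the code point in hex, e.g. U+1F468 or U+200D.
    let hex = String(scalar.value, radix: 16, uppercase: true)
    print("U+\(hex): \(utf16Width) UTF-16, \(utf8Width) UTF-8")
}

print(utf16Total) // 11
print(utf8Total)  // 25
```

Each of the four people takes two UTF-16 code units and four UTF-8 code units, while each zero-width joiner takes one and three, respectively. That gives 4 × 2 + 3 × 1 = 11 and 4 × 4 + 3 × 3 = 25.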

When you're dealing with strings where length is significant, keep this in mind. To learn more, watch last week's Swift Talk episode or read the transcript. If you'd like to learn more about Unicode and how it's implemented in Swift, read our book Advanced Swift.
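When a length limit matters, it can help to make the unit explicit. The following is a minimal sketch, not a standard library API: `StringLengths` and `lengths(of:)` are hypothetical names we made up to collect the four counts discussed above in one place.

```swift
// Hypothetical helper type: one string, four lengths.
struct StringLengths {
    let characters: Int      // grapheme clusters, what Swift's `count` reports
    let unicodeScalars: Int  // what Ruby and Python 3 count
    let utf16: Int           // what NSString and JavaScript count
    let utf8: Int            // number of UTF-8 code units (bytes)
}

// Hypothetical helper function that measures a string in all four units.
func lengths(of string: String) -> StringLengths {
    return StringLengths(
        characters: string.count,
        unicodeScalars: string.unicodeScalars.count,
        utf16: string.utf16.count,
        utf8: string.utf8.count
    )
}

// The family emoji, written with explicit scalar escapes.
let emoji = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
let result = lengths(of: emoji)

// A limit expressed in bytes should be checked against the UTF-8 count,
// not against `count`.
let fitsInTenBytes = result.utf8 <= 10
print(fitsInTenBytes) // false
```

Which count to compare against depends on where the limit comes from: a limit expressed in bytes calls for the UTF-8 count, while a limit on what the user perceives as characters calls for `count`.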
