UTF-16 truncation?
People say that emoji is the technology that finally gets Latin-script users to really care about text encoding.
Speculation: YouTube stored the content of posts in UTF-16 and truncated the text by counting UTF-16 code units rather than characters. Emoji, like all other characters outside the Basic Multilingual Plane, are encoded as a surrogate pair: two code units, neither of which is valid on its own. When the text is cut between the two halves of an emoji, the leftover high surrogate makes the string invalid, and it is displayed as �.
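The speculated failure can be reproduced with a minimal Python sketch. This is only an illustration of the mechanism, not YouTube's code, and the function names `truncate_utf16_naive` and `truncate_utf16_safe` are hypothetical. Python strings are sequences of code points, not UTF-16 code units, so the sketch simulates UTF-16 storage by encoding to UTF-16-LE (2 bytes per code unit) and cutting the byte string.

```python
def truncate_utf16_naive(text: str, max_units: int) -> str:
    """Keep the first max_units UTF-16 code units, even if that splits a surrogate pair."""
    # In UTF-16-LE without a BOM, every code unit is exactly 2 bytes.
    data = text.encode("utf-16-le")[: 2 * max(0, max_units)]
    # A dangling high surrogate cannot be decoded; errors="replace" turns it into U+FFFD.
    return data.decode("utf-16-le", errors="replace")


def truncate_utf16_safe(text: str, max_units: int) -> str:
    """Keep at most max_units UTF-16 code units without splitting a surrogate pair."""
    data = text.encode("utf-16-le")
    end = max(0, min(2 * max_units, len(data)))
    if end >= 2:
        # Read the last kept code unit; 0xD800-0xDBFF is a high (leading) surrogate.
        last_unit = int.from_bytes(data[end - 2 : end], "little")
        if 0xD800 <= last_unit <= 0xDBFF:
            end -= 2  # drop the orphaned high surrogate instead of emitting U+FFFD
    return data[:end].decode("utf-16-le")


# "Hi " plus U+1F44B (waving hand), which takes two code units in UTF-16.
text = "Hi \U0001F44B"
print(truncate_utf16_naive(text, 4) == "Hi \ufffd")  # True: the emoji became �
print(truncate_utf16_safe(text, 4) == "Hi ")          # True: the emoji was dropped whole
```

The safe variant only needs to look at the last kept code unit: if it is a high surrogate, its partner was cut off, so backing off by one unit always yields valid UTF-16.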