
The unicode flag /.../u enables correct handling of surrogate pairs.

Surrogate pairs are explained in the chapter Strings.

Let’s briefly recall them here. In short, characters are normally encoded with 2 bytes. That gives us a maximum of 65536 characters. But there are more characters than that in the world.

So certain rare characters are encoded with 4 bytes, like 𝒳 (mathematical X) or 😄 (a smile).

Here are the Unicode values to compare:

Character  Unicode  Bytes
a          0x0061   2
≈          0x2248   2
𝒳          0x1d4b3  4
𝒴          0x1d4b4  4
😄         0x1f604  4

So characters like a and ≈ occupy 2 bytes, and those rare ones take 4.

Unicode is designed so that the 4-byte characters only have meaning as a whole.
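The values in the table can be reproduced in code. This is only a minimal sketch: the helper name describeChar is hypothetical, it assumes an ES2015+ engine (for codePointAt), and it counts bytes as 2 per UTF-16 code unit, as the text above does. It uses console.log rather than alert so that it also runs outside the browser.

```javascript
// Hypothetical helper: describes the first character of str.
function describeChar(str) {
  const codePoint = str.codePointAt(0);      // full code point, even for a surrogate pair
  const units = codePoint > 0xffff ? 2 : 1;  // UTF-16 code units needed for this character
  return {
    char: String.fromCodePoint(codePoint),
    hex: '0x' + codePoint.toString(16).padStart(4, '0'),
    bytes: units * 2,                        // each UTF-16 code unit takes 2 bytes
  };
}

for (const ch of ['a', '≈', '𝒳', '𝒴', '😄']) {
  const { char, hex, bytes } = describeChar(ch);
  console.log(char, hex, bytes);
}
```

Characters with a code above 0xffff do not fit into a single 2-byte unit, which is why they need a pair.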

In the past JavaScript did not know about that, and many string methods still have problems. For instance, length thinks that there are two characters here:

alert('😄'.length); // 2
alert('𝒳'.length); // 2

…But we can see that there’s only one, right? The point is that length treats 4 bytes as two 2-byte characters. That’s incorrect, because they must be considered only together (a so-called “surrogate pair”).
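To make the two halves visible, we can look at them directly. The sketch below assumes an ES2015+ engine; the variable names are only illustrative, and console.log stands in for alert.

```javascript
const smile = '😄';

// length counts 2-byte units, so the pair is counted as two
console.log(smile.length); // 2

// the two halves of the surrogate pair, as hex codes
const halves = [smile.charCodeAt(0), smile.charCodeAt(1)].map(c => c.toString(16));
console.log(halves); // [ 'd83d', 'de04' ]

// string iteration is aware of surrogate pairs, so spreading yields whole characters
console.log([...smile].length); // 1
```

Neither half means anything on its own. Only the two together form 😄.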

Normally, regular expressions also treat “long characters” as two 2-byte ones.

That leads to odd results. For instance, let’s try to find [𝒳𝒴] in the string 𝒳:

alert( '𝒳'.match(/[𝒳𝒴]/) ); // odd result

The result is wrong, because by default the regexp engine does not understand surrogate pairs. It thinks that [𝒳𝒴] contains not two, but four characters: the left half of 𝒳 (1), the right half of 𝒳 (2), the left half of 𝒴 (3), the right half of 𝒴 (4).

So it finds the left half of 𝒳 in the string 𝒳, not the whole symbol.

In other words, the search works like '12'.match(/[1234]/) – the 1 is returned (left half of 𝒳).
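We can check what the odd result really is by inspecting the match instead of printing it. A small sketch, with an illustrative variable name and console.log in place of alert:

```javascript
const withoutFlag = '𝒳'.match(/[𝒳𝒴]/);

// the match is a single 2-byte unit: the left half of 𝒳
console.log(withoutFlag[0].length);                     // 1
console.log(withoutFlag[0].charCodeAt(0).toString(16)); // d835

// so it is not the character we were looking for
console.log(withoutFlag[0] === '𝒳');                    // false
```

A lone half like this has no meaning of its own, so it usually shows up as a placeholder symbol when printed.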

The /.../u flag fixes that. It makes the regexp engine treat a surrogate pair as a single character, so the result is correct:

alert( '𝒳'.match(/[𝒳𝒴]/u) ); // 𝒳

There’s an error that may happen if we forget the flag:

'𝒳'.match(/[𝒳-𝒴]/); // SyntaxError: invalid range in character class (exact wording varies by browser)

Here the regexp [𝒳-𝒴] is treated as [12-34] (where 2 is the right half of 𝒳 and 3 is the left half of 𝒴). The range 2-3 between the two halves is invalid, because the code of 2 (0xdcb3) is greater than the code of 3 (0xd835), so the range runs backwards.

With the flag it works correctly:

alert( '𝒴'.match(/[𝒳-𝒵]/u) ); // 𝒴
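Both outcomes can be checked side by side. In this sketch the pattern is built with new RegExp so that the error can be caught at runtime. A regexp literal with the same invalid range would stop the whole script from parsing. The helper name tryRange is hypothetical.

```javascript
// Hypothetical helper: tries to build the range pattern with the given flags.
function tryRange(flags) {
  try {
    return new RegExp('[𝒳-𝒴]', flags);
  } catch (err) {
    return err; // a SyntaxError; the message text differs between engines
  }
}

console.log(tryRange('') instanceof SyntaxError); // true: the range is rejected without the flag
console.log(tryRange('u') instanceof RegExp);     // true: with the flag the pattern is valid
console.log(tryRange('u').test('𝒴'));             // true

// Why the range is rejected without the flag:
console.log('𝒳'.charCodeAt(1).toString(16)); // dcb3, the range start (right half of 𝒳)
console.log('𝒴'.charCodeAt(0).toString(16)); // d835, the range end (left half of 𝒴), smaller than the start
```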

To sum up, if our strings and patterns contain no surrogate pairs, the flag makes no difference for searches like the ones above. But in the modern world we often meet them.
