UTF8 string literals in C#11

How string literals are easily translated into bytes with C#11


Daily Knowledge Drop

Coming with C# 11 (due for release later this year, alongside .NET 7), converting a string literal to bytes is becoming easier, faster, and more efficient.

A byte[] is often used when dealing with streams, for example. In the current and prior C# versions, converting a string to a byte[] requires an explicit method call. With C# 11, this conversion is not only simplified, but also gains a large performance boost.


C# 10 and prior

In the current (and prior) versions of C#, when a string literal needs to be converted to a byte[], the System.Text.Encoding.X.GetBytes method is used (where X is the encoding, specifically UTF8 in this post):

byte[] bytes = System.Text.Encoding.UTF8.GetBytes("alwaysdeveloping.net");

using var stream = new MemoryStream();
stream.Write(bytes);

While not especially complicated, this does involve an explicit method call to perform the conversion.
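To make the cost of that explicit call concrete, here is a small sketch (top-level C#, assuming .NET 6 or later) showing that GetBytes encodes the string at runtime and returns a brand-new array on every call:

```csharp
using System;
using System.Text;

// Each GetBytes call encodes the string at runtime and allocates a new byte[]
byte[] first = Encoding.UTF8.GetBytes("alwaysdeveloping.net");
byte[] second = Encoding.UTF8.GetBytes("alwaysdeveloping.net");

Console.WriteLine(first.Length);                           // 20 (one byte per ASCII character)
Console.WriteLine(object.ReferenceEquals(first, second));  // False: two separate arrays
Console.WriteLine(Encoding.UTF8.GetString(first));         // alwaysdeveloping.net
```

This per-call allocation is what shows up in the Allocated column of the benchmark further on.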


C# 11

With C# 11, the same result can be achieved by adding the u8 suffix to the string literal, with no method call required:

ReadOnlySpan<byte> spanBytes = "alwaysdeveloping.net"u8;

using var stream = new MemoryStream();
stream.Write(spanBytes);
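As a quick check of what actually lands in the stream, the following sketch (assuming .NET 7 or later, where C# 11 is the default) passes the u8 literal straight to Stream.Write and decodes the stream contents back into a string:

```csharp
using System;
using System.IO;
using System.Text;

using var stream = new MemoryStream();

// The u8 literal is a ReadOnlySpan<byte>, so it binds to Stream.Write(ReadOnlySpan<byte>)
stream.Write("alwaysdeveloping.net"u8);

// Decode the written bytes to confirm the stream holds the UTF-8 text
string roundTripped = Encoding.UTF8.GetString(stream.ToArray());
Console.WriteLine(roundTripped);   // alwaysdeveloping.net
Console.WriteLine(stream.Length);  // 20
```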

Many APIs (such as Stream.Write above) accept a ReadOnlySpan<byte> directly, but if a byte[] is specifically needed:

ReadOnlySpan<byte> spanBytes = "alwaysdeveloping.net"u8;
byte[] bytes = spanBytes.ToArray();

using var stream = new MemoryStream();
stream.Write(bytes);

The u8 suffix on the string literal tells the compiler to encode the string as UTF-8 at compile time and expose the result as a ReadOnlySpan<byte> rather than a byte[]. Using the ReadOnlySpan directly is more efficient, as no heap memory is allocated. If a byte[] is specifically required, the ToArray method copies the span into a new byte[], which does allocate.
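The sketch below (assuming .NET 7 or later) illustrates both points: the u8 literal contains exactly the same bytes as Encoding.UTF8.GetBytes, and every ToArray call produces a separate copy on the heap:

```csharp
using System;
using System.Text;

ReadOnlySpan<byte> fromLiteral = "alwaysdeveloping.net"u8;
byte[] fromEncoding = Encoding.UTF8.GetBytes("alwaysdeveloping.net");

// Same bytes either way; the literal's bytes were produced by the compiler
Console.WriteLine(fromLiteral.SequenceEqual(fromEncoding));  // True

// ToArray copies the span into a new heap-allocated array on every call
byte[] copyA = fromLiteral.ToArray();
byte[] copyB = fromLiteral.ToArray();
Console.WriteLine(object.ReferenceEquals(copyA, copyB));     // False
```

So if a byte[] is needed repeatedly, calling ToArray once and reusing the result avoids paying for the copy each time.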


Performance

Below are two simple BenchmarkDotNet benchmarks comparing the performance and memory usage of the old and new approaches:


[Benchmark(Baseline = true)]
public void GetBytes()
{
    byte[] bytes = System.Text.Encoding.UTF8.GetBytes("alwaysdeveloping.net");
}

[Benchmark]
public void StringLiteral()
{
    ReadOnlySpan<byte> spanBytes = "alwaysdeveloping.net"u8;
}

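| Method        | Mean       | Error     | StdDev    | Median     | Ratio | Gen 0  | Allocated |
|---------------|------------|-----------|-----------|------------|-------|--------|-----------|
| GetBytes      | 19.5843 ns | 0.4163 ns | 0.6956 ns | 19.6017 ns | 1.000 | 0.0076 | 48 B      |
| StringLiteral | 0.0198 ns  | 0.0209 ns | 0.0241 ns | 0.0085 ns  | 0.001 | -      | -         |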

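As the results show, the u8 literal is roughly 1,000 times faster in this benchmark (a ratio of 0.001) and allocates no memory, compared with 48 bytes per call for GetBytes. A mean of well under a nanosecond is effectively the cost of an empty method: the bytes are already embedded in the compiled assembly, so there is no encoding work left to do at runtime.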
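For a quick sanity check of the allocation difference without BenchmarkDotNet, GC.GetAllocatedBytesForCurrentThread can be read before and after each approach. This is a rough sketch, not a substitute for a proper benchmark; exact byte counts depend on the runtime and platform, so it only compares the two:

```csharp
using System;
using System.Text;

// Bytes allocated on this thread while calling GetBytes (includes the new byte[])
long before = GC.GetAllocatedBytesForCurrentThread();
byte[] bytes = Encoding.UTF8.GetBytes("alwaysdeveloping.net");
long getBytesAllocated = GC.GetAllocatedBytesForCurrentThread() - before;

// Bytes allocated while creating the u8 span (it points at data in the assembly)
before = GC.GetAllocatedBytesForCurrentThread();
ReadOnlySpan<byte> span = "alwaysdeveloping.net"u8;
long literalAllocated = GC.GetAllocatedBytesForCurrentThread() - before;

Console.WriteLine($"GetBytes: {getBytesAllocated} bytes, u8 literal: {literalAllocated} bytes");
```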


Extending the feature

In the initial announcement and previews of this feature, the conversion was implicit, with no u8 suffix:

// Early preview syntax only: this does not compile in the released C# 11
byte[] array = "hello";
Span<byte> span = "dog";
ReadOnlySpan<byte> readOnlySpan = "cat";

However, in subsequent previews the u8 suffix was added to explicitly indicate that the string literal should be encoded as UTF-8. Hopefully future C# versions will add support for more encodings, bringing this feature on par with System.Text.Encoding.X.GetBytes.
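The sketch below (assuming .NET 7 or later) shows why the encoding matters: the u8 literal always produces UTF-8, while any other encoding, such as UTF-16 via Encoding.Unicode, still has to go through System.Text.Encoding. The non-ASCII character is written as an escape so the example does not depend on the source file's encoding:

```csharp
using System;
using System.Text;

// "caf\u00e9" is "café"; u8 always encodes it as UTF-8
ReadOnlySpan<byte> utf8 = "caf\u00e9"u8;

// Other encodings still need an explicit Encoding call (UTF-16 little-endian here)
byte[] utf16 = Encoding.Unicode.GetBytes("caf\u00e9");

Console.WriteLine(utf8.Length);   // 5: 'é' takes two bytes in UTF-8
Console.WriteLine(utf16.Length);  // 8: two bytes for each of the four chars
```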


Notes

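A relatively small update on the surface, but if your application makes heavy use of string literals that are encoded to UTF-8 bytes, switching to this new feature should give you a performance boost. Note that the suffix only applies to string literals; strings that are only known at runtime still need Encoding.UTF8.GetBytes.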



Daily Drop 163: 19-09-2022

At the start of 2022 I set myself the goal of learning one new coding related piece of knowledge a day.
It could be anything - some .NET / C# functionality I wasn't aware of, a design practice, a cool new coding technique, or just something I find interesting. It could be something I knew at one point but had forgotten, or something completely new, which I may or may not ever actually use.

The Daily Drop is a record of these pieces of knowledge - writing about and summarizing them helps reinforce the information for myself, and may also help others learn something new.