JavaScript strings are Unicode text, but much of the web still runs on bytes: network responses, files, streams, and binary protocols. The Encoding API exists to translate between these two representations so you can reliably handle text at the byte level.
At a high level, the API supports two complementary operations:
- Encoding: converting a JavaScript string into a sequence of bytes representing the string in UTF‑8.
- Decoding: converting a sequence (or stream) of bytes in some character encoding into a JavaScript string.
These operations are intentionally asymmetrical: encoding targets UTF‑8, while decoding can handle UTF‑8 plus many legacy encodings.
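You can see this asymmetry directly in the constructors: TextEncoder takes no encoding argument at all, while TextDecoder accepts an encoding label. A minimal sketch (the sample string is arbitrary):
// Encoding: always UTF-8; there is no encoding parameter.
const utf8Bytes = new TextEncoder().encode('Köln'); // Uint8Array [75, 195, 182, 108, 110]
// Decoding: the label decides how the bytes are interpreted.
console.log(new TextDecoder('utf-8').decode(utf8Bytes)); // "Köln"
console.log(new TextDecoder('windows-1252').decode(utf8Bytes)); // "KÃ¶ln" (wrong label, classic mojibake)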
Core interfaces
The Encoding API exposes four main building blocks:
- TextEncoder: synchronously encodes a JavaScript string into a Uint8Array of UTF‑8 bytes.
- TextDecoder: synchronously decodes a byte array into a string using a specified encoding (UTF‑8 by default, with support for many legacy encodings).
- TextEncoderStream: encodes a stream of strings into a stream of UTF‑8 bytes.
- TextDecoderStream: decodes a stream of bytes into a stream of strings.
Most everyday tasks use TextEncoder and TextDecoder. The stream variants are useful when text arrives incrementally (for example, over a network connection) or when you want to process large data without buffering it all at once.
Why UTF‑8 and bytes matter
UTF‑8 is the dominant encoding for web documents and APIs, but it’s important to remember that UTF‑8 is a byte-oriented encoding. Characters don’t always map 1:1 to bytes:
- ASCII characters take 1 byte each.
- Accented Latin letters and most other alphabetic characters take 2 or 3 bytes.
- Emoji and other characters outside the Basic Multilingual Plane take 4 bytes.
This is why “string length” and “byte length” are different concepts—critical when you’re dealing with file sizes, protocol limits, cryptographic signatures, or binary formats.
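A small sketch makes the difference concrete (the sample characters are purely illustrative):
const encoder = new TextEncoder();
for (const ch of ['A', 'é', '€', '😀']) {
  // ch.length counts UTF-16 code units; the encoder counts UTF-8 bytes.
  console.log(ch, 'string length:', ch.length, 'UTF-8 bytes:', encoder.encode(ch).length);
}
// 'A'  -> length 1, 1 byte
// 'é'  -> length 1, 2 bytes
// '€'  -> length 1, 3 bytes
// '😀' -> length 2, 4 bytes (outside the BMP, so a surrogate pair in UTF-16)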
Decoding beyond UTF‑8: legacy encodings
A major benefit of the Encoding API is that decoding is not limited to UTF‑8. The decoder can interpret byte sequences produced by older systems (for example, Windows-1252 and other legacy encodings). This is especially useful when you:
- ingest older documents or exports,
- consume data from legacy servers,
- process user-provided files with unknown encoding,
- integrate with systems that haven’t standardized on UTF‑8.
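For example, suppose a legacy export stores 'é' as the single byte 0xE9, as windows-1252 does. Passing that label to TextDecoder recovers the text, while assuming UTF-8 does not; the bytes below are hand-constructed for illustration:
// 'café' as windows-1252 bytes: 0xE9 is 'é' in that encoding.
const legacyBytes = new Uint8Array([0x63, 0x61, 0x66, 0xE9]);
console.log(new TextDecoder('windows-1252').decode(legacyBytes)); // "café"
console.log(new TextDecoder('utf-8').decode(legacyBytes)); // "caf�" (a lone 0xE9 is invalid UTF-8)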
Error handling: strict vs forgiving decoding
Decoding isn’t always clean: you may encounter invalid byte sequences. The Encoding API lets you choose how strict you want to be:
- Non-fatal decoding (the default) replaces invalid sequences with a replacement character, allowing you to continue.
- Fatal decoding throws an error on invalid input, which is useful when correctness matters (for example, parsing structured text where corruption should stop processing).
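Both behaviors come from the fatal option on the TextDecoder constructor; here is a brief sketch with a deliberately invalid byte sequence:
const badBytes = new Uint8Array([0xff, 0xfe, 0x41]); // 0xFF and 0xFE can never appear in UTF-8
// Forgiving (default): invalid bytes become U+FFFD and decoding continues.
console.log(new TextDecoder('utf-8').decode(badBytes)); // "��A"
// Strict: invalid input throws a TypeError, so corruption is caught immediately.
try {
  new TextDecoder('utf-8', { fatal: true }).decode(badBytes);
} catch (error) {
  console.error('Invalid UTF-8 rejected:', error.message);
}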
When to use stream-based encoders/decoders
The stream-based interfaces (TextEncoderStream / TextDecoderStream) are designed for scenarios like:
- decoding text in chunks as it arrives,
- building pipelines that transform data progressively,
- handling very large responses efficiently.
They pair naturally with the Web Streams API and can make text processing both cleaner and more memory-friendly.
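For instance, piping a fetch response body through a TextDecoderStream yields strings instead of raw bytes as chunks arrive; a sketch of that pattern (the URL is a placeholder and error handling is omitted):
async function logTextChunks(url) {
  const response = await fetch(url);
  // pipeThrough() converts the byte stream into a stream of strings (UTF-8 by default).
  const reader = response.body
    .pipeThrough(new TextDecoderStream())
    .getReader();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    console.log('Decoded chunk of', value.length, 'characters');
  }
}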
Encoding API Demo
Open your browser's developer console (F12) to see the output of the JavaScript demo.
/* Basic CSS for a simple demo page */
/* Universal box-sizing for easier layout calculations */
*,
*::before,
*::after {
box-sizing: border-box;
}
/* Body styling for basic typography and layout */
body {
font-family: Arial, sans-serif;
line-height: 1.6;
margin: 20px;
background-color: #f4f4f4;
color: #333;
}
/* Headings */
h1,
h2,
h3 {
color: #0056b3;
margin-bottom: 10px;
}
h1 {
border-bottom: 2px solid #0056b3;
padding-bottom: 10px;
margin-top: 0;
}
/* Paragraphs */
p {
margin-bottom: 10px;
}
/* Links */
a {
color: #007bff;
text-decoration: none;
}
a:hover {
text-decoration: underline;
}
/* Code blocks or inline code */
code,
pre {
font-family: "Courier New", Courier, monospace;
background-color: #eee;
padding: 2px 4px;
border-radius: 4px;
}
pre {
display: block;
padding: 10px;
border: 1px solid #ddd;
overflow-x: auto;
white-space: pre-wrap;
word-wrap: break-word;
}
/* Example for a container or specific demo element */
.container {
max-width: 960px;
margin: 0 auto;
padding: 20px;
background-color: #fff;
border-radius: 8px;
box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}
/* Styling for the debug output area used by the Encoding API demo */
.debug-output {
background-color: #e9ecef;
border: 1px dashed #ced4da;
padding: 15px;
margin-top: 20px;
border-radius: 5px;
}
.debug-output strong {
color: #495057;
}
/* Styling for a button, if you add one */
button {
background-color: #28a745;
color: white;
padding: 10px 15px;
border: none;
border-radius: 5px;
cursor: pointer;
font-size: 16px;
margin-top: 15px;
}
button:hover {
background-color: #218838;
}
/**
* @file EncodingAPIDemo.js
* @brief This script demonstrates the basic usage of the Web Encoding API,
* specifically TextEncoder and TextDecoder, for converting between
* JavaScript strings and byte arrays (Uint8Array).
*
* The Encoding API allows web applications to handle character encodings beyond
* what JavaScript natively supports for string manipulation, primarily focusing
* on UTF-8 for encoding and a wider range of encodings for decoding.
*
* Reference: https://developer.mozilla.org/en-US/docs/Web/API/Encoding_API
*/
// --- 1. TextEncoder: Encoding a JavaScript string into UTF-8 bytes ---
console.log('--- TextEncoder Demo ---');
// Create a new TextEncoder instance.
// TextEncoder always encodes to UTF-8; its constructor takes no encoding argument.
const encoder = new TextEncoder();
// Define a sample string, including some non-ASCII characters to demonstrate UTF-8 handling.
const textToEncode =
'Hello, world! 👋 This is a test with a special character: é.';
console.log(`Original String: "${textToEncode}"`);
// Encode the string into a Uint8Array (an array of 8-bit unsigned integers, i.e., bytes).
const encodedBytes = encoder.encode(textToEncode);
console.log('Encoded Bytes (Uint8Array):', encodedBytes);
console.log('Encoded Bytes Length:', encodedBytes.length);
// Note: The byte length is larger than the string length here because of
// multi-byte characters in UTF-8 (e.g., '👋' uses 4 bytes, 'é' uses 2 bytes).
// To view the hexadecimal representation (useful for debugging byte streams).
const hexString = Array.from(encodedBytes)
.map((byte) => byte.toString(16).padStart(2, '0'))
.join(' ');
console.log('Encoded Bytes (Hex):', hexString);
// --- 2. TextDecoder: Decoding a byte array back into a JavaScript string ---
console.log('\n--- TextDecoder Demo ---');
// Create a new TextDecoder instance.
// We need to specify the encoding of the bytes we are decoding.
// In this case, we know `encodedBytes` is 'utf-8'.
const decoder = new TextDecoder('utf-8');
// Decode the `Uint8Array` back into a JavaScript string.
const decodedText = decoder.decode(encodedBytes);
console.log(`Decoded String (UTF-8): "${decodedText}"`);
// Verify that the decoded string matches the original.
console.log(
'Does decoded string match original?',
decodedText === textToEncode
);
// --- 3. TextDecoder with a different (legacy) encoding ---
// This demonstrates TextDecoder's ability to handle various encodings,
// though TextEncoder only produces UTF-8.
console.log('\n--- TextDecoder with Legacy Encoding Demo ---');
// Imagine you received a byte array encoded in 'windows-1252' (a common legacy encoding).
// For demonstration, let's manually create some bytes that represent
// "Hello, world! ©" in windows-1252. The copyright symbol '©' is byte `0xA9` in windows-1252.
// In UTF-8, '©' is `0xC2 0xA9`.
const windows1252Bytes = new Uint8Array([
  72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100, 33, 32, 169 // "Hello, world! " plus 0xA9, the windows-1252 byte for '©'
]);
console.log('Windows-1252 Encoded Bytes:', windows1252Bytes);
// Create a decoder for 'windows-1252'.
const windows1252Decoder = new TextDecoder('windows-1252');
// Decode the bytes.
const windows1252DecodedText = windows1252Decoder.decode(windows1252Bytes);
console.log(`Decoded String (windows-1252): "${windows1252DecodedText}"`);
// What happens if we try to decode windows-1252 bytes with a UTF-8 decoder?
// It will likely result in a replacement character ('�', U+FFFD) for bytes that are not valid UTF-8 sequences.
const incorrectUtf8Decode = new TextDecoder('utf-8').decode(windows1252Bytes);
console.log(
`Incorrectly Decoded String (UTF-8 for windows-1252 bytes): "${incorrectUtf8Decode}"`
);
// You'll see the '©' replaced by '�' because a lone 0xA9 is a continuation byte, not a valid UTF-8 sequence.
// --- 4. Handling encoding errors (optional for demo, but good to know) ---
console.log('\n--- Error Handling Demo ---');
// The TextDecoder constructor accepts an options object.
// The 'fatal' option determines if decoding errors throw an exception.
// By default, 'fatal' is false, and errors result in replacement characters ('�').
const fatalDecoder = new TextDecoder('utf-8', { fatal: true });
const nonFatalDecoder = new TextDecoder('utf-8', { fatal: false }); // default behavior
const invalidUtf8Bytes = new Uint8Array([0xc0, 0x80]); // 0xC0 0x80 is an overlong (and therefore invalid) UTF-8 encoding of NUL
try {
fatalDecoder.decode(invalidUtf8Bytes);
} catch (error) {
console.log('Decoding with fatal: true caught an error:', error.message);
}
console.log(
'Decoding with fatal: false:',
nonFatalDecoder.decode(invalidUtf8Bytes)
);
// Output will include replacement characters ('�') in place of the invalid bytes.
// --- 5. Stream-based encoding/decoding (brief mention for context) ---
console.log('\n--- Stream-based API (Conceptual) ---');
console.log(
'The Encoding API also provides TextEncoderStream and TextDecoderStream'
);
console.log(
'for handling large amounts of data or data arriving in chunks (e.g., network streams).'
);
console.log(
'These are typically used with Web Streams API (ReadableStream, WritableStream).'
);
console.log(
'Example use case: Piping a network response through a TextDecoderStream.'
);
// No executable code for streams here, as it requires more setup (ReadableStream, WritableStream).
Beyond the demo above, a few practical reasons to reach for the Encoding API:
- Simplicity: for most tasks you just create a TextEncoder or TextDecoder and call encode() or decode().
- Efficiency: the conversions are implemented natively by the platform, so you avoid hand-rolled byte manipulation or bulky polyfills.
- Broad support: it is available in all modern browsers (and in runtimes such as Node.js and Deno), making it safe to rely on for a wide range of applications.
Takeaway
The Encoding API gives the web platform a standard, reliable way to convert between strings and encoded bytes, centered around UTF‑8 for encoding and broad support for decoding. Whether you’re working with network data, files, or streams, it helps you treat text as the byte-level data it ultimately is—without losing correctness across languages, symbols, and legacy systems.

