Course: AP®/College Computer Science Principles > Unit 1
Lesson 6: Lossless data compression
Lossless file compression
Problem
Byte pair encoding is a compression algorithm that replaces each repeated pair of characters in a string with a single character that doesn't appear in the data, and records every substitution in a table of replacement mappings.
Here's the output of a byte pair encoding:
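The description above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `bpe_compress` and the `spare_symbols` parameter are hypothetical, and choosing the most frequent pair at each step is an assumption about how pairs are selected.

```python
from collections import Counter


def bpe_compress(text, spare_symbols):
    """Repeatedly replace the most frequent adjacent pair with an unused symbol.

    `spare_symbols` (hypothetical parameter) lists characters that must not
    occur in the data; each one can stand in for one pair.
    """
    table = {}  # replacement symbol -> the pair it stands for
    for symbol in spare_symbols:
        if symbol in text:
            raise ValueError(f"{symbol!r} already occurs in the data")
        # Count every adjacent two-character window (windows may overlap).
        pairs = Counter(text[i:i + 2] for i in range(len(text) - 1))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so a replacement would save nothing
        text = text.replace(pair, symbol)
        table[symbol] = pair
    return text, table
```

For example, `bpe_compress("abababab", "XY")` returns `("YY", {"X": "ab", "Y": "XX"})`: the pair `ab` is replaced by `X` first, and the resulting pair `XX` is then replaced by `Y`.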
Ze mXe Zat yB rFd, Ze mXe Zings yB wiJ know
Ze mXe Zat yB lFrn, Ze mXe places yB’J go
Here's the replacement table:
original | replacement |
---|---|
th | Z |
or | X |
ou | B |
ea | F |
ll | J |
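Decoding with a table like this is mechanical: every replacement character is expanded back into its original pair. The sketch below is one way to check an answer, assuming a hypothetical helper named `bpe_decode`; the table and the compressed text are copied from the problem.

```python
# Replacement table from the problem: replacement symbol -> original pair.
REPLACEMENTS = {"Z": "th", "X": "or", "B": "ou", "F": "ea", "J": "ll"}


def bpe_decode(compressed, replacements):
    """Expand each replacement symbol back into its original pair."""
    # Order does not matter for this table, because no original pair
    # contains a replacement symbol.
    for symbol, pair in replacements.items():
        compressed = compressed.replace(symbol, pair)
    return compressed


compressed = (
    "Ze mXe Zat yB rFd, Ze mXe Zings yB wiJ know\n"
    "Ze mXe Zat yB lFrn, Ze mXe places yB’J go"
)
decoded = bpe_decode(compressed, REPLACEMENTS)
print(decoded)
```

If a table were built in several rounds, so that one replacement stands for a pair containing an earlier replacement symbol, the entries would need to be expanded in reverse order of creation.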
Decode the compressed string to discover the original string, a quote from Dr. Seuss:
"__e m__e __at y__ r__d, __e m__e __ings y__ wi__ know
__e m__e __at y__ l__rn, __e m__e places y__’__ go"