Embedded files

From Encyclopedia Dramatica

Revision as of 00:33, 1 January 2013

As explained by this confusing collection of boxes

An embedded file is a file that is stored or hidden inside another file, particularly inside an image which may then be posted to the *chans. For example, concatenating a JPEG file with a RAR file produces an embedded archive which can be read either as a JPEG or a RAR, depending on how it's opened.

File concatenation

An embedded MP3 file, commonly seen on /a/.

One of the most common ways of embedding files into images is simple concatenation. That is, the new file contains the data from the first file followed by the data from the second. Which file you see depends on the program you open it with.

This only works for certain combinations of file types. Many types of files will work for the first part, but it should be a GIF, JPEG, or PNG file if you want to post it to 4chan. The second file should be one of the following types:

In addition:

  • OGG sound files appended to images can be played with the 4chan sounds userscript (now blocked on 4chan).
  • Broken web pages occasionally append HTML to the end of the images they serve. In most cases, the contents are unremarkable. But several images from the diaper fetish website wetherbed.com contain the login credentials. These images are often reposted in diaper fetish threads on /b/ with the posters unaware of what's in them. You can find this information by opening the files in a text editor such as Wordpad, and searching for "password".

Examples

In Windows:

copy /B foo.jpg + bar.rar foobar.jpg

In *nix:

cat foo.jpg bar.rar > foobar.jpg

Both of these examples create a file named foobar.jpg which looks identical to foo.jpg in an image viewer, but which yields the contents of bar.rar when opened with an unrar tool.
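
For anyone who prefers a script to a shell one-liner, the same concatenation can be sketched in Python. This is only an illustration of the two commands above; the function name concatenate is made up for this example and is not part of any existing tool.

```python
from pathlib import Path


def concatenate(first: Path, second: Path, out: Path) -> int:
    """Write the bytes of `first` followed by the bytes of `second` to `out`.

    Hypothetical helper for illustration; equivalent to
    `cat first second > out`. Returns the number of bytes written.
    """
    data = first.read_bytes() + second.read_bytes()
    out.write_bytes(data)
    return len(data)
```

Calling concatenate(Path("foo.jpg"), Path("bar.rar"), Path("foobar.jpg")) should produce the same foobar.jpg as the commands above.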

Why does it work?

In GIF, JPEG, and PNG files, as well as many other file types, there is information in the file that tells the program reading it how long the file is and/or where to stop. So if you put additional data after the end of the original data, most readers will ignore it.

Many types of compressed archives (7Z, RAR, ZIP) can be distributed as self-extracting files, which are composed of an executable file concatenated with the archive. So these file types are designed to be readable even if they've been appended to another file. For 7Z and RAR, the extractor searches for the "magic number" that indicates the start of the archive data. ZIP files, on the other hand, are read starting from the end of the file.
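
To make the "where to stop" idea concrete, here is a minimal sketch for PNG, where the image ends at the IEND chunk. It walks the chunk list (4-byte big-endian length, 4-byte type, data, 4-byte CRC) and reports the offset at which the image proper ends; anything after that offset is appended data that an image viewer ignores. The function name png_end_offset is an assumption made for this example, and the sketch does not verify CRCs.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_end_offset(data: bytes) -> int:
    """Return the offset just past the IEND chunk of a PNG byte string.

    Hypothetical helper for illustration. Chunk CRCs are skipped,
    not checked.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    # Each chunk is: length (4 bytes), type (4 bytes), data, CRC (4 bytes).
    while pos + 12 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 12 + length
        if pos > len(data):
            raise ValueError("truncated chunk")
        if chunk_type == b"IEND":
            return pos
    raise ValueError("no IEND chunk found")
```

If end = png_end_offset(data), then data[end:] is whatever was appended after the image, and it is empty for an ordinary PNG.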

Blocked on 4chan

The old technique for posting embedded archives on 4chan. No longer works.

Embedded 7z, RAR, and ZIP archives are currently blocked on 4chan, giving posters the message "Image file contains embedded archive." At first moot's jpg-rar and sounds filters were particularly easy to circumvent, since he didn't scan the whole file, only the first 64 KB (later updated to 256 KB) and last 64 KB. In those days, all you needed to do to get around it was add padding after the image (for example by using several copies of the image) to push the beginning of the RAR file past the 256 KB threshold.

The filter was updated in November 2012 to include Ogg sound files. Moot's statements on the block are in this /q/ thread. The 4chan sounds script was quickly rewritten to play files in which "OggS" had been replaced with various strings; moot responded by adding "libVorbis" to the filter. This resulted in the use of yet more ways to circumvent the filter as well as new versions of 4chan sounds to play the obfuscated files.

In December 2012, the filter was updated to scan the entire file, killing the padding method. Since many of the strings moot scans for are only 4 bytes long, this led to numerous files failing to upload because they happened to contain a string that moot interprets as the magic number of an "embedded archive." However, it did nothing to stop the posting of embedded files, which continue to be posted using methods such as:

  • Alter the magic number in the RAR file, for example by replacing "Rar!" with "Bar!". Use a hex editor to do this so you don't make other unintentional changes to the file.
  • Apply any number of simple transformations to the embedded data. For example, the scripts on this page will scramble or unscramble any data appended after the image.
  • Concatenate the image and file without compressing the file. If the file isn't an archive or an Ogg sound file, it most likely won't be blocked. But if the file isn't one of the types listed above, you'll need to use a hex editor to extract it. If the image is a JPEG file, search for FF D9 to find the end of the image data, and delete everything up to and including it. Alternatively, those of you not versed in Computer Science III may want to try this Greasemonkey script, which can detect the added data in images on 4chan and split the image back up into its original pieces. Also useful for telling fake jpeg-rar books from real ones. Do not use this technique to upload source code or HTML files as this may trigger the anti-4chan.js filter and get you banned.

Or you can also try one of the many other methods of embedding files in images...

File binders

This is a variation on the concatenation method. A file binder is a program that appends files and their names to images in its own particular format, and extracts the files other people add to images. They often apply simple transformations to the data to circumvent filters.

  • pFBind was created to get around 4chan's block on embedded RARs and save Lithursday, but it was eventually blocked from 4chan itself.
  • ChanGrouper (v1:[2] v2:[3]) is another file binder, written in Java. It has not yet been blocked from 4chan. The ChanGrouper websites may be down; you can alternately download ChanGrouper here (v1:[4] v2:[5]). The original source code of the program is included in the JAR file; you can examine it by downloading the file and either renaming it to .zip or opening it in your favorite archiver. This was the format used by Coupon Guy to distribute his guides on making fake coupons before he was vanned.

Metadata blocks

Files can also be embedded in the metadata blocks of images. This technique has not seen as much use since it takes more work than concatenation, and isn't significantly harder to block. But it does have the advantage of working on sites which strip off appended data.

Image Data

Cornelia format

File:Cornelian archive tools.png
A Cornelia-style archive containing tools for making more.

These archives are embedded into the image data of a 24-bit Windows bitmap, which is then converted to a PNG so it can be posted on 4chan. This was the format used by Cornelia to post the dox of infected users. Moot never figured out how to filter out Cornelia's posts efficiently as he had done with previous incarnations of 4chan.js, and instead gave up and added CAPTCHA to 4chan. So even if moot decides to beef up his anti-JPEGRAR filter in the near future, we should expect Cornelia's embedded file format to remain unblocked for some time. And if he does figure out how to block it, he may consider removing CAPTCHA, so it's a win-win.

In addition, Cornelia-style archives on 4chan are often smaller than JPEG-RARs due to the lack of moot-evading padding. And unlike JPEG-RARs, they can be posted on sites that strip off appended data. They are, however, easier for mods to detect by eye.

There are now userscripts which support posting archives in Cornelia format as well as extracting the files and viewing them in your browser.

To create one manually, start with an image with enough blank space at the bottom to hold the archive data. The number of pixels needed is 1/3 the length of the archive. It's also important that the image width is a multiple of 4. Then on Linux / OS X you can do:

convert inputimage.jpg ppm:- | convert ppm:- tmp1.bmp
head -c 54 tmp1.bmp > tmp2
cat inputarchive.7z >> tmp2
dd if=tmp2 of=tmp1.bmp conv=notrunc
convert tmp1.bmp outputimage.png

Other methods for creating them can be found bundled in the image to the right.
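
The space requirement stated above (one pixel per 3 bytes of archive, width a multiple of 4) can be turned into a quick calculation of how many blank rows the image needs. This is a sketch under those stated assumptions only; the function name rows_needed is made up for this example.

```python
import math


def rows_needed(archive_bytes: int, width: int) -> int:
    """Number of full pixel rows needed to hold `archive_bytes` of data.

    Hypothetical helper for illustration. Assumes a 24-bit bitmap
    (3 bytes per pixel) whose width is a multiple of 4, so rows carry
    no padding bytes.
    """
    if width <= 0 or width % 4 != 0:
        raise ValueError("width must be a positive multiple of 4")
    if archive_bytes < 0:
        raise ValueError("archive size cannot be negative")
    bytes_per_row = width * 3
    return math.ceil(archive_bytes / bytes_per_row)
```

For example, a 3000-byte archive in a 100-pixel-wide image needs 10 rows of blank space.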

To extract the files manually:

  1. Convert the image to a 24-bit BMP file. You can do this by copying it to or opening it in an image editor, and saving it as the correct type:
    • In MSPaint: Make sure the save type is set to "24-bit Bitmap". You may have to make and undo an edit to force deletion of the alpha channel.
    • In Mac OS X's Preview: Before saving the image, flip it vertically. Choose the format "Microsoft BMP". Make sure the "Alpha" box, if present, is unchecked, and that the "Rotate without modifying contents" box is checked.
    • In The GIMP: Change the extension to ".bmp". In the next dialog, make sure "24 bits: R8 G8 B8" is selected under "Advanced Options".
  2. Open the .bmp file with 7-Zip or WinRAR.

Other formats

Archives embedded in Photoshop RAW files and converted to PNG have also been sighted on 4chan.

Steganography

An image created with the 4chan Gold File Embedder (Java source inside JAR archive), a weak but high-density steganography program. Not often used, but several "4chan Gold" images in circulation contain these embedded files due to being downloaded from the ED article. Note the noise in what should be a smooth background.

The most trivial and well-known form of steganography is to embed files in the image data, but only to use the least significant bit of each byte. For example, if the original image contained the bytes 00011000 01000001 01100001 01010010, you could embed a message (example: 1010) by changing the bits in the ones position: 00011001 01000000 01100001 01010010. Generally this change will not be detectable by eye. However, this is a fairly weak form of steganography because it can be detected through histogramming.
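
The least-significant-bit example above can be reproduced in a few lines of Python. This is a bare-bones sketch working on raw bytes rather than a real image file, and the function names embed_bits and extract_bits are made up for this example.

```python
from typing import List


def embed_bits(cover: bytes, bits: List[int]) -> bytes:
    """Store one message bit in the least significant bit of each cover byte.

    Hypothetical helper for illustration; bytes beyond len(bits)
    are left unchanged.
    """
    if len(bits) > len(cover):
        raise ValueError("message is longer than the cover data")
    out = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        out[i] = (out[i] & 0xFE) | (bit & 1)
    return bytes(out)


def extract_bits(stego: bytes, count: int) -> List[int]:
    """Read back the first `count` least significant bits."""
    return [byte & 1 for byte in stego[:count]]
```

Embedding [1, 0, 1, 0] into the four bytes from the example changes only the lowest bit of the first two bytes, matching the values given above.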

Google will find you all sorts of programs which claim to be steganography utilities. Some of them actually are; others are really just file binders or embedded archive makers as described above. And many of the programs that actually are steganography have serious flaws. See [6] for some details.

Among the current state-of-the-art steganography algorithms are Modified Matrix Embedding (need link to an implementation!) and Perturbed Quantization. A C implementation of the original, weaker version of PQ is available here. Some other tools you can download whose hidden files are not trivial to detect are F5, OutGuess, and steghide. OutGuess was notably used in the Cicada 3301 game. Be aware that in order for the files to be hard to detect, you should embed them in photographs. If you use computer graphics, especially those containing large areas of solid color, embedding files steganographically will create noticeable artifacts.

But even for the best steganography algorithms out there, experts are constantly searching for and finding ways of detecting the files they hide. While you can certainly hide stuff from moot, if you have files on your computer that you don't want the FBI or some other serious organization to find, you should not expect steganography to keep them hidden. Modern encryption can be counted on; steganography, not so much.

See Also