Class EncodingSniffer

java.lang.Object
org.htmlunit.util.EncodingSniffer

public final class EncodingSniffer extends Object
Sniffs encoding settings from HTML, XML or other content. The HTML encoding sniffing algorithm is based on the HTML5 encoding sniffing algorithm.
  • Method Details

    • sniffEncoding

      @Deprecated public static Charset sniffEncoding(List<NameValuePair> headers, InputStream content) throws IOException

      If the specified content is HTML content, this method sniffs encoding settings from the specified HTML content and/or the corresponding HTTP headers based on the HTML5 encoding sniffing algorithm.

      If the specified content is XML content, this method sniffs encoding settings from the specified XML content and/or the corresponding HTTP headers using a custom algorithm.

      Otherwise, this method sniffs encoding settings from the specified content of unknown type by looking for Content-Type information in the HTTP headers and Byte Order Mark information in the content.

      Note that if an encoding is found but it is not supported on the current platform, this method returns null, as if no encoding had been found.

      Parameters:
      headers - the HTTP response headers sent back with the content to be sniffed
      content - the content to be sniffed
      Returns:
      the encoding sniffed from the specified content and/or the corresponding HTTP headers, or null if the encoding could not be determined
      Throws:
      IOException - if an IO error occurs
    • sniffHtmlEncoding

      @Deprecated public static Charset sniffHtmlEncoding(List<NameValuePair> headers, InputStream content) throws IOException

      Sniffs encoding settings from the specified HTML content and/or the corresponding HTTP headers based on the HTML5 encoding sniffing algorithm.

      Note that if an encoding is found but it is not supported on the current platform, this method returns null, as if no encoding had been found.

      Parameters:
      headers - the HTTP response headers sent back with the HTML content to be sniffed
      content - the HTML content to be sniffed
      Returns:
      the encoding sniffed from the specified HTML content and/or the corresponding HTTP headers, or null if the encoding could not be determined
      Throws:
      IOException - if an IO error occurs
    • sniffXmlEncoding

      @Deprecated public static Charset sniffXmlEncoding(List<NameValuePair> headers, InputStream content) throws IOException

      Sniffs encoding settings from the specified XML content and/or the corresponding HTTP headers using a custom algorithm.

      Note that if an encoding is found but it is not supported on the current platform, this method returns null, as if no encoding had been found.

      Parameters:
      headers - the HTTP response headers sent back with the XML content to be sniffed
      content - the XML content to be sniffed
      Returns:
      the encoding sniffed from the specified XML content and/or the corresponding HTTP headers, or null if the encoding could not be determined
      Throws:
      IOException - if an IO error occurs
    • sniffUnknownContentTypeEncoding

      @Deprecated public static Charset sniffUnknownContentTypeEncoding(List<NameValuePair> headers, InputStream content) throws IOException

      Sniffs encoding settings from the specified content of unknown type by looking for Content-Type information in the HTTP headers and Byte Order Mark information in the content.

      Note that if an encoding is found but it is not supported on the current platform, this method returns null, as if no encoding had been found.

      Parameters:
      headers - the HTTP response headers sent back with the content to be sniffed
      content - the content to be sniffed
      Returns:
      the encoding sniffed from the specified content and/or the corresponding HTTP headers, or null if the encoding could not be determined
      Throws:
      IOException - if an IO error occurs
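The Byte Order Mark check mentioned above can be illustrated with a minimal, library-independent sketch. This is not HtmlUnit's implementation; the class name `BomSniffSketch` and the method `charsetFromBom` are hypothetical, and only the UTF-8 and UTF-16 marks are assumed to be of interest.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of HtmlUnit.
final class BomSniffSketch {

    /** Returns the charset indicated by a leading BOM, or null if there is none. */
    static Charset charsetFromBom(byte[] head) {
        // UTF-8 BOM: EF BB BF
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return StandardCharsets.UTF_16BE; // FE FF
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return StandardCharsets.UTF_16LE; // FF FE
            }
        }
        return null;
    }

    /** Reads up to three bytes from the stream (consuming them) and checks for a BOM. */
    static Charset charsetFromBom(InputStream in) throws IOException {
        byte[] buf = new byte[3];
        int n = 0;
        while (n < buf.length) {
            int r = in.read(buf, n, buf.length - n);
            if (r < 0) {
                break; // end of stream before three bytes were available
            }
            n += r;
        }
        return charsetFromBom(Arrays.copyOf(buf, n));
    }
}
```

The stream overload consumes the bytes it inspects, so a caller that still needs the content would have to buffer or re-open the stream.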
    • sniffEncodingFromHttpHeaders

      @Deprecated public static Charset sniffEncodingFromHttpHeaders(List<NameValuePair> headers)
      Deprecated.
      as of version 4.0.0; method will be removed without replacement
      Attempts to sniff an encoding from the specified HTTP headers.
      Parameters:
      headers - the HTTP headers to examine
      Returns:
      the encoding sniffed from the specified HTTP headers, or null if the encoding could not be determined
    • sniffEncodingFromMetaTag

      public static Charset sniffEncodingFromMetaTag(InputStream is) throws IOException
      Attempts to sniff an encoding from an HTML meta tag in the specified content stream.
      Parameters:
      is - the content stream to check for an HTML meta tag
      Returns:
      the encoding sniffed from the specified content stream, or null if the encoding could not be determined
      Throws:
      IOException - if an IO error occurs
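As a rough illustration of meta-tag sniffing, the sketch below searches the already-read leading bytes of a document for a `charset=` value inside a `<meta>` tag. It is a deliberate simplification: the HTML5 prescan is considerably more involved, and this is not HtmlUnit's code. The names `MetaCharsetSketch` and `charsetFromMeta` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlUnit.
final class MetaCharsetSketch {

    // Matches charset=... inside a <meta ...> tag, quoted or unquoted.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in the first matching meta tag, or null. */
    static Charset charsetFromMeta(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so decoding never fails.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(text);
        if (!m.find()) {
            return null;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name: treat as "not found".
            return null;
        }
    }
}
```

The pattern covers both the `<meta charset="...">` form and the `http-equiv="Content-Type"` form, since each carries a `charset=` token inside the tag.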
    • extractEncodingFromContentType

      public static Charset extractEncodingFromContentType(String s)
      Extracts an encoding from the specified Content-Type value using the IETF algorithm; if no encoding is found, this method returns null.
      Parameters:
      s - the Content-Type value to search for an encoding
      Returns:
      the encoding found in the specified Content-Type value, or null if no encoding was found
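To make the idea of pulling a charset out of a Content-Type value concrete, here is a minimal sketch that uses only the JDK. It does not reproduce the exact algorithm this method follows; `ContentTypeCharsetSketch` and `charsetFromContentType` are hypothetical names, and the parsing is assumed to be a simple split on semicolons.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper, not part of HtmlUnit.
final class ContentTypeCharsetSketch {

    private static final String KEY = "charset=";

    /** Returns the charset parameter of a Content-Type value, or null. */
    static Charset charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a leading "charset=".
            if (p.regionMatches(true, 0, KEY, 0, KEY.length())) {
                String name = p.substring(KEY.length()).trim();
                // Strip one pair of surrounding double quotes, if present.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return null;
                }
            }
        }
        return null;
    }
}
```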
    • sniffEncodingFromXmlDeclaration

      public static Charset sniffEncodingFromXmlDeclaration(InputStream is) throws IOException
      Searches the specified XML content for an XML declaration and returns the encoding if found, otherwise returns null.
      Parameters:
      is - the content stream to check for the charset declaration
      Returns:
      the encoding of the specified XML content, or null if it could not be determined
      Throws:
      IOException - if an IO error occurs
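A search for the encoding in an XML declaration might look like the following sketch. It assumes the leading bytes have already been read and are ASCII-compatible, and it is not HtmlUnit's implementation; `XmlDeclarationSketch` and `charsetFromXmlDeclaration` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlUnit.
final class XmlDeclarationSketch {

    // <?xml ... encoding="NAME" or encoding='NAME', anchored at the start of the content.
    private static final Pattern XML_ENCODING = Pattern.compile(
            "^<\\?xml\\s[^>]*?encoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    /** Returns the charset declared in a leading XML declaration, or null. */
    static Charset charsetFromXmlDeclaration(byte[] head) {
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = XML_ENCODING.matcher(text);
        if (!m.find()) {
            return null;
        }
        try {
            return Charset.forName(m.group(2));
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name.
            return null;
        }
    }
}
```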
    • sniffEncodingFromCssDeclaration

      public static Charset sniffEncodingFromCssDeclaration(InputStream is) throws IOException
      Parses and returns the charset declared at the start of a CSS file, if any; otherwise returns null.

      e.g.

      @charset "UTF-8"
      Parameters:
      is - the input stream to parse
      Returns:
      the charset declared at the start of the CSS file, or null if there is none
      Throws:
      IOException - if an IO error occurs
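The following sketch shows one way a leading `@charset` rule could be read from the first bytes of a style sheet. It assumes the rule must appear at the very start of the content, and it is not HtmlUnit's implementation; `CssCharsetSketch` and `charsetFromCss` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HtmlUnit.
final class CssCharsetSketch {

    private static final String PREFIX = "@charset \"";

    /** Returns the charset named by a leading @charset rule, or null. */
    static Charset charsetFromCss(byte[] head) {
        String text = new String(head, StandardCharsets.ISO_8859_1);
        if (!text.startsWith(PREFIX)) {
            return null;
        }
        // The name runs up to the closing double quote.
        int end = text.indexOf('"', PREFIX.length());
        if (end < 0) {
            return null;
        }
        try {
            return Charset.forName(text.substring(PREFIX.length(), end));
        } catch (IllegalArgumentException e) {
            // Empty, illegal, or unsupported charset name.
            return null;
        }
    }
}
```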
    • toCharset

      public static Charset toCharset(String charsetName)
      Returns the Charset for the specified charset name if it is supported on this platform.
      Parameters:
      charsetName - the charset name to check
      Returns:
      the Charset for the specified charset name, if it is supported on this platform
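A platform-support check of this kind can be sketched with the JDK's own `Charset.isSupported` and `Charset.forName`. The sketch assumes that null is the result for unsupported or illegal names, which the description above does not state explicitly; `CharsetLookupSketch` and `toCharsetOrNull` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper, not part of HtmlUnit.
final class CharsetLookupSketch {

    /** Returns the Charset for the name if the platform supports it, else null (assumed). */
    static Charset toCharsetOrNull(String charsetName) {
        if (charsetName == null) {
            return null;
        }
        try {
            return Charset.isSupported(charsetName) ? Charset.forName(charsetName) : null;
        } catch (IllegalCharsetNameException e) {
            // Names containing illegal characters (for example spaces) end up here.
            return null;
        }
    }
}
```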
    • translateEncodingLabel

      @Deprecated public static String translateEncodingLabel(Charset encodingLabel)
      Deprecated.
      as of version 4.0.0; method will be removed without replacement
      Translates the given encoding label into a normalized form according to the WHATWG Encoding Standard.
      Parameters:
      encodingLabel - the label to translate
      Returns:
      the normalized encoding name or null if not found
    • translateEncodingLabel

      public static String translateEncodingLabel(String encodingLabel)
      Translates the given encoding label into a normalized form according to the WHATWG Encoding Standard.
      Parameters:
      encodingLabel - the label to translate
      Returns:
      the normalized encoding name or null if not found
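Label translation can be pictured as a table lookup on a trimmed, lower-cased label. The sketch below assumes a table in the style of the WHATWG Encoding Standard and includes only a handful of its entries; the exact table and the form of the returned names in HtmlUnit may differ. `EncodingLabelSketch` and `translate` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of HtmlUnit.
final class EncodingLabelSketch {

    // A small excerpt of label -> encoding name pairs; the full table is much larger.
    private static final Map<String, String> LABELS = new HashMap<>();
    static {
        LABELS.put("utf-8", "utf-8");
        LABELS.put("utf8", "utf-8");
        LABELS.put("unicode-1-1-utf-8", "utf-8");
        LABELS.put("windows-1252", "windows-1252");
        LABELS.put("iso-8859-1", "windows-1252");
        LABELS.put("latin1", "windows-1252");
        LABELS.put("us-ascii", "windows-1252");
        LABELS.put("ascii", "windows-1252");
    }

    /** Returns the normalized name for the label, or null if the label is unknown. */
    static String translate(String label) {
        if (label == null) {
            return null;
        }
        // trim() removes leading and trailing characters up to U+0020,
        // which is slightly broader than ASCII whitespace.
        return LABELS.get(label.trim().toLowerCase(Locale.ROOT));
    }
}
```

Under this table, legacy labels such as `latin1` and `ascii` resolve to `windows-1252` rather than to the charset their name suggests, which is the behavior browsers follow.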