Saturday, August 9, 2008

Presentation Layer Output Encoding: Apache Commons Lang StringEscapeUtils vs. OWASP Reform

*Updated to include OWASP ESAPI Results on August 21, 2008*

To compare the effectiveness of the Apache Commons Lang StringEscapeUtils and OWASP Reform libraries, I created a JSP page that encodes every character value from 0 to 255 (the ASCII range plus the extended values 128-255). I examined how each library encodes HTML and JavaScript output for the purpose of preventing cross-site scripting attacks. The results are shown at the bottom of this post.
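As a rough illustration of the kind of test page described above, the sketch below loops over the values 0 to 255 and builds one row per value containing each encoder's output. It is plain Java rather than the original JSP, and the names EncodingTableSketch and buildTable, as well as the hex stand-in encoder, are hypothetical. With the libraries on the classpath, each map entry would instead wrap one of the methods named in the legend.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical harness; none of these names come from the original JSP page.
final class EncodingTableSketch {

    // Builds a tab-separated table: a header row, then one row per value 0..255
    // holding the numeric value followed by each encoder's output for that character.
    static String buildTable(Map<String, UnaryOperator<String>> encoders) {
        StringBuilder out = new StringBuilder("ASCII");
        for (String name : encoders.keySet()) {
            out.append('\t').append(name);
        }
        out.append('\n');
        for (int code = 0; code <= 255; code++) {
            String input = String.valueOf((char) code);
            out.append(code);
            for (UnaryOperator<String> encoder : encoders.values()) {
                out.append('\t').append(encoder.apply(input));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, UnaryOperator<String>> encoders = new LinkedHashMap<>();
        // Stand-in encoder (hex value of the character) so the sketch runs
        // without third-party jars. A real comparison would register the
        // library methods here instead.
        encoders.put("Hex", s -> Integer.toHexString(s.charAt(0)));
        System.out.print(buildTable(encoders));
    }
}
```

Keeping the encoders in an insertion-ordered map means the column order in the output matches the order in which they were registered.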

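In general, both the ESAPI and Reform libraries encode any value other than a-z, A-Z, and 0-9 (there are some exceptions). StringEscapeUtils, by contrast, passes most punctuation through unchanged in the table below. Encoding everything outside a small set of known-safe characters is a great approach to ensuring that user-supplied input cannot be interpreted as HTML or JavaScript when it is redisplayed in the browser.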
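The approach described above can be sketched in a few lines of standard Java. This is a simplified illustration and not the Reform or ESAPI source: the class name AllowListEncoderSketch and its methods are hypothetical, and it assumes a strict allow-list of ASCII letters and digits with none of the exceptions the real libraries make.

```java
// A simplified sketch of allow-list output encoding; NOT the Reform or ESAPI source.
final class AllowListEncoderSketch {

    // Assumption: only ASCII letters and digits pass through untouched.
    static boolean isSafe(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // HTML context: every other character becomes a decimal numeric entity,
    // for example the less-than sign becomes &#60;
    static String encodeForHtml(String input) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (isSafe(c)) {
                out.append(c);
            } else {
                out.append("&#").append((int) c).append(';');
            }
        }
        return out.toString();
    }

    // JavaScript string context: a two-digit hex escape for values up to 0xFF,
    // and a four-digit Unicode escape for anything above that.
    static String encodeForJavaScript(String input) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (isSafe(c)) {
                out.append(c);
            } else if (c <= 0xFF) {
                out.append(String.format("\\x%02x", (int) c));
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String payload = "<script>alert('xss')</script>";
        System.out.println(encodeForHtml(payload));
        System.out.println(encodeForJavaScript(payload));
    }
}
```

Because the encoder decides what is safe rather than what is dangerous, a character it has never been told about is encoded by default.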

ESAPI is under active development and offers a variety of other security-related functionality that may benefit an organization. I encourage everyone to take a look at the OWASP ESAPI project.

The rest of this post is provided as a reference.

Apache Commons Lang StringEscapeUtils Methods:
  • escapeHtml
  • escapeJava
  • escapeJavaScript
  • escapeSql
  • escapeXml
  • unescapeHtml
  • unescapeJava
  • unescapeJavaScript
  • unescapeXml

OWASP Reform Methods:
  • HtmlEncode
  • HtmlAttributeEncode
  • XmlEncode
  • XmlAttributeEncode
  • JsString
  • VbsString

OWASP ESAPI (SVN Snapshot 2008-08-21) Methods:
  • canonicalize
  • normalize
  • encodeForCSS
  • encodeForHTML
  • encodeForHTMLAttribute
  • encodeForJavaScript
  • encodeForVBScript
  • encodeForSQL
  • encodeForLDAP
  • encodeForDN
  • encodeForXPath
  • encodeForXML
  • encodeForXMLAttribute
  • encodeForURL
  • decodeFromURL
  • encodeForBase64
  • decodeFromBase64

Legend:
ASCII - The numerical ASCII value
Char - The symbol or character associated with that ASCII value
SEU HTML - org.apache.commons.lang.StringEscapeUtils.escapeHtml
Reform HTML - org.owasp.reform.Reform.HtmlEncode
ESAPI HTML - org.owasp.esapi.Encoder.encodeForHTML
SEU JS - org.apache.commons.lang.StringEscapeUtils.escapeJavaScript
Reform JS - org.owasp.reform.Reform.JsString
ESAPI JS - org.owasp.esapi.Encoder.encodeForJavaScript

Blank cells rendered as whitespace or as a non-printing character.

ASCII  Char  SEU HTML  Reform HTML  ESAPI HTML  SEU JS  Reform JS  ESAPI JS
0                                               \u0000  '\x00'     \0
1                                               \u0001  '\x01'     \x01
2                                               \u0002  '\x02'     \x02
3                                               \u0003  '\x03'     \x03
4                                               \u0004  '\x04'     \x04
5                                               \u0005  '\x05'     \x05
6                                               \u0006  '\x06'     \x06
7                                               \u0007  '\x07'     \x07
8                                               \b      '\x08'     \b
9                                               \t      '\x09'     \t
10                                              \n      '\x0a'     \n
11                                              \u000B  '\x0b'     \v
12                                              \f      '\x0c'     \f
13                                              \r      '\x0d'     \r
14                                              \u000E  '\x0e'     \x0E
15                                              \u000F  '\x0f'     \x0F
16                                              \u0010  '\x10'     \x10
17                                              \u0011  '\x11'     \x11
18                                              \u0012  '\x12'     \x12
19                                              \u0013  '\x13'     \x13
20                                              \u0014  '\x14'     \x14
21                                              \u0015  '\x15'     \x15
22                                              \u0016  '\x16'     \x16
23                                              \u0017  '\x17'     \x17
24                                              \u0018  '\x18'     \x18
25                                              \u0019  '\x19'     \x19
26                                              \u001A  '\x1a'     \x1A
27                                              \u001B  '\x1b'     \x1B
28                                              \u001C  '\x1c'     \x1C
29                                              \u001D  '\x1d'     \x1D
30                                              \u001E  '\x1e'     \x1E
31                                              \u001F  '\x1f'     \x1F
32                                                      ' '
33     !     !         !            !           !       '\x21'     \x21
34     "     "         "            "           \"      '\x22'     \"
35     #     #         #            #           #       '\x23'     \x23
36     $     $         $            $           $       '\x24'     \x24
37     %     %         %            %           %       '\x25'     \x25
38     &     &         &            &           &       '\x26'     \x26
39     '     '         '            '           \'      '\x27'     \'
40     (     (         (            (           (       '\x28'     \x28
41     )     )         )            )           )       '\x29'     \x29
42     *     *         *            *           *       '\x2a'     \x2A
43     +     +         +            +           +       '\x2b'     \x2B
44     ,     ,         ,            ,           ,       ','        ,
45     -     -         -            -           -       '\x2d'     -
46     .     .         .            .           .       '.'        .
47     /     /         /            /           \/      '\x2f'     \x2F
48     0     0         0            0           0       '0'        0
49     1     1         1            1           1       '1'        1
50     2     2         2            2           2       '2'        2
51     3     3         3            3           3       '3'        3
52     4     4         4            4           4       '4'        4
53     5     5         5            5           5       '5'        5
54     6     6         6            6           6       '6'        6
55     7     7         7            7           7       '7'        7
56     8     8         8            8           8       '8'        8
57     9     9         9            9           9       '9'        9
58     :     :         :            :           :       '\x3a'     \x3A
59     ;     ;         &#59;        &#59;       ;       '\x3b'     \x3B
60     <     &lt;      &#60;        &lt;        <       '\x3c'     \x3C
61     =     =         &#61;        &#61;       =       '\x3d'     \x3D
62     >     &gt;      &#62;        &gt;        >       '\x3e'     \x3E
63     ?     ?         &#63;        &#63;       ?       '\x3f'     \x3F
64     @     @         &#64;        &#64;       @       '\x40'     \x40
65     A     A         A            A           A       'A'        A
66     B     B         B            B           B       'B'        B
67     C     C         C            C           C       'C'        C
68     D     D         D            D           D       'D'        D
69     E     E         E            E           E       'E'        E
70     F     F         F            F           F       'F'        F
71     G     G         G            G           G       'G'        G
72     H     H         H            H           H       'H'        H
73     I     I         I            I           I       'I'        I
74     J     J         J            J           J       'J'        J
75     K     K         K            K           K       'K'        K
76     L     L         L            L           L       'L'        L
77     M     M         M            M           M       'M'        M
78     N     N         N            N           N       'N'        N
79     O     O         O            O           O       'O'        O
80     P     P         P            P           P       'P'        P
81     Q     Q         Q            Q           Q       'Q'        Q
82     R     R         R            R           R       'R'        R
83     S     S         S            S           S       'S'        S
84     T     T         T            T           T       'T'        T
85     U     U         U            U           U       'U'        U
86     V     V         V            V           V       'V'        V
87     W     W         W            W           W       'W'        W
88     X     X         X            X           X       'X'        X
89     Y     Y         Y            Y           Y       'Y'        Y
90     Z     Z         Z            Z           Z       'Z'        Z
91     [     [         &#91;        &#91;       [       '\x5b'     \x5B
92     \     \         &#92;        &#92;       \\      '\x5c'     \\
93     ]     ]         &#93;        &#93;       ]       '\x5d'     \x5D
94     ^     ^         &#94;        &#94;       ^       '\x5e'     \x5E
95     _     _         &#95;        _           _       '\x5f'     _
96     `     `         &#96;        &#96;       `       '\x60'     \x60
97     a     a         a            a           a       'a'        a
98     b     b         b            b           b       'b'        b
99     c     c         c            c           c       'c'        c
100    d     d         d            d           d       'd'        d
101    e     e         e            e           e       'e'        e
102    f     f         f            f           f       'f'        f
103    g     g         g            g           g       'g'        g
104    h     h         h            h           h       'h'        h
105    i     i         i            i           i       'i'        i
106    j     j         j            j           j       'j'        j
107    k     k         k            k           k       'k'        k
108    l     l         l            l           l       'l'        l
109    m     m         m            m           m       'm'        m
110    n     n         n            n           n       'n'        n
111    o     o         o            o           o       'o'        o
112    p     p         p            p           p       'p'        p
113    q     q         q            q           q       'q'        q
114    r     r         r            r           r       'r'        r
115    s     s         s            s           s       's'        s
116    t     t         t            t           t       't'        t
117    u     u         u            u           u       'u'        u
118    v     v         v            v           v       'v'        v
119    w     w         w            w           w       'w'        w
120    x     x         x            x           x       'x'        x
121    y     y         y            y           y       'y'        y
122    z     z         z            z           z       'z'        z
123    {     {         &#123;       &#123;      {       '\x7b'     \x7B
124    |     |         &#124;       &#124;      |       '\x7c'     \x7C
125    }     }         &#125;       &#125;      }       '\x7d'     \x7D
126    ~     ~         &#126;       &#126;      ~       '\x7e'     \x7E
127                    &#127;                           '\x7f'     \x7F
128          &#128;    &#128;                   \u0080  '\u0080'   \x80
129          &#129;    &#129;                   \u0081  '\u0081'   \x81
130          &#130;    &#130;                   \u0082  '\u0082'   \x82
131    ƒ     &#131;    &#131;                   \u0083  '\u0083'   \x83
132          &#132;    &#132;                   \u0084  '\u0084'   \x84
133          &#133;    &#133;                   \u0085  '\u0085'   \x85
134          &#134;    &#134;                   \u0086  '\u0086'   \x86
135          &#135;    &#135;                   \u0087  '\u0087'   \x87
136    ˆ     &#136;    &#136;                   \u0088  '\u0088'   \x88
137          &#137;    &#137;                   \u0089  '\u0089'   \x89
138    Š     &#138;    &#138;                   \u008A  '\u008a'   \x8A
139          &#139;    &#139;                   \u008B  '\u008b'   \x8B
140    Œ     &#140;    &#140;                   \u008C  '\u008c'   \x8C
141          &#141;    &#141;                   \u008D  '\u008d'   \x8D
142    Ž     &#142;    &#142;                   \u008E  '\u008e'   \x8E
143          &#143;    &#143;                   \u008F  '\u008f'   \x8F
144          &#144;    &#144;                   \u0090  '\u0090'   \x90
145          &#145;    &#145;                   \u0091  '\u0091'   \x91
146          &#146;    &#146;                   \u0092  '\u0092'   \x92
147          &#147;    &#147;                   \u0093  '\u0093'   \x93
148          &#148;    &#148;                   \u0094  '\u0094'   \x94
149          &#149;    &#149;                   \u0095  '\u0095'   \x95
150          &#150;    &#150;                   \u0096  '\u0096'   \x96
151          &#151;    &#151;                   \u0097  '\u0097'   \x97
152    ˜     &#152;    &#152;                   \u0098  '\u0098'   \x98
153          &#153;    &#153;                   \u0099  '\u0099'   \x99
154    š     &#154;    &#154;                   \u009A  '\u009a'   \x9A
155          &#155;    &#155;                   \u009B  '\u009b'   \x9B
156    œ     &#156;    &#156;                   \u009C  '\u009c'   \x9C
157          &#157;    &#157;                   \u009D  '\u009d'   \x9D
158    ž     &#158;    &#158;                   \u009E  '\u009e'   \x9E
159    Ÿ     &#159;    &#159;                   \u009F  '\u009f'   \x9F
160          &nbsp;    &#160;       &nbsp;      \u00A0  '\u00a0'   \xA0
161    ¡     &iexcl;   &#161;       &iexcl;     \u00A1  '\u00a1'   \xA1
162    ¢     &cent;    &#162;       &cent;      \u00A2  '\u00a2'   \xA2
163    £     &pound;   &#163;       &pound;     \u00A3  '\u00a3'   \xA3
164    ¤     &curren;  &#164;       &curren;    \u00A4  '\u00a4'   \xA4
165    ¥     &yen;     &#165;       &yen;       \u00A5  '\u00a5'   \xA5
166    ¦     &brvbar;  &#166;       &brvbar;    \u00A6  '\u00a6'   \xA6
167    §     &sect;    &#167;       &sect;      \u00A7  '\u00a7'   \xA7
168    ¨     &uml;     &#168;       &uml;       \u00A8  '\u00a8'   \xA8
169    ©     &copy;    &#169;       &copy;      \u00A9  '\u00a9'   \xA9
170    ª     &ordf;    &#170;       &ordf;      \u00AA  '\u00aa'   \xAA
171    «     &laquo;   &#171;       &laquo;     \u00AB  '\u00ab'   \xAB
172    ¬     &not;     &#172;       &not;       \u00AC  '\u00ac'   \xAC
173          &shy;     &#173;       &shy;       \u00AD  '\u00ad'   \xAD
174    ®     &reg;     &#174;       &reg;       \u00AE  '\u00ae'   \xAE
175    ¯     &macr;    &#175;       &macr;      \u00AF  '\u00af'   \xAF
176    °     &deg;     &#176;       &deg;       \u00B0  '\u00b0'   \xB0
177    ±     &plusmn;  &#177;       &plusmn;    \u00B1  '\u00b1'   \xB1
178    ²     &sup2;    &#178;       &sup2;      \u00B2  '\u00b2'   \xB2
179    ³     &sup3;    &#179;       &sup3;      \u00B3  '\u00b3'   \xB3
180    ´     &acute;   &#180;       &acute;     \u00B4  '\u00b4'   \xB4
181    µ     &micro;   &#181;       &micro;     \u00B5  '\u00b5'   \xB5
182    ¶     &para;    &#182;       &para;      \u00B6  '\u00b6'   \xB6
183    ·     &middot;  &#183;       &middot;    \u00B7  '\u00b7'   \xB7
184    ¸     &cedil;   &#184;       &cedil;     \u00B8  '\u00b8'   \xB8
185    ¹     &sup1;    &#185;       &sup1;      \u00B9  '\u00b9'   \xB9
186    º     &ordm;    &#186;       &ordm;      \u00BA  '\u00ba'   \xBA
187    »     &raquo;   &#187;       &raquo;     \u00BB  '\u00bb'   \xBB
188    ¼     &frac14;  &#188;       &frac14;    \u00BC  '\u00bc'   \xBC
189    ½     &frac12;  &#189;       &frac12;    \u00BD  '\u00bd'   \xBD
190    ¾     &frac34;  &#190;       &frac34;    \u00BE  '\u00be'   \xBE
191    ¿     &iquest;  &#191;       &iquest;    \u00BF  '\u00bf'   \xBF
192    À     &Agrave;  &#192;       &Agrave;    \u00C0  '\u00c0'   \xC0
193    Á     &Aacute;  &#193;       &Aacute;    \u00C1  '\u00c1'   \xC1
194    Â     &Acirc;   &#194;       &Acirc;     \u00C2  '\u00c2'   \xC2
195    Ã     &Atilde;  &#195;       &Atilde;    \u00C3  '\u00c3'   \xC3
196    Ä     &Auml;    &#196;       &Auml;      \u00C4  '\u00c4'   \xC4
197    Å     &Aring;   &#197;       &Aring;     \u00C5  '\u00c5'   \xC5
198    Æ     &AElig;   &#198;       &AElig;     \u00C6  '\u00c6'   \xC6
199    Ç     &Ccedil;  &#199;       &Ccedil;    \u00C7  '\u00c7'   \xC7
200    È     &Egrave;  &#200;       &Egrave;    \u00C8  '\u00c8'   \xC8
201    É     &Eacute;  &#201;       &Eacute;    \u00C9  '\u00c9'   \xC9
202    Ê     &Ecirc;   &#202;       &Ecirc;     \u00CA  '\u00ca'   \xCA
203    Ë     &Euml;    &#203;       &Euml;      \u00CB  '\u00cb'   \xCB
204    Ì     &Igrave;  &#204;       &Igrave;    \u00CC  '\u00cc'   \xCC
205    Í     &Iacute;  &#205;       &Iacute;    \u00CD  '\u00cd'   \xCD
206    Î     &Icirc;   &#206;       &Icirc;     \u00CE  '\u00ce'   \xCE
207    Ï     &Iuml;    &#207;       &Iuml;      \u00CF  '\u00cf'   \xCF
208    Ð     &ETH;     &#208;       &ETH;       \u00D0  '\u00d0'   \xD0
209    Ñ     &Ntilde;  &#209;       &Ntilde;    \u00D1  '\u00d1'   \xD1
210    Ò     &Ograve;  &#210;       &Ograve;    \u00D2  '\u00d2'   \xD2
211    Ó     &Oacute;  &#211;       &Oacute;    \u00D3  '\u00d3'   \xD3
212    Ô     &Ocirc;   &#212;       &Ocirc;     \u00D4  '\u00d4'   \xD4
213    Õ     &Otilde;  &#213;       &Otilde;    \u00D5  '\u00d5'   \xD5
214    Ö     &Ouml;    &#214;       &Ouml;      \u00D6  '\u00d6'   \xD6
215    ×     &times;   &#215;       &times;     \u00D7  '\u00d7'   \xD7
216    Ø     &Oslash;  &#216;       &Oslash;    \u00D8  '\u00d8'   \xD8
217    Ù     &Ugrave;  &#217;       &Ugrave;    \u00D9  '\u00d9'   \xD9
218    Ú     &Uacute;  &#218;       &Uacute;    \u00DA  '\u00da'   \xDA
219    Û     &Ucirc;   &#219;       &Ucirc;     \u00DB  '\u00db'   \xDB
220    Ü     &Uuml;    &#220;       &Uuml;      \u00DC  '\u00dc'   \xDC
221    Ý     &Yacute;  &#221;       &Yacute;    \u00DD  '\u00dd'   \xDD
222    Þ     &THORN;   &#222;       &THORN;     \u00DE  '\u00de'   \xDE
223    ß     &szlig;   &#223;       &szlig;     \u00DF  '\u00df'   \xDF
224    à     &agrave;  &#224;       &agrave;    \u00E0  '\u00e0'   \xE0
225    á     &aacute;  &#225;       &aacute;    \u00E1  '\u00e1'   \xE1
226    â     &acirc;   &#226;       &acirc;     \u00E2  '\u00e2'   \xE2
227    ã     &atilde;  &#227;       &atilde;    \u00E3  '\u00e3'   \xE3
228    ä     &auml;    &#228;       &auml;      \u00E4  '\u00e4'   \xE4
229    å     &aring;   &#229;       &aring;     \u00E5  '\u00e5'   \xE5
230    æ     &aelig;   &#230;       &aelig;     \u00E6  '\u00e6'   \xE6
231    ç     &ccedil;  &#231;       &ccedil;    \u00E7  '\u00e7'   \xE7
232    è     &egrave;  &#232;       &egrave;    \u00E8  '\u00e8'   \xE8
233    é     &eacute;  &#233;       &eacute;    \u00E9  '\u00e9'   \xE9
234    ê     &ecirc;   &#234;       &ecirc;     \u00EA  '\u00ea'   \xEA
235    ë     &euml;    &#235;       &euml;      \u00EB  '\u00eb'   \xEB
236    ì     &igrave;  &#236;       &igrave;    \u00EC  '\u00ec'   \xEC
237    í     &iacute;  &#237;       &iacute;    \u00ED  '\u00ed'   \xED
238    î     &icirc;   &#238;       &icirc;     \u00EE  '\u00ee'   \xEE
239    ï     &iuml;    &#239;       &iuml;      \u00EF  '\u00ef'   \xEF
240    ð     &eth;     &#240;       &eth;       \u00F0  '\u00f0'   \xF0
241    ñ     &ntilde;  &#241;       &ntilde;    \u00F1  '\u00f1'   \xF1
242    ò     &ograve;  &#242;       &ograve;    \u00F2  '\u00f2'   \xF2
243    ó     &oacute;  &#243;       &oacute;    \u00F3  '\u00f3'   \xF3
244    ô     &ocirc;   &#244;       &ocirc;     \u00F4  '\u00f4'   \xF4
245    õ     &otilde;  &#245;       &otilde;    \u00F5  '\u00f5'   \xF5
246    ö     &ouml;    &#246;       &ouml;      \u00F6  '\u00f6'   \xF6
247    ÷     &divide;  &#247;       &divide;    \u00F7  '\u00f7'   \xF7
248    ø     &oslash;  &#248;       &oslash;    \u00F8  '\u00f8'   \xF8
249    ù     &ugrave;  &#249;       &ugrave;    \u00F9  '\u00f9'   \xF9
250    ú     &uacute;  &#250;       &uacute;    \u00FA  '\u00fa'   \xFA
251    û     &ucirc;   &#251;       &ucirc;     \u00FB  '\u00fb'   \xFB
252    ü     &uuml;    &#252;       &uuml;      \u00FC  '\u00fc'   \xFC
253    ý     &yacute;  &#253;       &yacute;    \u00FD  '\u00fd'   \xFD
254    þ     &thorn;   &#254;       &thorn;     \u00FE  '\u00fe'   \xFE
255    ÿ     &yuml;    &#255;       &yuml;      \u00FF  '\u00ff'   \xFF

4 comments:

Jeff Williams said...

Hi - this is great work. It looks like there are some serious inconsistencies here in how output encoding is handled. Would you be willing to include OWASP ESAPI in the test? The latest version has codecs for all of these schemes and more, including CSS, MySQL, Oracle, etc. ESAPI also handles *decoding* (including double-encoding), which is quite complex. Thanks for this work!

Nick Coblentz said...

I am looking into including ESAPI in the tests shortly. I'm not sure how I missed this project. It has a lot of really cool stuff in it beyond output encoding. I may write a post just about ESAPI in general in the near future.

Jeff Williams said...

Thanks Nick! We've studied all the specs to try to get these right in ESAPI (they're linked in the javadocs). It's important to note that some characters are illegal in certain encoding schemes. If anyone notices any issues with the ESAPI encodings, please let us know: http://www.owasp.org/index.php/ESAPI.

jwilliams said...

Hi Nick, we've updated ESAPI to make sure that illegal characters are replaced with the official U+FFFD replacement character. Replacing them with whitespace may allow an attacker more freedom.