[Maildev] Fwd: [Bug 1423840] Rewrite the prefs parser

Wayne Mery vseerror at lehigh.edu
Thu Dec 7 15:02:04 EST 2017


FYI (haven't seen this on m.d.platform yet)


-------- Forwarded Message --------
Subject: 	[Bug 1423840] Rewrite the prefs parser
Date: 	Thu, 07 Dec 2017 10:35:58 +0000


Comment #1 <https://bugzilla.mozilla.org/show_bug.cgi?id=1423840#c1>
on Bug 1423840 <https://bugzilla.mozilla.org/show_bug.cgi?id=1423840>
at 2017-12-07 02:35:58 PST

Created attachment 8935310 <attachment.cgi?id=8935310>
Bug 1423840 <show_bug.cgi?id=1423840> - Rewrite the prefs parser.

The prefs parser has two significant problems.

- It doesn't separate tokenizing from parsing.

- It is implemented as a loop around a big switch on a "current state"
   variable.

As a result, it is hard to understand and modify, slower than it could be, and
in obscure cases it fails to parse some valid input.

This patch replaces it with a recursive descent parser (albeit one without any
recursion!) that has separate tokenization. The new parser is easier to
understand and modify, more correct, and has better error messages (including
filenames). It also runs about 1.9x faster than the existing parser. (As
measured by parsing greprefs.js's contents from memory 1000 times in
succession, omitting the prefs hash table construction. If the table
construction is included, it's about 1.4x faster.)
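For illustration, here is a minimal C++ sketch of a parser with separate tokenization in the spirit described above. It is not the code in attachment 8935310. The `Tok`/`Token` types, `Tokenizer`, `PrefEntry` and `ParsePrefs` are hypothetical names, and the grammar is deliberately reduced: only `pref` and `user_pref`, no comments, no escape sequences, and values limited to booleans, plain strings and 32-bit signed integers (the integer range is an assumption). The sketch also assumes a NUL-terminated buffer and treats '\0' as end-of-input, so its scanning loops need no separate length checks. That is one plausible kind of optimization that treating NUL as end-of-file enables, not necessarily the one the patch uses.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical token set for a reduced prefs grammar.
enum class Tok { Pref, UserPref, String, Int, True, False, LParen, RParen, Comma, Semi, Eof, Error };
struct Token {
  Tok kind;
  std::string str;       // payload for Tok::String
  std::int32_t num = 0;  // payload for Tok::Int
};
static bool IsDigit(char c) { return c >= '0' && c <= '9'; }
static bool IsIdent(char c) { return c == '_' || std::isalnum(static_cast<unsigned char>(c)); }

// Tokenizer over a NUL-terminated buffer: '\0' is the end-of-input sentinel,
// so the scanning loops need no separate length check.
class Tokenizer {
 public:
  explicit Tokenizer(const char* p) : p_(p) {}
  Token Next() {
    while (*p_ == ' ' || *p_ == '\t' || *p_ == '\n' || *p_ == '\r') ++p_;  // comments omitted
    const char c = *p_;
    switch (c) {
      case '\0': return {Tok::Eof};
      case '(': ++p_; return {Tok::LParen};
      case ')': ++p_; return {Tok::RParen};
      case ',': ++p_; return {Tok::Comma};
      case ';': ++p_; return {Tok::Semi};
      case '"': return LexString();
    }
    if (c == '-' || IsDigit(c)) return LexInt();
    if (IsIdent(c)) return LexKeyword();
    return {Tok::Error};  // junk such as '!' is rejected, not skipped
  }
 private:
  Token LexString() {  // escape sequences are not handled in this sketch
    Token t{Tok::String};
    for (++p_; *p_ != '"'; ++p_) {
      if (*p_ == '\0' || *p_ == '\\') return {Tok::Error};
      t.str += *p_;
    }
    ++p_;  // skip the closing quote
    return t;
  }
  Token LexInt() {  // overflow is a parse error rather than a silent wrap
    const bool neg = (*p_ == '-');
    if (neg) ++p_;
    if (!IsDigit(*p_)) return {Tok::Error};
    const std::int64_t limit = neg ? std::int64_t{INT32_MAX} + 1 : std::int64_t{INT32_MAX};
    std::int64_t v = 0;
    for (; IsDigit(*p_); ++p_) {
      v = v * 10 + (*p_ - '0');
      if (v > limit) return {Tok::Error};
    }
    Token t{Tok::Int};
    t.num = static_cast<std::int32_t>(neg ? -v : v);
    return t;
  }
  Token LexKeyword() {
    std::string w;
    while (IsIdent(*p_)) w += *p_++;
    const Tok k = w == "pref" ? Tok::Pref : w == "user_pref" ? Tok::UserPref
                : w == "true" ? Tok::True : w == "false" ? Tok::False : Tok::Error;
    return {k};
  }
  const char* p_;
};

struct PrefEntry { bool user = false; std::string name; std::variant<bool, std::int32_t, std::string> value; };
// The single statement rule becomes straight-line code; the grammar has no
// nesting, so this "recursive descent" never recurses. Returns false on the first error.
bool ParsePrefs(const char* buf, std::vector<PrefEntry>& out) {
  Tokenizer t(buf);
  auto expect = [&t](Tok k) { return t.Next().kind == k; };
  for (Token tok = t.Next(); tok.kind != Tok::Eof; tok = t.Next()) {
    // stmt := ("pref" | "user_pref") "(" STRING "," value ")" ";"
    if (tok.kind != Tok::Pref && tok.kind != Tok::UserPref) return false;
    if (!expect(Tok::LParen)) return false;
    const Token name = t.Next();
    if (name.kind != Tok::String || !expect(Tok::Comma)) return false;
    PrefEntry e{tok.kind == Tok::UserPref, name.str, {}};
    const Token v = t.Next();
    switch (v.kind) {  // value := "true" | "false" | INT | STRING
      case Tok::True: e.value = true; break;
      case Tok::False: e.value = false; break;
      case Tok::Int: e.value = v.num; break;
      case Tok::String: e.value = v.str; break;
      default: return false;
    }
    if (!expect(Tok::RParen) || !expect(Tok::Semi)) return false;
    out.push_back(std::move(e));
  }
  return true;
}
```

Because the statement rule contains no nested constructs, each rule is plain straight-line code, which is how a recursive descent parser can end up without any recursion. In this sketch, input such as `pref("x", 1);;` or a line starting with `!` makes `ParsePrefs()` return false instead of silently skipping the junk.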

The new parser is slightly stricter than the old parser in a few ways.

- Disconcertingly, the old parser allowed arbitrary junk between prefs
   (including at the start and end of the prefs file) so long as that junk
   didn't include any of the following chars: '/', '#', 'u', 's', 'p'. For
   example, a line like this:

     !foo@bar&pref("prefname", true);

   would be treated the same as this:

     pref("prefname", true);

   The new parser disallows such junk because it isn't necessary and seems like
   an unintentional botch by the old parser.

- The old parser allowed character 0x1a (SUB) between tokens and treated it
   like '\n'.

   The new parser does not allow this character. SUB was used to indicate
   end-of-file (*not* end-of-line) in some old operating systems such as MS-DOS,
   but this doesn't seem necessary today.

- The old parser tolerated (with a warning) invalid escape sequences within
   string literals -- such as "\q" (not a valid escape) and "\x1" and "\u12"
   (both of which have insufficient hex digits) -- accepting them literally.

   The new parser does not tolerate invalid escape sequences because doing so
   doesn't seem necessary and would complicate things.

- The old parser tolerated character 0x00 (NUL) within string literals; this is
   dangerous because C++ code that manipulates string values with embedded NULs
   will almost certainly consider those chars as end-of-string markers.

   The new parser treats NUL chars as end-of-file, to avoid this danger and
   because it facilitates a significant optimization (described within the
   code).

- The old parser allowed integer literals to overflow, silently wrapping them.

   The new parser treats integer overflow as a parse error. This seems better,
   and it caught an existing overflow in testing/profiles/prefs_general.js, for
   places.database.lastMaintenance.
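The stricter escape-sequence rule can be sketched the same way. This is a minimal, hypothetical C++ helper: `DecodeEscapes`, `ReadHex`, `HexVal` and `AppendUtf8` are illustrative names, not functions from the patch. It assumes the valid escapes are \", \', \\, \n, \r, \x followed by exactly two hex digits, and \u followed by exactly four. Decoding \x and \u values as code points written out as UTF-8, and rejecting surrogate values instead of combining pairs, are further simplifying assumptions. What it shows is that "\q", "\x1" and "\u12" turn the whole literal into an error rather than being accepted literally.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Value of one hex digit, or -1. '\0' is not a hex digit, so reaching the end
// of the buffer simply counts as "too few digits".
static int HexVal(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Reads exactly `n` hex digits at `p`, advancing it; fails on fewer ("\x1", "\u12").
static std::optional<unsigned> ReadHex(const char*& p, int n) {
  unsigned v = 0;
  for (int i = 0; i < n; ++i, ++p) {
    const int d = HexVal(*p);
    if (d < 0) return std::nullopt;
    v = v * 16 + static_cast<unsigned>(d);
  }
  return v;
}

// Appends a code point below 0x10000 as UTF-8.
static void AppendUtf8(std::string& out, unsigned cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Decodes the body of a string literal (quotes already stripped). Any invalid
// escape makes the whole literal an error instead of being kept literally.
std::optional<std::string> DecodeEscapes(const char* p) {
  std::string out;
  while (*p != '\0') {
    const char c = *p++;
    if (c != '\\') { out += c; continue; }
    const char e = *p++;  // if this is '\0', the default case returns before reading further
    switch (e) {
      case '"': case '\'': case '\\': out += e; break;
      case 'n': out += '\n'; break;
      case 'r': out += '\r'; break;
      case 'x':
      case 'u': {
        const auto cp = ReadHex(p, e == 'x' ? 2 : 4);
        // Surrogate values are rejected outright; pairing them is out of scope here.
        if (!cp || (*cp >= 0xD800 && *cp <= 0xDFFF)) return std::nullopt;
        AppendUtf8(out, *cp);
        break;
      }
      default: return std::nullopt;  // e.g. "\q", or a backslash at the very end
    }
  }
  return out;
}
```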

The first of these changes meant that a couple of existing prefs with ";;" at
the end had to be changed, because the second ';' is now rejected as junk.

The minor increase in strictness shouldn't be a problem for default pref files
such as greprefs.js within the application (which we can modify), nor for
app-written prefs files such as prefs.js. It could affect user-written prefs
files such as user.js; the experience above suggests that ";;" is the most
likely problem in practice. In my opinion, the risk here is acceptable.

The new parser should also do a better job of tracking line numbers because it
(a) treats "\r\n" sequences as a single end-of-line marker, and (b) pays
attention to end-of-line sequences within string literals.
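A minimal C++ sketch of that kind of line counting follows. `ConsumeEol`, `SkipString` and `LineAtEnd` are hypothetical helpers, not code from the patch; treating a lone '\r' as an end-of-line is an assumption, and the string scanner ignores escape sequences. The idea is that a single routine handles every end-of-line sequence, so "\r\n" bumps the count once and a newline inside a string literal is counted like any other.

```cpp
#include <cassert>

// If `p` is at an end-of-line sequence, consume it, bump `line`, and return true.
// "\r\n" counts once; treating a lone '\r' as an end-of-line is an assumption.
static bool ConsumeEol(const char*& p, int& line) {
  if (*p == '\r') {
    ++p;
    if (*p == '\n') ++p;  // CRLF is one end-of-line, not two
  } else if (*p == '\n') {
    ++p;
  } else {
    return false;
  }
  ++line;
  return true;
}

// Skips a double-quoted literal (escapes ignored in this sketch) while still
// counting the end-of-line sequences inside it. Returns false if unterminated.
static bool SkipString(const char*& p, int& line) {
  ++p;  // opening quote
  while (*p != '"') {
    if (*p == '\0') return false;
    if (!ConsumeEol(p, line)) ++p;
  }
  ++p;  // closing quote
  return true;
}

// Returns the 1-based line number reached at the end of `p`, or -1 on an
// unterminated string literal.
int LineAtEnd(const char* p) {
  int line = 1;
  while (*p != '\0') {
    if (*p == '"') {
      if (!SkipString(p, line)) return -1;
    } else if (!ConsumeEol(p, line)) {
      ++p;
    }
  }
  return line;
}
```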

The patch also adds a thorough test of valid syntax. (Note that the old parser
fails to correctly parse this test in a few places because it misimplemented
the grammar.) I'd also like to add tests for invalid syntax, but the current
design makes this challenging, because parse errors trigger an error message on
the web console and an NS_ERROR_FILE_CORRUPTED return value from Parse().

Review commit: https://reviewboard.mozilla.org/r/206210/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/206210/


------------------------------------------------------------------------
Product/Component: Core :: Preferences: Backend
------------------------------------------------------------------------
Tracking Flags:

  * status-firefox59:affected



