Ticket #26 (seen Feature Request)

Opened 3 years ago

Last modified 19 months ago

case insensitive regular expressions

Reported by: vern
Owned by:
Priority: Normal
Milestone:
Component: Bro
Version:
Keywords:
Cc:

Description

There should be a way of annotating a regular expression (e.g., &case-insensitive) to mean that it should match the input regardless of case.

Change History

comment:1 Changed 3 years ago by robin

  • Status changed from new to seen

comment:2 Changed 3 years ago by bernhard

Actually, http://bro-ids.org/wiki/index.php/Reference_Manual:_Values%2C_Types%2C_and_Constants#Pattern_Constants states that Bro's regexp syntax is the same as flex's. Flex allows options in regular expressions, e.g., to do a case-insensitive match for 'pattern' you can write your expression as '(?i:pattern)' in flex, which would somewhat resolve this bug. However, doing this within Bro results in a "run-time error: error compiling pattern", so either the documentation should be adapted or the functionality should be implemented.
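For readers unfamiliar with the scoped option syntax, here is a small sketch of what '(?i:pattern)' is meant to do. It uses Python's re module (3.6 or later), which happens to accept the same group syntax; this is only a stand-in and says nothing about how flex or Bro compile patterns.

```python
import re

# Stand-in only: Python's scoped inline flag has the same shape as the
# flex option quoted above. This is not flex and not Bro.
plain = re.compile(r"pattern")
scoped = re.compile(r"(?i:pattern)")

# Without the option, only the exact case matches.
assert plain.fullmatch("pattern") is not None
assert plain.fullmatch("PATTERN") is None

# With the option, the input matches regardless of case.
assert scoped.fullmatch("PATTERN") is not None
assert scoped.fullmatch("PaTtErN") is not None
```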

comment:3 Changed 19 months ago by seth

The existing partial implementation of this is available in the topic/seth/case-insensitive-patterns branch in the git repository. Case insensitivity is enabled with an "i" flag at the end of the pattern. For example:

global my_pattern = /abcdef/i;

would match "ABCDEF". The problem comes when you do a disjunction between patterns in your Bro script. Pattern disjunctions in Bro are currently done by extracting the text of each pattern, OR-ing the texts together, and then recreating the NFAs and DFAs from the combined text. My current implementation, however, applies case insensitivity during construction of the NFAs, so the flag is not carried in the pattern text that the disjunction works from. Apparently the disjunction can't properly be done with the NFAs alone because of issues with anchored patterns (something Vern said, but I may have misunderstood it).
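The text-level disjunction problem can be illustrated with a rough sketch, again using Python's re module as a stand-in. The (text, flag) tuples and the or_as_text helper are hypothetical and only mimic the idea of joining pattern sources as text; they are not Bro internals.

```python
import re

def or_as_text(*sources):
    """Join pattern sources with '|', mimicking a text-level disjunction."""
    return "|".join(f"(?:{s})" for s in sources)

# Hypothetical representation: (pattern text, case-insensitive flag).
abc = ("abc", True)    # stands for /abc/i
DEF = ("DEF", False)   # stands for /DEF/

# Joining only the texts drops the flag: "ABC" no longer matches.
naive = re.compile(or_as_text(abc[0], DEF[0]))
assert naive.fullmatch("abc") is not None
assert naive.fullmatch("ABC") is None

# Applying the flag to the whole joined pattern over-matches instead:
# "def" now matches although /DEF/ was meant to be case sensitive.
global_i = re.compile(or_as_text(abc[0], DEF[0]), re.IGNORECASE)
assert global_i.fullmatch("ABC") is not None
assert global_i.fullmatch("def") is not None
```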

comment:4 Changed 19 months ago by seth

More thoughts I forgot to add.

What I would like to see happen is something along these lines...

const p = /abc/i &redef;
redef p += /DEF/;

The result would be that /abc/ would be a case insensitive portion of the pattern "p" and /DEF/ would be case sensitive.
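One way to picture the intended result is to fold the flag into each operand's text before joining, so a text-level disjunction keeps it. The sketch below does this with Python's re module as a stand-in; to_source and disjoin are hypothetical helper names, and nothing here reflects how the branch actually works.

```python
import re

def to_source(text, case_insensitive=False):
    """Fold the flag into the pattern text so it survives a text-level join."""
    return f"(?i:{text})" if case_insensitive else f"(?:{text})"

def disjoin(*sources):
    """OR already-wrapped pattern sources together as text."""
    return "|".join(sources)

p = to_source("abc", case_insensitive=True)  # const p = /abc/i &redef;
p = disjoin(p, to_source("DEF"))             # redef p += /DEF/;
compiled = re.compile(p)

# The /abc/ portion ignores case; the /DEF/ portion does not.
assert compiled.fullmatch("ABC") is not None
assert compiled.fullmatch("DEF") is not None
assert compiled.fullmatch("def") is None
```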

Continuing...

const p = /abc/ &redef &case_insensitive;
redef p += /DEF/;

This could be another use case, where /abc/ and /DEF/ are both case insensitive. It could be useful when a large number of disjunctions are being done, or when the original script writer wants every pattern later added to "p" to be case insensitive without requiring users of the script to mark each joined pattern as case insensitive.
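A minimal sketch of this second use case, with the flag attached to the variable rather than to any single pattern. PatternVar is a hypothetical class, and Python's re module is used only as a stand-in for Bro's matcher.

```python
import re

class PatternVar:
    """Hypothetical stand-in for a redef-able pattern with a variable-level flag."""

    def __init__(self, text, case_insensitive=False):
        self.alternatives = [text]
        self.case_insensitive = case_insensitive

    def add(self, text):
        """Mimic 'redef p += /text/' by appending another alternative."""
        self.alternatives.append(text)

    def compile(self):
        # The variable-level flag applies to every alternative, old or new.
        flags = re.IGNORECASE if self.case_insensitive else 0
        source = "|".join(f"(?:{alt})" for alt in self.alternatives)
        return re.compile(source, flags)

p = PatternVar("abc", case_insensitive=True)  # const p = /abc/ &redef &case_insensitive;
p.add("DEF")                                  # redef p += /DEF/;
compiled = p.compile()

# Both the original and the added alternative ignore case.
assert compiled.fullmatch("ABC") is not None
assert compiled.fullmatch("def") is not None
```

The script user who adds /DEF/ does not have to mark it in any way; the flag travels with the variable.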

comment:5 Changed 19 months ago by gregor

How about using Python's regex syntax for case insensitivity? (E.g., (?i)ABC )

comment:6 Changed 19 months ago by seth

Why that over the trailing "i"? Using that syntax still doesn't solve the problem of how the disjunction is done between patterns.
