Ticket #26 (seen Feature Request)
case insensitive regular expressions
| Reported by: | vern | Owned by: | |
|---|---|---|---|
| Priority: | Normal | Milestone: | |
| Component: | Bro | Version: | |
| Keywords: | | Cc: | |
Description
There should be a way of annotating a regular expression (e.g., &case-insensitive) to mean that it should match the input regardless of case.
Change History
comment:2 Changed 3 years ago by bernhard
Actually, http://bro-ids.org/wiki/index.php/Reference_Manual:_Values%2C_Types%2C_and_Constants#Pattern_Constants states that Bro's regexp syntax is the same as flex's. Flex allows options inside regular expressions: for example, to match 'pattern' case-insensitively you can write the expression as '(?i:pattern)', which would somewhat resolve this bug. However, doing this within Bro results in a "run-time error: error compiling pattern", so either the documentation should be adapted or the functionality should be implemented.
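For readers unfamiliar with the scoped-option syntax, here is a minimal sketch of what '(?i:pattern)' means. It uses Python's `re` module, which happens to accept the same group syntax (Python 3.6 or later); it only illustrates the intended semantics and says nothing about how Bro or flex implement it.

```python
import re

# Scoped inline flag: only the text inside the group ignores case.
scoped = re.compile(r"(?i:pattern)")
matches_upper = scoped.fullmatch("PATTERN") is not None
matches_mixed = scoped.fullmatch("Pattern") is not None

# Text outside the group stays case sensitive.
mixed = re.compile(r"(?i:abc)def")
matches_abc_upper = mixed.fullmatch("ABCdef") is not None
matches_all_upper = mixed.fullmatch("ABCDEF") is not None  # "DEF" is outside the group

print(matches_upper, matches_mixed, matches_abc_upper, matches_all_upper)
```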
comment:3 Changed 19 months ago by seth
The existing partial implementation of this is available in the topic/seth/case-insensitive-patterns branch in the git repository. Case insensitivity is enabled with an "i" flag at the end of the pattern. For example:
global my_pattern = /abcdef/i;
would match "ABCDEF". The problem comes when you form a disjunction of patterns in a Bro script. Pattern disjunctions in Bro are currently done by extracting the text of each pattern, OR-ing the texts together, and then recreating the DFAs and NFAs from the combined text. My current implementation of case-insensitive patterns, however, applies the case insensitivity during construction of the NFAs, so the flag is not part of the pattern text and does not carry over into the disjunction. Apparently the disjunction can't properly be done with the NFAs alone because of issues with anchored patterns (something Vern said, but I may have misunderstood it).
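The failure mode can be sketched in a few lines. This is a Python model, not Bro's code: the `Pat` class and `or_by_text` helper are hypothetical names, and the flag is stored next to the pattern text in the same way the comment describes it living outside the text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Pat:
    """Hypothetical stand-in for a pattern value: source text plus a flag."""
    text: str
    ignore_case: bool = False

    def to_regex(self):
        # The flag is applied when the matcher is built, not stored in the text.
        return re.compile(self.text, re.IGNORECASE if self.ignore_case else 0)


def or_by_text(a, b):
    # Disjunction by joining the source text; the flag has nowhere to go
    # in the text, so it is lost in the combined pattern.
    return Pat(f"({a.text})|({b.text})")


insensitive = Pat("abcdef", ignore_case=True)
other = Pat("xyz")
combined = or_by_text(insensitive, other)

alone_matches = insensitive.to_regex().fullmatch("ABCDEF") is not None
combined_matches = combined.to_regex().fullmatch("ABCDEF") is not None
print(alone_matches, combined_matches)
```

On its own the flagged pattern matches "ABCDEF", but after the text-based OR it no longer does, because only the text was carried over.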
comment:4 Changed 19 months ago by seth
More thoughts I forgot to add.
What I would like to see happen is something along these lines...
const p = /abc/i &redef;
redef p += /DEF/;
The result would be that /abc/ would be a case insensitive portion of the pattern "p" and /DEF/ would be case sensitive.
Continuing...
const p = /abc/ &redef &case_insensitive;
redef p += /DEF/;
This could be another use case, where /abc/ and /DEF/ are both case insensitive. It would be useful when a large number of disjunctions are being done, or when the original script writer wants every pattern later added to "p" to be case insensitive without requiring users of the script to define each joined pattern as case insensitive.
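The two proposed behaviours can be compared with a small Python model. It is only a sketch of the desired semantics: the `scoped` and `join` helpers are hypothetical, and writing the flag into the text as a scoped group is one possible way to keep it through a text-based disjunction, not something the branch is known to do.

```python
import re


def scoped(text, ignore_case):
    # Carry the flag inside the text as a scoped group.
    return f"(?i:{text})" if ignore_case else f"(?:{text})"


def join(parts, force_ignore_case=False):
    # parts: (text, ignore_case) pairs; force_ignore_case models an
    # attribute on the variable that applies to every joined pattern.
    return "|".join(scoped(t, ic or force_ignore_case) for t, ic in parts)


# First proposal: /abc/i joined with /DEF/, flag kept per pattern.
per_pattern = re.compile(join([("abc", True), ("DEF", False)]))

# Second proposal: attribute on "p", so every joined pattern ignores case.
whole_variable = re.compile(join([("abc", False), ("DEF", False)],
                                 force_ignore_case=True))

results = {
    "per_pattern ABC": per_pattern.fullmatch("ABC") is not None,
    "per_pattern def": per_pattern.fullmatch("def") is not None,
    "whole_variable ABC": whole_variable.fullmatch("ABC") is not None,
    "whole_variable def": whole_variable.fullmatch("def") is not None,
}
print(results)
```

In the first case "def" is rejected because /DEF/ stays case sensitive; in the second case both halves ignore case.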