1. Copyright Notice

2. In a Nutshell

3. Functionalities

3.1. Origin BibTeX Bases

3.2. Regexes

3.3. New BibTeX base

3.4. Configuration Menu

3.5. Tips and Tricks

© Copyright 2003 Yann-Gaël Guéhéneuc. Use and copying of this software and preparation of derivative works based upon this software are permitted. Any copy of this software or of any derivative work must include the above copyright notice of the author, this paragraph and the one after it. This software is made available AS IS, and THE AUTHOR DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, AND NOT WITHSTANDING ANY OTHER PROVISION CONTAINED HEREIN, ANY LIABILITY FOR DAMAGES RESULTING FROM THE SOFTWARE OR ITS USE IS EXPRESSLY DISCLAIMED, WHETHER ARISING IN CONTRACT, TORT (INCLUDING NEGLIGENCE) OR STRICT LIABILITY, EVEN IF THE AUTHOR IS ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. All Rights Reserved.

BibTeX Extractor is a tool to extract BibTeX entries from a set of BibTeX bases using regular expressions.

BibTeX Extractor automates the task of extracting subsets of BibTeX bases to be
used with BibTeX styles, such as `Multilingual.sty` and `Bibtopic.sty`.

If you have a question, a comment, a bug report, or a feature request, please send an e-mail to Yann-Gaël Guéhéneuc:

`yann <dash> gael <at> gueheneuc <dot> net`

Use the *Add*/*Remove* buttons to add BibTeX bases to, or remove them from, the
list of bases from which entries are extracted.

Regular expressions select the entries to extract from the BibTeX
bases. They can be added and removed with the *Add*/*Remove* buttons, but only
once at least one BibTeX base has been added. They can be of three types:

- <field> = <value>, e.g. "kind = RR", "author = Christopher Alexander".
- KEY = <value>, e.g. "KEY = Alexander77-PatternLanguage", "KEY = GoF".
- TYPE = <value>, e.g. "TYPE = misc", "TYPE = phdthesis".

All regular expressions of type <field> must match for an entry to be selected, whereas only one regular expression of type KEY and only one of type TYPE need to match:

An entry matches if and only if every regular expression <field> = <value> matches AND at least one regular expression KEY matches AND at least one regular expression TYPE matches.

The three types need not all be present: a type for which no regular expression is given is simply left out of the rule above.
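The matching rule above can be sketched in a few lines of Java. This is only an illustration, not BibTeX Extractor's actual code: the `BibEntry` class, the method names, and the choice of one regular expression per field name are hypothetical, and the sketch assumes that a regular expression must match the whole value (`Pattern.matches`) rather than a part of it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class EntryMatchSketch {

    /** Hypothetical, simplified view of one BibTeX entry. */
    static final class BibEntry {
        final String type;
        final String key;
        final Map<String, String> fields;

        BibEntry(String type, String key, Map<String, String> fields) {
            this.type = type;
            this.key = key;
            this.fields = fields;
        }
    }

    /** True if no regex of this type was given, or if at least one matches the whole value. */
    static boolean anyMatches(List<String> regexes, String value) {
        if (regexes.isEmpty()) {
            return true; // a missing type is left out of the rule
        }
        for (String regex : regexes) {
            if (Pattern.matches(regex, value)) {
                return true;
            }
        }
        return false;
    }

    /** True if every <field> = <value> regex matches; an absent field never matches. */
    static boolean allFieldsMatch(Map<String, String> fieldRegexes, BibEntry entry) {
        for (Map.Entry<String, String> rule : fieldRegexes.entrySet()) {
            String value = entry.fields.get(rule.getKey());
            if (value == null || !Pattern.matches(rule.getValue(), value)) {
                return false;
            }
        }
        return true;
    }

    /** all(<field>) AND any(KEY) AND any(TYPE). */
    static boolean matches(BibEntry entry, Map<String, String> fieldRegexes,
                           List<String> keyRegexes, List<String> typeRegexes) {
        return allFieldsMatch(fieldRegexes, entry)
                && anyMatches(keyRegexes, entry.key)
                && anyMatches(typeRegexes, entry.type);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new HashMap<>();
        fields.put("author", "Christopher Alexander");
        BibEntry entry = new BibEntry("book", "Alexander77-PatternLanguage", fields);

        Map<String, String> fieldRegexes = new HashMap<>();
        fieldRegexes.put("author", ".*Alexander.*");

        List<String> none = Collections.<String>emptyList();
        // One KEY regex matches and no TYPE regex is given: prints true
        System.out.println(matches(entry, fieldRegexes, Arrays.asList("GoF", "Alexander77.*"), none));
        // Neither TYPE regex matches "book": prints false
        System.out.println(matches(entry, fieldRegexes, none, Arrays.asList("misc", "phdthesis")));
    }
}
```

Treating an empty list as "always true" is what lets a missing type drop out of the conjunction without special cases.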

<value> must be a valid Java regular expression, as defined by the `java.util.regex.Pattern` class:

Characters

- `x`: The character x
- `\\`: The backslash character
- `\0n`: The character with octal value 0n (0 <= n <= 7)
- `\0nn`: The character with octal value 0nn (0 <= n <= 7)
- `\0mnn`: The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
- `\xhh`: The character with hexadecimal value 0xhh
- `\uhhhh`: The character with hexadecimal value 0xhhhh
- `\t`: The tab character ('\u0009')
- `\n`: The newline (line feed) character ('\u000A')
- `\r`: The carriage-return character ('\u000D')
- `\f`: The form-feed character ('\u000C')
- `\a`: The alert (bell) character ('\u0007')
- `\e`: The escape character ('\u001B')
- `\cx`: The control character corresponding to x

Character classes

- `[abc]`: a, b, or c (simple class)
- `[^abc]`: Any character except a, b, or c (negation)
- `[a-zA-Z]`: a through z or A through Z, inclusive (range)
- `[a-d[m-p]]`: a through d, or m through p: `[a-dm-p]` (union)
- `[a-z&&[def]]`: d, e, or f (intersection)
- `[a-z&&[^bc]]`: a through z, except for b and c: `[ad-z]` (subtraction)
- `[a-z&&[^m-p]]`: a through z, and not m through p: `[a-lq-z]` (subtraction)

Predefined character classes

- `.`: Any character (may or may not match line terminators)
- `\d`: A digit: `[0-9]`
- `\D`: A non-digit: `[^0-9]`
- `\s`: A whitespace character: `[ \t\n\x0B\f\r]`
- `\S`: A non-whitespace character: `[^\s]`
- `\w`: A word character: `[a-zA-Z_0-9]`
- `\W`: A non-word character: `[^\w]`

POSIX character classes (US-ASCII only)

- `\p{Lower}`: A lower-case alphabetic character: `[a-z]`
- `\p{Upper}`: An upper-case alphabetic character: `[A-Z]`
- `\p{ASCII}`: All ASCII: `[\x00-\x7F]`
- `\p{Alpha}`: An alphabetic character: `[\p{Lower}\p{Upper}]`
- `\p{Digit}`: A decimal digit: `[0-9]`
- `\p{Alnum}`: An alphanumeric character: `[\p{Alpha}\p{Digit}]`
- `\p{Punct}`: Punctuation: One of `` !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ ``
- `\p{Graph}`: A visible character: `[\p{Alnum}\p{Punct}]`
- `\p{Print}`: A printable character: `[\p{Graph}]`
- `\p{Blank}`: A space or a tab: `[ \t]`
- `\p{Cntrl}`: A control character: `[\x00-\x1F\x7F]`
- `\p{XDigit}`: A hexadecimal digit: `[0-9a-fA-F]`
- `\p{Space}`: A whitespace character: `[ \t\n\x0B\f\r]`

Classes for Unicode blocks and categories

- `\p{InGreek}`: A character in the Greek block (simple block)
- `\p{Lu}`: An uppercase letter (simple category)
- `\p{Sc}`: A currency symbol
- `\P{InGreek}`: Any character except one in the Greek block (negation)
- `[\p{L}&&[^\p{Lu}]]`: Any letter except an uppercase letter (subtraction)

Boundary matchers

- `^`: The beginning of a line
- `$`: The end of a line
- `\b`: A word boundary
- `\B`: A non-word boundary
- `\A`: The beginning of the input
- `\G`: The end of the previous match
- `\Z`: The end of the input but for the final terminator, if any
- `\z`: The end of the input

Greedy quantifiers

- `X?`: X, once or not at all
- `X*`: X, zero or more times
- `X+`: X, one or more times
- `X{n}`: X, exactly n times
- `X{n,}`: X, at least n times
- `X{n,m}`: X, at least n but not more than m times

Reluctant quantifiers

- `X??`: X, once or not at all
- `X*?`: X, zero or more times
- `X+?`: X, one or more times
- `X{n}?`: X, exactly n times
- `X{n,}?`: X, at least n times
- `X{n,m}?`: X, at least n but not more than m times

Possessive quantifiers

- `X?+`: X, once or not at all
- `X*+`: X, zero or more times
- `X++`: X, one or more times
- `X{n}+`: X, exactly n times
- `X{n,}+`: X, at least n times
- `X{n,m}+`: X, at least n but not more than m times

Logical operators

- `XY`: X followed by Y
- `X|Y`: Either X or Y
- `(X)`: X, as a capturing group

Back references

- `\n`: Whatever the nth capturing group matched

Quotation

- `\`: Nothing, but quotes the following character
- `\Q`: Nothing, but quotes all characters until `\E`
- `\E`: Nothing, but ends quoting started by `\Q`

Special constructs (non-capturing)

- `(?:X)`: X, as a non-capturing group
- `(?idmsux-idmsux)`: Nothing, but turns match flags on - off
- `(?idmsux-idmsux:X)`: X, as a non-capturing group with the given flags on - off
- `(?=X)`: X, via zero-width positive lookahead
- `(?!X)`: X, via zero-width negative lookahead
- `(?<=X)`: X, via zero-width positive lookbehind
- `(?<!X)`: X, via zero-width negative lookbehind
- `(?>X)`: X, as an independent, non-capturing group
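The difference between greedy, reluctant, and possessive quantifiers is easiest to see on a small input. The following sketch uses only `java.util.regex`; the class name, the helper `firstMatch`, and the sample string are made up for illustration and are not part of BibTeX Extractor. The doubled backslashes are Java string-literal escaping only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierSketch {

    /** Returns the first substring of input that regex finds, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "{Design} {Patterns}";

        // Greedy: .* takes as much as possible, then backtracks to the last '}'
        System.out.println(firstMatch("\\{.*\\}", input));  // {Design} {Patterns}

        // Reluctant: .*? takes as little as possible, stopping at the first '}'
        System.out.println(firstMatch("\\{.*?\\}", input)); // {Design}

        // Possessive: .*+ takes everything and never gives it back, so '}' cannot match
        System.out.println(firstMatch("\\{.*+\\}", input)); // null
    }
}
```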

The text field and its associated *Browse* button let you specify the full path of
the new BibTeX file into which the extracted entries are copied.

The *Preview* button opens a window that shows the result of the
extraction without modifying the chosen new BibTeX file. The
*Extract it!* button extracts the BibTeX entries and writes them to
the chosen new BibTeX file.

The *Configuration* menu simplifies the use of BibTeX Extractor. The *Load*/*Save*
items load the current configuration from a file and save it to a file. A configuration consists of:

- The selected BibTeX bases.
- The regular expressions.
- The chosen new BibTeX file.

A saved configuration thus makes it possible to reproduce later an extraction that involves several BibTeX bases and sophisticated regular expressions.

The *Reset* item resets the current configuration. (Only the configuration
name and file are reset.)

Thank you to the contributors of this section!

- Miguel Ruiz: If you want to find a word in a multiline field, such as <abstract>, your regex should read:
`abstract = [\s\S]*word[\s\S]*`
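The reason behind this tip is that, by default, `.` does not match line terminators in Java regular expressions, whereas `[\s\S]` matches any character. The sketch below illustrates this with plain `java.util.regex`; the class name, the helper, and the sample abstract are hypothetical, and it assumes that a regular expression is matched against the whole field value, which may not be exactly what BibTeX Extractor does internally.

```java
import java.util.regex.Pattern;

public class MultilineFieldSketch {

    /** True if regex matches the entire value (assumed matching mode). */
    static boolean wholeValueMatches(String regex, String value) {
        return Pattern.matches(regex, value);
    }

    public static void main(String[] args) {
        // A field value that spans two lines
        String abstractText = "We study design\npatterns in practice.";

        // '.' stops at the line break, so the whole value cannot match
        System.out.println(wholeValueMatches(".*patterns.*", abstractText));               // false

        // [\s\S] matches any character, line terminators included
        System.out.println(wholeValueMatches("[\\s\\S]*patterns[\\s\\S]*", abstractText)); // true

        // The embedded flag (?s) makes '.' match line terminators as well
        System.out.println(wholeValueMatches("(?s).*patterns.*", abstractText));           // true
    }
}
```

The `(?s)` form relies on the `s` match flag listed among the special constructs of `java.util.regex.Pattern`; whether BibTeX Extractor accepts embedded flags in its regex fields has not been checked here.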