BibTeX Extractor

1. Copyright Notice
2. In a Nutshell
3. Functionalities
      3.1. Origin BibTeX Bases
      3.2. Regexes
      3.3. New BibTeX Base
      3.4. Configuration Menu
      3.5. Tips and Tricks

1. Copyright Notice

© Copyright 2003 Yann-Gaël Guéhéneuc.

Use and copying of this software and preparation of derivative works
based upon this software are permitted. Any copy of this software or
of any derivative work must include the above copyright notice of
the author, this paragraph and the one after it.

This software is made available AS IS, and THE AUTHOR DISCLAIMS
ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE, AND NOT WITHSTANDING ANY OTHER PROVISION CONTAINED HEREIN,
ANY LIABILITY FOR DAMAGES RESULTING FROM THE SOFTWARE OR ITS USE IS
EXPRESSLY DISCLAIMED, WHETHER ARISING IN CONTRACT, TORT (INCLUDING
NEGLIGENCE) OR STRICT LIABILITY, EVEN IF THE AUTHOR IS ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

All Rights Reserved.

2. In a Nutshell

BibTeX Extractor is a tool to extract BibTeX entries from a set of BibTeX bases using regular expressions.

BibTeX Extractor automates the task of extracting subsets of BibTeX bases to be used with BibTeX styles, such as Multilingual.sty and Bibtopic.sty.

Should you have any question, comment, bug report, or feature request, please send an e-mail to Yann-Gaël Guéhéneuc:

yann <dash> gael <at> gueheneuc <dot> net

3. Functionalities

3.1. Origin BibTeX Bases

The Add/Remove buttons add BibTeX bases to, or remove them from, the list of bases from which entries are extracted.

3.2. Regexes

Regular expressions are used to extract entries from the BibTeX bases. They can be added or removed with the Add/Remove buttons only once at least one BibTeX base has been added. They can be of three types:

<field> = <value>, e.g. "kind = RR", "author = Christopher Alexander".
KEY = <value>, e.g. "KEY = Alexander77-PatternLanguage", "KEY = GoF".
TYPE = <value>, e.g. "TYPE = misc", "TYPE = phdthesis".

Regular expressions of type <field> must all be satisfied for an entry to match. At least one regular expression of type KEY must match. At least one regular expression of type TYPE must match:

	An entry matches if and only if
		each regular expression <field> = <value> matches
		AND
		at least one regular expression KEY matches
		AND
		at least one regular expression TYPE matches

Regular expressions of all three types need not be present: a type for which no regular expression is given is simply left out of the condition above.
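
The matching rule above can be sketched in Java as follows. This is only an illustration, not BibTeX Extractor's actual code: the class EntryMatcher, its method names, and the entry model (a key, a type, and a map of fields) are hypothetical, and the sketch assumes that each regular expression must match the whole value, which the tool may or may not require.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BibTeX Extractor.
final class EntryMatcher {

    // An entry is modelled (as an assumption) by its citation key, its type,
    // and a map from field name to field value.
    static boolean matches(String key, String type, Map<String, String> fields,
                           Map<String, String> fieldRegexes,
                           List<String> keyRegexes,
                           List<String> typeRegexes) {
        // Each <field> = <value> expression must match; an absent field never matches.
        for (Map.Entry<String, String> e : fieldRegexes.entrySet()) {
            String value = fields.get(e.getKey());
            if (value == null || !Pattern.matches(e.getValue(), value)) {
                return false;
            }
        }
        // At least one KEY expression must match, unless none was given.
        if (!keyRegexes.isEmpty() && !anyMatches(keyRegexes, key)) {
            return false;
        }
        // At least one TYPE expression must match, unless none was given.
        return typeRegexes.isEmpty() || anyMatches(typeRegexes, type);
    }

    static boolean anyMatches(List<String> regexes, String text) {
        for (String regex : regexes) {
            if (Pattern.matches(regex, text)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new HashMap<>();
        fields.put("author", "Christopher Alexander");
        boolean ok = matches("Alexander77-PatternLanguage", "book", fields,
                Collections.singletonMap("author", "Christopher.*"),
                Arrays.asList("GoF", "Alexander77.*"),
                Collections.<String>emptyList()); // no TYPE expression: ignored
        System.out.println(ok); // prints "true"
    }
}
```

For brevity the sketch keeps a single expression per field name; a list of (field, expression) pairs would be needed to allow several expressions on the same field.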

<value> must be a valid Java regular expression, as defined by the java.util.regex.Pattern class:

Characters
x 		The character x
\\ 		The backslash character
\0n 		The character with octal value 0n (0 <= n <= 7)
\0nn 		The character with octal value 0nn (0 <= n <= 7)
\0mnn 		The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh 		The character with hexadecimal value 0xhh
\uhhhh		The character with hexadecimal value 0xhhhh
\t 		The tab character ('\u0009')
\n 		The newline (line feed) character ('\u000A')
\r 		The carriage-return character ('\u000D')
\f 		The form-feed character ('\u000C')
\a 		The alert (bell) character ('\u0007')
\e 		The escape character ('\u001B')
\cx 		The control character corresponding to x
 
Character classes
[abc] 		a, b, or c (simple class)
[^abc] 		Any character except a, b, or c (negation)
[a-zA-Z] 	a through z or A through Z, inclusive (range)
[a-d[m-p]] 	a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] 	d, e, or f (intersection)
[a-z&&[^bc]] 	a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] 	a through z, and not m through p: [a-lq-z] (subtraction)
 
Predefined character classes
. 		Any character (may or may not match line terminators)
\d 		A digit: [0-9]
\D 		A non-digit: [^0-9]
\s 		A whitespace character: [ \t\n\x0B\f\r]
\S 		A non-whitespace character: [^\s]
\w 		A word character: [a-zA-Z_0-9]
\W 		A non-word character: [^\w]
 
POSIX character classes (US-ASCII only)
\p{Lower} 	A lower-case alphabetic character: [a-z]
\p{Upper} 	An upper-case alphabetic character: [A-Z]
\p{ASCII} 	All ASCII: [\x00-\x7F]
\p{Alpha} 	An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit} 	A decimal digit: [0-9]
\p{Alnum} 	An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct} 	Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} 	A visible character: [\p{Alnum}\p{Punct}]
\p{Print} 	A printable character: [\p{Graph}]
\p{Blank} 	A space or a tab: [ \t]
\p{Cntrl} 	A control character: [\x00-\x1F\x7F]
\p{XDigit} 	A hexadecimal digit: [0-9a-fA-F]
\p{Space} 	A whitespace character: [ \t\n\x0B\f\r]
 
Classes for Unicode blocks and categories
\p{InGreek} 		A character in the Greek block (simple block)
\p{Lu} 			An uppercase letter (simple category)
\p{Sc} 			A currency symbol
\P{InGreek} 		Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]]  	Any letter except an uppercase letter (subtraction)
 
Boundary matchers
^ 		The beginning of a line
$ 		The end of a line
\b 		A word boundary
\B 		A non-word boundary
\A 		The beginning of the input
\G 		The end of the previous match
\Z 		The end of the input but for the final terminator, if any
\z		The end of the input
 
Greedy quantifiers
X? 		X, once or not at all
X* 		X, zero or more times
X+ 		X, one or more times
X{n} 		X, exactly n times
X{n,} 		X, at least n times
X{n,m} 		X, at least n but not more than m times
 
Reluctant quantifiers
X?? 		X, once or not at all
X*? 		X, zero or more times
X+? 		X, one or more times
X{n}? 		X, exactly n times
X{n,}? 		X, at least n times
X{n,m}? 	X, at least n but not more than m times
 
Possessive quantifiers
X?+ 		X, once or not at all
X*+ 		X, zero or more times
X++ 		X, one or more times
X{n}+ 		X, exactly n times
X{n,}+ 		X, at least n times
X{n,m}+ 	X, at least n but not more than m times
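
The three quantifier families accept the same repetition counts but differ in how much text they consume and whether they give any back. The following minimal sketch uses only java.util.regex; the class QuantifierDemo and the sample text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of BibTeX Extractor.
final class QuantifierDemo {

    // Returns the first substring of text matched by regex, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "{A} and {B}";
        // Greedy: .* takes as much as possible, then backtracks to the last '}'.
        System.out.println(firstMatch("\\{.*\\}", text));  // prints "{A} and {B}"
        // Reluctant: .*? takes as little as possible, stopping at the first '}'.
        System.out.println(firstMatch("\\{.*?\\}", text)); // prints "{A}"
        // Possessive: .*+ takes everything and never backtracks, so no '}' is left.
        System.out.println(firstMatch("\\{.*+\\}", text)); // prints "null"
    }
}
```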
 
Logical operators
XY 		X followed by Y
X|Y 		Either X or Y
(X) 		X, as a capturing group
 
Back references
\n 		Whatever the nth capturing group matched
 
Quotation
\ 		Nothing, but quotes the following character
\Q 		Nothing, but quotes all characters until \E
\E 		Nothing, but ends quoting started by \Q
 
Special constructs (non-capturing)
(?:X) 			X, as a non-capturing group
(?idmsux-idmsux)  	Nothing, but turns match flags on - off
(?idmsux-idmsux:X)	X, as a non-capturing group with the given flags on - off
(?=X) 			X, via zero-width positive lookahead
(?!X) 			X, via zero-width negative lookahead
(?<=X) 			X, via zero-width positive lookbehind
(?<!X) 			X, via zero-width negative lookbehind
(?>X) 			X, as an independent, non-capturing group

3.3. New BibTeX Base

The text field and its associated Browse button let you specify the full path of the new BibTeX file into which the extracted entries are copied.

The Preview button opens a window showing the result of the BibTeX entries extraction, without modifying the chosen new BibTeX file. The Extract it! button extracts the BibTeX entries and writes them to the chosen new BibTeX file.

3.4. Configuration Menu

The Configuration menu simplifies the use of BibTeX Extractor. The Load and Save items respectively load the current configuration from a file and save it to a file. A configuration consists of:

The selected BibTeX bases.
The regular expressions.
The chosen new BibTeX file.

Thus, an extraction of BibTeX entries from several BibTeX bases, even one using sophisticated regular expressions, can be saved and reproduced later.

The Reset item resets the current configuration. (Only the configuration name and file are reset.)

3.5. Tips and Tricks

Thank you to the contributors of this section!

Miguel Ruiz: If you want to find a word in a multiline field, such as <abstract>, your regex should read: abstract = [\s\S]*word[\s\S]*
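
The reason behind this tip is that, by default, the dot does not match line terminators in Java regular expressions, whereas [\s\S] matches any character at all. The sketch below illustrates the difference with java.util.regex alone; the class MultilineFieldDemo and the sample abstract are made up, and the sketch assumes whole-value matching, which may differ from what BibTeX Extractor does internally.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of BibTeX Extractor.
final class MultilineFieldDemo {

    // Whole-value match using [\s\S]*, which also spans line breaks.
    static boolean containsWord(String fieldValue, String word) {
        // Pattern.quote treats the word literally, even if it contains metacharacters.
        return Pattern.matches("[\\s\\S]*" + Pattern.quote(word) + "[\\s\\S]*", fieldValue);
    }

    // Same idea with the dot, which stops at line terminators by default.
    static boolean containsWordWithDot(String fieldValue, String word) {
        return Pattern.matches(".*" + Pattern.quote(word) + ".*", fieldValue);
    }

    public static void main(String[] args) {
        String abstractText = "Design patterns\ndescribe recurring solutions.";
        System.out.println(containsWord(abstractText, "recurring"));        // prints "true"
        System.out.println(containsWordWithDot(abstractText, "recurring")); // prints "false"
    }
}
```

In plain Java, prefixing the expression with the (?s) flag listed under the special constructs also makes the dot match line terminators, so (?s).*word.* behaves like [\s\S]*word[\s\S]*.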