This section is an overview of the Highlight Definition XML format. Based on a small example it will describe the main components and their meaning and usage. The next section will go into detail with the highlight detection rules.
The formal definition, the DTD, is stored in the file language.dtd, which should be installed on your system in the folder $KDEDIR/share/apps/katepart/syntax. If $KDEDIR is unset, look up the folder by using kde-config --prefix.
A highlighting file contains a header that sets the XML version and the doctype:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE language SYSTEM "language.dtd">
The root of the definition file is the element language. Available attributes are:
Required attributes:
Optional attributes:
So the next line may look like this:
<language name="C++" version="1.00" kateversion="2.4" section="Sources" extensions="*.cpp;*.h">
Next comes the highlighting element, which contains the optional element list and the required elements contexts and itemDatas.
list elements contain a list of keywords. In this case the keywords are class and const. You can add as many lists as you need.
The contexts element contains all contexts. The first context is by default the start of the highlighting. The context Normal Text has two rules: one matches the list of keywords with the name somename, the other detects a quote and switches the context to string. To learn more about rules, read the next chapter.
The third part is the itemDatas element. It contains all color and font styles needed by the contexts and rules. In this example, the itemData entries Normal Text, String and Keyword are used.
<highlighting>
<list name="somename">
<item> class </item>
<item> const </item>
</list>
<contexts>
<context attribute="Normal Text" lineEndContext="#pop" name="Normal Text" >
<keyword attribute="Keyword" context="#stay" String="somename" />
<DetectChar attribute="String" context="string" char="&quot;" />
</context>
<context attribute="String" lineEndContext="#stay" name="string" >
<DetectChar attribute="String" context="#pop" char="&quot;" />
</context>
</contexts>
<itemDatas>
<itemData name="Normal Text" defStyleNum="dsNormal" />
<itemData name="Keyword" defStyleNum="dsKeyword" />
<itemData name="String" defStyleNum="dsString" />
</itemDatas>
</highlighting>
The last part of a highlight definition is the optional general section. It may contain information about keywords, code folding, comments and indentation.
The comment section defines the string that introduces a single-line comment. You can also define multi-line comments using multiLine with the additional attribute end. This is used when the user presses the corresponding shortcut for comment/uncomment. The keywords section defines whether keyword lists are case sensitive or not. Other attributes will be explained later.
<general>
<comments>
<comment name="singleLine" start="#"/>
</comments>
<keywords casesensitive="1"/>
</general>
</language>
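As a minimal sketch of the multi-line variant mentioned above, a general section for a language with C-style comments might look like the following. The delimiters //, /* and */ are assumptions chosen for illustration and are not part of the example language defined above.

```xml
<general>
  <comments>
    <!-- assumed delimiters for a C-like language -->
    <comment name="singleLine" start="//" />
    <comment name="multiLine" start="/*" end="*/" />
  </comments>
  <keywords casesensitive="1" />
</general>
```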
This part will describe all available attributes for contexts, itemDatas, keywords, comments, code folding and indentation.
The element context belongs to the group contexts. A context itself defines context-specific rules, such as what should happen when the highlight system reaches the end of a line. Available attributes are:
The element itemData is in the group itemDatas. It defines the font style and colors. It is therefore possible to define your own styles and colors; however, we recommend sticking to the default styles where possible so that the user always sees the same colors across different languages. Sometimes, though, there is no other way and it is necessary to change color and font attributes. The attributes name and defStyleNum are required, the others are optional. Available attributes are:
The element keywords in the group general defines keyword properties. Available attributes are:
The element comment in the group comments defines comment properties which are used for Tools > Comment and Tools > Uncomment. Available attributes are:
The element folding in the group general defines code folding properties. Available attributes are:
The element indentation in the group general defines which indenter will be used. However, we strongly recommend omitting this element, as the indenter will usually be set either by defining a File Type or by adding a mode line to the text file. If you specify an indenter anyway, you force a specific indentation on the user, which they might not like at all. Available attributes are:
Default styles are predefined font and color styles. For convenience Kate provides several default styles, in detail:
This section describes the syntax detection rules.
Each rule can match zero or more characters at the beginning of the string it is tested against. If the rule matches, the matching characters are assigned the style or attribute defined by the rule, and the rule may ask for the current context to be switched.
A rule looks like this:
<RuleName attribute="(identifier)" context="(identifier)" [rule specific attributes] />
The attribute identifies the style to use for matched characters by name, and the context identifies the context to use from here.
The context can be identified by:
Some rules can have child rules which are then evaluated only if the parent rule matched. The entire matched string will be given the attribute defined by the parent rule. A rule with child rules looks like this:
<RuleName (attributes)>
<ChildRuleName (attributes) />
...
</RuleName>
Rule specific attributes vary and are described in the following sections.
All rules have the following attributes in common; they are available wherever (common attributes) appears. All of the following attributes are optional.
Some rules allow the optional attribute dynamic, a boolean that defaults to false. If dynamic is true, a rule can use placeholders in its string or char attributes that represent the text matched by the regular expression rule that switched to the current context. In a string, the placeholder %N (where N is a number) is replaced with the corresponding capture N from the calling regular expression. In a char, the placeholder must be a number N, and it is replaced with the first character of the corresponding capture N from the calling regular expression. Whenever a rule allows this attribute, its description contains (dynamic).
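The placeholder mechanism can be sketched with a shell-style here-document, where the end marker is only known once the opening rule has matched. The context name heredoc and the pattern are hypothetical. The dynamic="true" attribute on the context element is an assumption about the context attributes, which are not listed in this text.

```xml
<contexts>
  <context attribute="Normal Text" lineEndContext="#stay" name="Normal Text">
    <!-- capture 1 holds the end marker, e.g. EOF in "<<EOF" -->
    <RegExpr attribute="Keyword" context="heredoc" String="&lt;&lt;(\w+)" />
  </context>
  <!-- dynamic="true" on the context is assumed to be needed so that captures are kept -->
  <context attribute="String" lineEndContext="#stay" name="heredoc" dynamic="true">
    <!-- %1 is replaced with capture 1 of the RegExpr rule that switched here -->
    <StringDetect attribute="Keyword" context="#pop" String="%1" dynamic="true" />
  </context>
</contexts>
```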
Detect a single specific character. Commonly used for example to find the ends of quoted strings.
<DetectChar char="(character)" (common attributes) (dynamic) />
The char attribute defines the character to match.
Detect two specific characters in a defined order.
<Detect2Chars char="(character)" char1="(character)" (common attributes) (dynamic) />
The char attribute defines the first character to match, char1 the second.
Detect one character of a set of specified characters.
<AnyChar String="(string)" (common attributes) />
The String attribute defines the set of characters.
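For instance, a rule that highlights any single arithmetic operator could be sketched as follows. The itemData name Symbol is hypothetical and would have to be defined in itemDatas.

```xml
<!-- matches exactly one of the listed characters; "Symbol" is a hypothetical itemData name -->
<AnyChar attribute="Symbol" context="#stay" String="+-*/%=" />
```

Characters such as < and & must be written as &lt; and &amp; inside the String attribute, as in any XML attribute value.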
Detect an exact string.
<StringDetect String="(string)" [insensitive="true|false"] (common attributes) (dynamic) />
The String attribute defines the string to match. The insensitive attribute defaults to false and is passed to the string comparison function. If the value is true, case-insensitive comparison is used.
Matches against a regular expression.
<RegExpr String="(string)" [insensitive="true|false"] [minimal="true|false"] (common attributes) (dynamic) />
Because the rules are always matched against the beginning of the current string, a regular expression starting with a caret (^) indicates that the rule should only be matched against the start of a line.
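As an illustration of the caret behaviour, the sketch below highlights BASIC-style REM comment lines regardless of letter case. The pattern and the itemData name Comment are hypothetical.

```xml
<!-- ^ restricts the match to the start of a line; insensitive="true" also accepts "rem" -->
<RegExpr attribute="Comment" context="#stay" String="^\s*REM\b.*" insensitive="true" />
```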
Detect a keyword from a specified list.
<keyword String="(list name)" (common attributes) />
The String attribute identifies the keyword list by name. A list with that name must exist.
Detect an integer number.
<Int (common attributes) (dynamic) />
This rule has no specific attributes. Child rules are typically used to detect combinations of L and U after the number, indicating the integer type in program code. All rules are actually allowed as child rules, although the DTD only allows the child rule StringDetect. The following example matches integer numbers followed by the character ‘L’.
<Int attribute="Decimal" context="#stay" >
<StringDetect attribute="Decimal" context="#stay" String="L" insensitive="true"/>
</Int>
Detect a floating point number.
<Float (common attributes) />
This rule has no specific attributes. AnyChar is allowed as a child rule and is typically used to detect combinations; see the rule Int for reference.
Detect an octal number representation.
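By analogy with the Int example, a Float rule that also accepts a trailing f or F suffix might be sketched like this. The itemData name Float is hypothetical.

```xml
<!-- "Float" is a hypothetical itemData name -->
<Float attribute="Float" context="#stay">
  <!-- child rule: tested only if the parent Float rule matched -->
  <AnyChar attribute="Float" context="#stay" String="fF" />
</Float>
```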
<HlCOct (common attributes) />
This rule has no specific attributes.
Detect a hexadecimal number representation.
<HlCHex (common attributes) />
This rule has no specific attributes.
Detect an escaped character.
<HlCStringChar (common attributes) />
This rule has no specific attributes.
It matches literal representations of characters commonly used in program code, for example \n (newline) or \t (tabulator).
The following characters will match if they follow a backslash (\): abefnrtv"'?\. Additionally, escaped hexadecimal numbers such as \xff and escaped octal numbers such as \033 will match.
Detect a C character.
<HlCChar (common attributes) />
This rule has no specific attributes.
It matches C characters enclosed in ticks (example: 'c'). The ticks may contain a simple character or an escaped character. See HlCStringChar for the matched escape sequences.
Detect a string with defined start and end characters.
<RangeDetect char="(character)" char1="(character)" (common attributes) />
char defines the character starting the range, char1 the character ending the range. This is useful for detecting, for example, small quoted strings and the like. Note that since the highlighting engine works on one line at a time, this rule will not find strings that span a line break.
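A typical use might be the angle-bracketed file name of a C include directive, sketched below with the existing itemData String.

```xml
<!-- matches e.g. <stdio.h>; the angle brackets must be escaped in XML attribute values -->
<RangeDetect attribute="String" context="#stay" char="&lt;" char1="&gt;" />
```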
Matches at end of line.
<LineContinue (common attributes) />
This rule has no specific attributes.
This rule is useful for switching context at end of line, if the last character is a backslash (‘\’). This is needed for example in C/C++ to continue macros or strings.
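A minimal sketch of this for a preprocessor macro context follows. It assumes that a matching LineContinue rule with context="#stay" keeps the context alive instead of letting lineEndContext pop it; the context name macro and the itemData name Macro are hypothetical.

```xml
<!-- the context normally ends at the end of the line ... -->
<context attribute="Macro" lineEndContext="#pop" name="macro">
  <!-- ... unless the line ends with a backslash, in which case we stay (assumed behaviour) -->
  <LineContinue attribute="Macro" context="#stay" />
</context>
```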
Include rules from another context or language/file.
<IncludeRules context="contextlink" [includeAttrib="true|false"] />
The context attribute defines which context to include. If it is a simple string, all rules defined in that context are included into the current context, for example:
<IncludeRules context="anotherContext" />
If the string begins with ## the highlight system will look for another language definition with the given name, example:
<IncludeRules context="##C++" />
If the includeAttrib attribute is true, the destination attribute is changed to that of the source. This is required, for example, to make commenting work if the text matched by the included context has a different highlight than the host context.
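Both forms can be combined in one context, as in the following sketch. The context names Embedded and CommonRules are hypothetical.

```xml
<context attribute="Normal Text" lineEndContext="#stay" name="Embedded">
  <!-- rules from a context in the same file (hypothetical name) -->
  <IncludeRules context="CommonRules" />
  <!-- rules from the C++ definition, taking over the attribute of the source context -->
  <IncludeRules context="##C++" includeAttrib="true" />
</context>
```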
Detect whitespaces.
<DetectSpaces (common attributes) />
This rule has no specific attributes. Use this rule if you know that several whitespace characters may follow, for example at the beginning of indented lines. This rule skips all whitespace at once, instead of testing multiple rules and skipping one character at a time due to no match.
Detect identifier strings (as a regular expression: [a-zA-Z][a-zA-Z0-9]*).
<DetectIdentifier (common attributes) />
This rule has no specific attributes. Use this rule to skip a string of word characters at once, rather than testing multiple rules and skipping one character at a time due to no match.
Once you have understood how context switching works, it will be easy to write highlight definitions. You should, however, check carefully which rule you choose in which situation. Regular expressions are very powerful, but they are slow compared to the other rules. So you may want to consider the following tips.
If you only match two characters use Detect2Chars instead of StringDetect. The same applies to DetectChar.
Regular expressions are easy to use, but often there is another, much faster way to achieve the same result. Suppose you only want to match the character # if it is the first non-whitespace character in the line. A regular expression based solution would look like this:
<RegExpr attribute="Macro" context="macro" String="^\s*#" />
You can achieve the same result much faster by using:
<DetectChar attribute="Macro" context="macro" char="#" firstNonSpace="true" />
If you want to match the regular expression ‘^#’ you can still use DetectChar with the attribute column="0". The attribute column counts characters, so a tabulator still counts as only one character.
You can switch contexts without processing characters. Assume that you want to switch context when you meet the string */, but need to process that string in the next context. The rule below will match, and the lookAhead attribute will cause the highlighter to keep the matched string for the next context.
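A sketch of the column-based variant, reusing the attribute and context names from the rule above:

```xml
<!-- equivalent to the regular expression ^# : only matches # in the very first column -->
<DetectChar attribute="Macro" context="macro" char="#" column="0" />
```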
<Detect2Chars attribute="Comment" context="#pop" char="*" char1="/" lookAhead="true" />
Use DetectSpaces if you know that many whitespaces occur.
You can validate an XML file against the DTD by using the command xmllint --dtdvalid language.dtd mySyntax.xml.
If you repeat a complex regular expression very often, you can use ENTITIES. Example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE language SYSTEM "language.dtd"
[
<!ENTITY myref "[A-Za-z_:][\w.:_-]*">
]>
Now you can use &myref; instead of the regular expression.
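A rule using the entity might then look like the following sketch. The attribute and context values are placeholders taken from the earlier example.

```xml
<!-- &myref; expands to [A-Za-z_:][\w.:_-]* as declared in the DOCTYPE -->
<RegExpr attribute="Keyword" context="#stay" String="&myref;" />
```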
Comments
strikeOut
Above, for ItemData it says: strikeout if true, the text will be stroked out.
For the version of Kate shipped with Kubuntu Hardy anyway, it should be strikeOut (capitalized O).
set background color
i know its possible to set the background in the configuration files and in the options GUI, but what’s the attribute in the itemdata?
the alert uses a custom background color for a context and i’m trying to do the same for another context.
My goal was to set a different background to a certain custom mimetype, but failing to find a way, using a background color for every itemdata also serves my purpose.
Is there a way?
Thanks!
Pedro de Carvalho.
Detect first line of Context
Hello is it possible to detect the first line of a context.
e.g.
mark when
-1 !! Mark 343 42343 -1
not mark when
-1 2334 !! not mark 32434 -1
Thanks Flo