This document contains technical information about SharpEye. It is aimed at authors of music software who may wish to integrate their products with SharpEye. The main information here is the format of the output files that SharpEye generates.

This documentation is preliminary. I cannot guarantee the correctness or completeness of the information. I cannot guarantee that the format will not change in the future, though I have made a serious attempt to make it future-proof.

OMR engine input file format
============================

Windows
=======

For the Windows version, this is a standard BMP file. It must be one bit per pixel and not compressed. As far as I can see there is no such thing as a compressed 1bpp BMP format; in any case, the biBitCount field must be 1, and the biCompression field must be BI_RGB = 0.

The music should be black on a white background. The palette can be 0 = white, 1 = black, or vice versa. SharpEye writes a white-is-zero BMP file for the engine, but either should work.

RISC OS
=======

For the RISC OS version, this is a sprite file, with certain restrictions:

- Only one image in the file.
- Mode 18, no mask, a palette with 0 = white, 1 = black.
- No LH waste.

OMR engine output file format
=============================

Overview
========

The format is text-based and human readable (with difficulty). It is not intended to be edited by hand.

It is extendible, so that programs that read the format should be able to read newer versions and skip the parts they don't understand. I doubt that I have achieved this aim entirely, but I hope the changes needed to keep reading programs up to date will be minimised.

It is only intended that information will be stored in this format as a transition into another notation format. It is too bulky and limited to serve as a general-purpose format.

SharpEye is written in C and the format reflects that. It is intended to be read using fscanf(), and its structure closely reflects the C structs and arrays I use.
The output from the OMR engine (Liszt) is less interpreted than the output from SharpEye. The same format is used for both kinds of file. The current interpretations done by SharpEye are:

- SharpEye unifies time signatures (makes all time signatures occurring at the same time the same) and makes the score 'rectangular', ie the same number of staves per system.
- SharpEye attaches slurs/ties to notes where it can, and decides whether they are ties or not.
- SharpEye attaches lyric syllables to notes where it can.
- SharpEye does rhythm analysis when it loads a file, and this includes guessing which notes belong to a triplet.

I use the file extension .mro on Windows, and a file of type 'SharpEye' (0x183) on RISC OS, for these files.

General Preliminaries
=====================

Syntactical structure
=====================

The first thing in the file is an identifier. The rest consists of <name> <value> pairs.

Each <name> is a text string made of printable non-whitespace ASCII characters. It is a 'slot name' or 'field name'.

The <value>s can be of two main types: simple or complex.

Simple values are of two types:

(1) Strings containing printable non-whitespace ASCII characters, usually representing numeric, boolean, or 'type' information.
(2) Strings enclosed in double quotes (as in a CSV file) representing textual information. Non-ASCII characters may appear between the double quotes but nowhere else in the file.

A complex value is a list of <name> <value> pairs enclosed in curly brackets, eg:

    { <name> <value> <name> <value> ... }

It is important to note that names, values, '{', and '}' are always separated from one another by whitespace.

The name preceding a textual value always ends with a '$'. Other names never do.

Many items in the file are in arrays or lists. In order to conform with the name-value structure, arrays look like:

    arrayname { nof 2 elementname {...} elementname {...} }

(The 'nof' is literal; arrayname and elementname will vary.)
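Textual values use CSV-style quoting, where a double quote inside the text is written as two double quotes. The sketch below decodes one such value from an in-memory buffer and includes the '$' test that tells a reader a quoted value follows. decode_quoted() and is_text_name() are hypothetical helper names, not part of SharpEye, and the sketch assumes the whole value is already in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A name whose value is a quoted text string always ends with '$'. */
int is_text_name(const char *name)
{
    size_t len = strlen(name);
    return len > 0 && name[len - 1] == '$';
}

/* Decode a CSV-style quoted string starting at src, which must point at the
   opening '"'. A doubled quote ("") becomes one literal '"'.
   Writes at most outsize-1 characters plus a terminating NUL to out.
   Returns the number of input bytes consumed, or 0 if the input is malformed. */
size_t decode_quoted(const char *src, char *out, size_t outsize)
{
    size_t i = 1, n = 0;

    assert(src != NULL && out != NULL && outsize > 0);
    if (src[0] != '"')
        return 0;
    for (;;) {
        char c = src[i];
        if (c == '\0')
            return 0;                   /* unterminated string */
        if (c == '"') {
            if (src[i + 1] == '"') {    /* doubled quote: literal '"' */
                i += 2;
            } else {
                i += 1;                 /* closing quote */
                break;
            }
        } else {
            i += 1;
        }
        if (n + 1 < outsize)            /* truncate rather than overflow */
            out[n++] = c;
    }
    out[n] = '\0';
    return i;
}
```

For example, decoding `"He said ""hi"""` yields `He said "hi"`. Returning the consumed length lets a caller continue scanning after the closing quote.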
When reading this format, you should not assume that the parts of any structure occur in any particular order. You should assume that you will find tokens you don't understand.

Within reason, you should not count on finding things within a structure. For example, you won't find a list of clefs in a bar with no clefs in it, or a list of lyric lines for a stave if there are no lyrics. In most cases, the bottom-level fields will all be present: it would be absurd to have a note with no pitch, for example. Mostly, information will be explicitly present even when there are obvious defaults (eg the note head structure will say "accid None"), but for future compatibility it is safest to construct defaults, then overwrite them with what you find in the file (if you find anything).

Low level interpretations
=========================

The value 'True' means true or present, and 'False' means false or absent. Eg:

    staccato True       - this note has a staccato dot.

Integer values are represented in decimal form with an optional minus sign. Eg:

    nofpages 3          - there are 3 pages in this score.

A pair of integer values is represented by two decimal numbers with a comma between them, often to represent a position. Eg:

    flagposn 66,86      - the flag position of this chord is 66 units down and 86 units right of the stave's top-left.

A rational number is represented by two decimal numbers with a forward slash '/' between them, often to represent a time. Eg:

    duration 1/4        - this chord lasts for a quarter-note, ie a crotchet.

There are no floating point numbers in the current version.

Text strings are enclosed in double quotes, using "" within the string if a double quote is needed. The text may be encoded in ASCII or ISO8859-1, and possibly UTF8 or others in the future. Note that ISO8859-1 and UTF8 are both extensions of ASCII, so a string in ASCII is the same in all three encodings.

Some general conventions
========================

These are not strictly adhered to. They are to aid readability.
- Use all lower case for slots (field names).
- Use upper case initials for types/shapes.
- For integer values which are normally nonnegative, use -1 for impossible/nonexistent/unknown. Use 0/0 for a similar purpose for rationals.

Comments
========

The names 'comment' and 'comment$' are reserved. They will never be used to represent any musical element. Therefore, as long as the values following them obey the syntax, they will be skipped by a reading program. The most generally useful form is:

    comment$ "This is a comment"

Units
=====

All graphical coordinates increase to the right and down. They are written as row,column pairs, ie y,x.

There are 16 units between stave lines in the output, at least for the current version. Nearly all coordinates are in these units. Exceptions will be pointed out.

Some values are in 'pitch units', where the midline of the stave is zero and values increase towards the bottom. So the note B on the midline of a treble stave is 0, C is -1, D is -2, etc.

Input units are in image pixels.

Main structure
==============

The top level structure is

    fileheader {...}
    score {...}

Since the format is extendible, there could be other structures in later versions. Things will be added within the score structure, so what follows is a minimum you can expect to find.

- A score has some information to itself, plus a list of pages.
- A page has some information to itself, plus a list of systems.
- A system has some information to itself, plus a list of staves, plus a list of slurs/ties.
- A stave has some information to itself, plus a list of bars, plus a list of lyric lines, plus a list of dynamics (ppp...fff and hairpins).
- A bar has some information to itself, plus a list of clefs, a list of keysigs, a list of chords, a bar line, and possibly a timesig.
- A chord is fairly complex, and is used to represent single notes and rests as well as proper chords.
- A slur represents a slur or tie or phrase mark.
- A lyric line has some information to itself, plus a list of elements (syllables).

------------------------------------------------------------------------
fileheader {...} is

    fileheader {
        version <version>
        characterencoding <encoding>
    }

<version> is the version number of the file as an integer. It is 1000-1999 for version 1 of SharpEye (currently only 1000 is used). It is 2000-2999 for version 2 of SharpEye (currently 2000 or 2011 are possible).

<encoding> is 'ASCII', 'ISO88591' or 'UTF8'. It is ASCII in v1000, and ISO88591 in v2000 and v2011.

------------------------------------------------------------------------
score {...} is

    score {
        title$ <title>
        unitsperstavespacing <unitsperstavespacing>
        preedit <preedit>
        pages {
            nof <npages>
            page {...}
            ...
            page {...}
        }
    }

<title> is the title of the piece of music. It may be the empty string, ie "".

<unitsperstavespacing> is the number of units per stave spacing. In the current versions (1 and 2) this is 16, so a normal 5-line staff is 64 units high. The positions of objects are stored in these units. Note that the file also contains some positions relating to the input image; these are in pixels. When generating another format you would skip these, and they are ignored here.

<preedit> is for use by SharpEye.

<npages> is the number of pages in the score.

------------------------------------------------------------------------
page {...} is

    page {
        width <width>
        height <height>
        origwidth <origwidth>
        origheight <origheight>
        skewangle <skewangle>
        rowoffset <rowoffset>
        coloffset <coloffset>
        spacing <spacing>
        imagefpath$ <imagefpath>
        systems {
            nof <nsystems>
            system {...}
            ...
            system {...}
        }
    }

<width> is the page width and <height> is the page height, in output units.

<origwidth> is the image width and <origheight> is the image height, in pixels.

<skewangle>, <rowoffset> and <coloffset> are for mapping between output and input coordinates by SharpEye.

<spacing> is the input staff spacing, in units of 1/1024 pixel, for the 'dominant' size of staff on the page. In general that means the most common size of staff, but don't count on that: since the scale is estimated before staves are found, it is even possible that no staves of the dominant size will be found. This number can be used together with unitsperstavespacing in the score structure to relate the dimensions in the output to the original image.

<imagefpath> is the file path of the input image.

<nsystems> is the number of systems in the page.

------------------------------------------------------------------------
system {...} is

    system {
        top <top>
        left <left>
        width <width>
        height <height>
        staves { nof <nstaves> stave {...} ... stave {...} }
        slurs { nof <nslurs> slur {...} ... slur {...} }
    }

<top> is the distance between the top of the page and the top of the system.

<left> is the distance between the left of the page and the left of the system.

<width> is the width of the system, and <height> is its height, from the top of the top stave to the bottom of the bottom stave.

------------------------------------------------------------------------
stave {...} is

    stave {
        top <top>
        left <left>
        width <width>
        size <size>
        voicessplit <voicessplit>
        joinedtobelow <joinedtobelow>
        bars { nof <nbars> bar {...} ... bar {...} }
        lyriclines { nof <nlyriclines> lyricline {...} ... lyricline {...} }
        texts { nof <ntexts> text {...} ... text {...} }
        dynamics { nof <ndynamics> dynamic {...} ... dynamic {...} }
    }

<top> is the distance between the top of the page and the top of the stave.

<left> is the distance between the left of the page and the left of the stave.

<width> is the width of the stave. In versions 1 and 2 at least, <left> and <width> will be identical to the values for the system.

<size> is the (vertical) size of the stave.
It will normally be very close to 64, since the spacing between lines is 16 units. For a stave which is not the dominant size on the page, it may be bigger or smaller. Also see the 'spacing' field in the page structure.

<voicessplit> and <joinedtobelow> are 'True' or 'False'. They are set by the user and affect output from SharpEye.

Version 2: dynamics are new.
Version 2011: texts are new.

------------------------------------------------------------------------
bar {...} is

    bar {
        clefs { nof <nclefs> clef {...} ... clef {...} }
        keysigs { nof <nkeysigs> keysig {...} ... keysig {...} }
        timesig {...}
        chords { nof <nchords> chord {...} ... chord {...} }
        barline {...}
    }

Note that a 'bar' is a physical/graphical bar, not always a musical/logical bar. A bar ends at a barline, but that barline may be a double bar line or a repeat sign, and does not always mean the end of a musical bar.

Note also that symbols in the bar are stored by type, not left to right. They need to be sorted in order to make musical sense of them. The symbols have position information, and this can be used for sorting.

------------------------------------------------------------------------
clef {...} is

    clef {
        shape <shape>
        centre <centre>
        pitchposn <pitchposn>
    }

<shape> is one of 'Treble', 'Bass' or 'Alto' (G clef, F clef, C clef).

<centre> is the position, relative to the stave top-left, of the centre of the clef.

<pitchposn> is the 'pitch position' of the clef, in pitch units. For a standard treble clef it will be 2, bass clef -2, alto 0, tenor -2. Currently you won't see any other values. It doesn't make much odds for now, but the <pitchposn> value should be used in preference to the r value of <centre>, ready for the day when eg baritone clefs are recognised.

Version 2: <shape> can now be 'TrebleUp8' or 'TrebleDown8' as well as the above, meaning a treble clef with a little 8 above or below it to indicate an octave shift up or down.

Version 2: <pitchposn> can now be any of -4, -2, 0, 2, 4 for Alto clefs.

------------------------------------------------------------------------
keysig {...} is

    keysig {
        key <key>
        centre <centre>
    }

<key> is an integer in the range -7 to 7. Negative numbers count flats, and positive ones count sharps.

<centre> is the position, relative to the stave top-left, of the centre of the keysig.

------------------------------------------------------------------------
timesig {...} is

    timesig {
        showasalpha <showasalpha>
        top <top>
        bottom <bottom>
        centre <centre>
        timeslice <timeslice>
    }

<showasalpha> is either 'True' or 'False'. If True, it means the time signature is displayed as a single symbol for common or alla breve time. Otherwise it is displayed as two numbers.

<top> is the top number, and <bottom> is the bottom. These values are valid even if <showasalpha> is True (in which case they would be 2 and 2, or 4 and 4).

<centre> is the position, relative to the stave top-left, of the centre of the timesig.

------------------------------------------------------------------------
chord {...} is

    chord {
        virtualstem <virtualstem>
        stemup <stemup>
        stemslash <stemslash>
        tuplettransform <tuplettransform>
        tupletcount <tupletcount>
        staccato <staccato>
        tenuto <tenuto>
        pause <pause>
        accent <accent>
        staccato_dr <staccato_dr>
        tenuto_dr <tenuto_dr>
        pause_dr <pause_dr>
        accent_dr <accent_dr>
        naugdots <naugdots>
        nflags <nflags>
        flagposn <flagposn>
        headend <headend>
        beam {...}
        notes { nof <nnotes> note {...} ... note {...} }
    }

For version 2, tupletcount is replaced by tupletID.

Like NIFF's stem and ENIGMA's entry, this structure represents chords, single notes, and rests. Single notes are regarded as 'degenerate' chords, and rests as silent chords.

<virtualstem> is 'True' or 'False'. If True, it means there is no stem, ie the chord is a breve, semibreve or rest. (It's redundant but convenient.)

<stemup> is 'True' or 'False'. If True, it means the stem points up from the note(s).

<tuplettransform> is the multiplier applied to the time to deal with tuplets. It is 1/1 for most notes, and 2/3 for notes in triplets.

<tupletcount> is 0 for notes not in a tuplet, otherwise a count from 1. It is used when editing. Tuplets are an area that needs reworking, and you should ignore this. (Version 1)

<tupletID> is -1 for notes not in a tuplet, otherwise an integer >= 0 that uniquely identifies the tuplet within a bar. (Version 2)

<staccato>, <tenuto>, <pause>, <accent> are each 'True' or 'False', and signify the presence or absence of a staccato dot, a tenuto sign, a pause (fermata) sign, or an accent on the chord. It is likely that these will not be present in later versions if the value is False: there will either be "staccato True" or nothing.

<staccato_dr>, <tenuto_dr>, <pause_dr>, <accent_dr> are not yet implemented. They are the vertical offset of the centre of the expression mark from the chord's flag position. They are therefore positive if the expression mark is below the flag.

<naugdots> is the number of augmentation dots following the chord. It is 0, 1, 2 or 3.

<nflags> is the number of flags on a chord which is neither a rest nor part of a beamed group. It is 1 for a quaver, 2 for a semiquaver, etc. NB: this applies to grace notes as well as normal notes; an earlier version of this documentation said otherwise. Also note that flags on grace notes are not currently counted (August 2001), so this field will be 1 for all grace notes for the time being.

<stemslash> is 'True' or 'False'. If True, it means the stem has a slash, eg for an acciaccatura. This field will not always be present, so assume a default of False when reading. Stem slashes are not currently (August 2001) recognised by the engine. (New in version 2.)

<flagposn> is the position of the flag or beam end of the stem on this chord. In the case of a stemless note (rest, breve, semibreve) the c value is still valid, and is the centre of the note, chord or rest.

<headend> is the position of the head that is furthest from the flag or beam. It is in pitch units, which means the midline of the stave is zero, with values increasing towards the bottom. So the note B on the midline of a treble stave is 0, C is -1, D is -2, etc.

Note that there will always be at least one note in the note list, which has further information.

------------------------------------------------------------------------
beam {...} is

    beam {
        id <id>
        nofnodes <nofnodes>
        nofleft <nofleft>
        nofright <nofright>
    }

Like NIFF, beams are made of 'nodes'. There is one node for each chord that the beam joins.
<id> is an integer which uniquely identifies the beam in the bar.

<nofnodes> is the number of chords joined by the beam.

<nofleft> is the number of beam parts that point left from the chord.

<nofright> is the number of beam parts that point right from the chord.

------------------------------------------------------------------------
note {...} is

    note {
        shape <shape>
        staveoffset <staveoffset>
        p <p>
        accid <accid>
        accid_dc <accid_dc>
        normalside <normalside>
    }

<shape> is one of 'Breve' 'SBreve' 'Minim' 'Solid' 'Grace' 'BreveRest' 'SBreveRest' 'MinimRest' 'CrotchetRest' 'QuaverRest' 'SQuaverRest' 'DSQuaverRest' 'HDSQuaverRest'.

Version 2: Grace notes are new.

<staveoffset> is the stave offset of this notehead. It is usually zero, meaning that the notehead belongs to the same stave as the chord structure. However, when a chord or beamed group spans more than one stave, it is regarded as belonging logically to the uppermost stave on which it has any noteheads, and any noteheads which belong to staves below this will have a positive stave offset. The engine currently doesn't recognise multi-stave objects like this, so <staveoffset> will be 0. Version 2: the engine does now recognise multi-stave objects, but it chops them up into single-stave objects, so <staveoffset> remains zero.

<p> is the pitch position, using the same encoding as the <headend> field in the chord structure. It also gives the vertical position of a rest, in the case where a chord structure is used for a rest. In this case, <p> is the position of the centre of the rest in most cases, but of the top of the rest for a semibreve rest, and of the bottom for a minim rest.

<accid> is one of 'None' 'Sharp' 'Flat' 'Natural' 'DoubleSharp' 'DoubleFlat' 'NaturalSharp' 'NaturalFlat'.

<accid_dc> is the horizontal offset of the accidental, if any. It is measured from the left edge of the notehead to the centre of the accidental. It is negative (unless a recognition error has occurred).

<normalside> is 'True' or 'False'. If False, it means that the head is on the 'wrong' side of the stem, as used for chords with second intervals in them.

------------------------------------------------------------------------
barline {...} is

    barline {
        type <type>
        leftlinex <leftlinex>
        rightlinex <rightlinex>
        trueend <trueend>
        invented <invented>
    }

<type> is one of 'Single' 'Double' 'Leftrepeat' 'Rightrepeat' 'Backtobackrepeat' 'ThinThick'.

<leftlinex> is the x-position of the centre of the leftmost vertical line in the barline, relative to the stave left.

<rightlinex> is the x-position of the centre of the rightmost vertical line in the barline, relative to the stave left.

<invented> is 'True' or 'False'. If True, it means the barline was invented by the recognition engine. This sometimes happens at the end of a stave.

------------------------------------------------------------------------
slur {...} is

    slur {
        leftpt <leftpt>
        rightpt <rightpt>
        radius <radius>
        partner <partner>
    }

Slurs, ties, phrase marks and any other curves found are approximated by an arc and assigned to a system, but not interpreted further by the OMR engine. The coordinates are relative to the system top-left.

<radius> is a signed value. A negative value means the slur is above the centre of the arc, ie it is like /^\, and a positive value means \_/. The absolute value of <radius> is the radius.

------------------------------------------------------------------------
lyricline {...} is

    lyricline {
        abot <abot>
        height <height>
        style <style>
        elements { nof <nelements> lyricelement {...} ... lyricelement {...} }
    }

<abot> is the vertical coordinate of the line relative to the top of the stave. It is the position of the baseline of the text: the bottom of an 'a', not of a 'g'.

<height> is the height of the text. This is the point size of the text (height of 'Ăg'). It was not properly implemented until version 2011 of this format (version 2.11 of SharpEye). If reading an earlier version than 2011, you should ignore this and make up a default; 36 is about right, ie 2.25 stave spacings. From 2011 this is based on the scan. It is probably best to average these values for all the lyrics in the score.

<style> is new in v2011. It is the font style:

     0  sans serif (Arial) plain
     1  sans serif (Arial) bold
     2  sans serif (Arial) italic
     3  sans serif (Arial) bold italic
     4  serif (Times) plain
     5  serif (Times) bold
     6  serif (Times) italic
     7  serif (Times) bold italic
     8  monospaced (Courier) plain
     9  monospaced (Courier) bold
    10  monospaced (Courier) italic
    11  monospaced (Courier) bold italic

------------------------------------------------------------------------
lyricelement {...} is

    lyricelement {
        extender <extender>
        c0 <c0>
        c1 <c1>
        124
        text$ <text>
        midc <midc>
        bar <bar>
        symbol <symbol>
    }

<extender> is 'True' or 'False'. If True, it means this lyricelement is an extender line like this_______ and the text$ field should be ignored. Currently, <extender> will always be False, as extender lines are not recognised.

<c0> and <c1> are the left and right x-positions of the element, and a third field gives its middle x-position. I intend using <c0> and <c1> for extender lines and the middle x-position for syllables. Currently you should only rely on the middle x-position.
If <extender> is False, <text> is the text of the syllable, in double quotes. Syllables with one or more hyphens following are represented by making the last character in the syllable a hyphen.

------------------------------------------------------------------------
text {...} is

    text {
        abot <abot>
        c0 <c0>
        height <height>
        style <style>
        text$ <text>
    }

<abot> is the baseline of the text, relative to the top of the stave. <c0> is the left of the text, relative to the left of the stave. <height> is the height of the text. <style> is the font style. <text> is the text itself. See lyricline for details of <height> and <style>.

------------------------------------------------------------------------
dynamic {...} is

    dynamic {
        type <type>
        c <c>
        c0 <c0>
        c1 <c1>
        r <r>
    }

<type> is one of 'Hairpindim' 'Hairpincres' 'Dyn_ppp' 'Dyn_pp' 'Dyn_p' 'Dyn_mp' 'Dyn_mf' 'Dyn_f' 'Dyn_ff' 'Dyn_fff'.

For Hairpindim and Hairpincres, <r> is the vertical centre, <c0> is the left, and <c1> is the right. For the others, <r> is the baseline of the text, and <c> is the horizontal centre. All positions are relative to the stave top-left.

------------------------------------------------------------------------
Hints on writing a reader
=========================

This is what I do...
====================

All functions return an error status as a pointer to a string. NULL means no error; otherwise it is a tag, which can be looked up.

Working from the bottom up:

```c
const char *readmstoken(char *name)
{
    const char *err = 0;

    /* Tokens are never more than 31 characters, so %31s plus the NUL
       fits in a buffer of max_token_size >= 32 bytes. */
    if (1 != fscanf(mscript_iofp, " %31s", name))
    {
        if (ferror(mscript_iofp))
        {
            err = "Xmsread";
        }
        else
        {
            err = "Xmstoken";
        }
    }
    return err;
}
```

readmstoken() reads the next token, knowing it is not a string in double quotes. This will never be more than 31 chars.
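readmstoken() only returns raw tokens; a simple value still has to be decoded according to the low-level interpretations (True/False, r,c pairs such as 66,86, rationals such as 1/4). A minimal sketch using sscanf() follows. parse_bool(), parse_pair() and parse_rational() are hypothetical names, not SharpEye functions. The trailing %c in each format only succeeds if there is junk after the number(s), so such tokens are rejected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Each helper returns 0 on success and -1 if the token is not in the expected form. */

/* 'True' / 'False' */
int parse_bool(const char *tok, int *out)
{
    assert(tok != NULL && out != NULL);
    if (strcmp(tok, "True") == 0) { *out = 1; return 0; }
    if (strcmp(tok, "False") == 0) { *out = 0; return 0; }
    return -1;
}

/* r,c pair such as "66,86" (row first, then column) */
int parse_pair(const char *tok, int *r, int *c)
{
    char extra; /* assigned only if characters follow the second number */
    return sscanf(tok, "%d,%d%c", r, c, &extra) == 2 ? 0 : -1;
}

/* rational such as "1/4"; "0/0" is the conventional 'unknown' value */
int parse_rational(const char *tok, int *num, int *den)
{
    char extra;
    return sscanf(tok, "%d/%d%c", num, den, &extra) == 2 ? 0 : -1;
}
```

Keeping the decoding separate from the token reader means the same helpers can be used whichever structure the value appears in.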
------------------------------------------------------------------------

```c
const char *readmsstring(char *value)
{
    const char *err = 0;
    int c;
    BOOL done;

    insist(value);
    /* Skip any whitespace (not just spaces) before the opening quote.
       isspace() needs <ctype.h>. */
    while ((c = fgetc(mscript_iofp)) != EOF && isspace(c))
    {
    }
    if (c != '\"')
    {
        err = "Xmsnodq";
    }
    done = FALSE;
    while (!err && !done)
    {
        /* Copy characters up to the next double quote, stopping at EOF. */
        while ((c = fgetc(mscript_iofp)) != '\"' && c != EOF)
        {
            *value++ = (char)c;
        }
        if (c == EOF)
        {
            /* Unterminated string, or a read error. */
            err = ferror(mscript_iofp) ? "Xmsread" : "Xmssyntax";
            break;
        }
        /* "" stands for one literal quote; anything else ends the string. */
        c = fgetc(mscript_iofp);
        if (c != '\"')
        {
            ungetc(c, mscript_iofp);
            done = TRUE;
        }
        else
        {
            *value++ = (char)c;
        }
    }
    *value = '\0'; /* always leave a terminated string */
    if (!err && ferror(mscript_iofp))
    {
        err = "Xmsread";
    }
    return err;
}
```

readmsstring() reads a string in double quotes, knowing that is what to expect.

------------------------------------------------------------------------

```c
const char *mscriptskipstring(void)
{
    ....
}
```

Same as readmsstring(), except that it ignores what it reads.

------------------------------------------------------------------------

```c
const char *mscriptskipvalue(const char *keyword)
{
    const char *err = 0;
    char name[max_token_size];

    if (keyword[strlen(keyword) - 1] == '$')
    {
        err = mscriptskipstring(); /* textual value in double quotes */
    }
    else
    {
        err = readmstoken(name);
        if (err || 0 != strcmp(name, "{"))
        {
            return err; /* simple <value> (or read error), done */
        }
        /* complex value: skip <name> <value> pairs until the closing '}' */
        err = readmstoken(name);
        while (!err && 0 != strcmp(name, "}"))
        {
            if (name[strlen(name) - 1] == '$')
            {
                err = mscriptskipstring();
            }
            else
            {
                err = mscriptskipvalue(name); /* recurse */
            }
            if (!err)
            {
                err = readmstoken(name);
            }
        }
    }
    return err;
}
```

mscriptskipvalue() is called when an unknown token is found.

------------------------------------------------------------------------

```c
const char *readmsstructstart(void)
{
    const char *err = 0;
    char token[max_token_size];

    err = readmstoken(token);
    if (!err && 0 != strcmp(token, "{"))
    {
        err = "Xmssyntax";
    }
    return err;
}
```

When a known <name> is encountered which is followed by a complex <value>, readmsstructstart() checks the '{' and reads past it.
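Once a bar's clefs, keysigs, timesig and chords have been read, they are still grouped by type rather than ordered left to right, as noted in the bar section. The sketch below merges them into one array and sorts it by x-position with qsort(). The BarSymbol type and sort_bar_symbols() are hypothetical. x is assumed to be the c value of centre (clefs, keysigs, timesigs) or of flagposn (chords, where the c value is valid even for stemless chords). The tie-break that puts non-chord symbols first at equal x is also an assumption, not something the format specifies.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for one symbol of a bar, after reading. */
typedef enum { SYM_CLEF, SYM_KEYSIG, SYM_TIMESIG, SYM_CHORD } SymKind;

typedef struct {
    SymKind kind;  /* which of the bar's lists the symbol came from */
    int index;     /* index into that list */
    int x;         /* c value of centre (clef, keysig, timesig) or flagposn (chord) */
} BarSymbol;

/* Order by x; at equal x, use the SymKind order (clef, keysig, timesig, chord). */
static int compare_bar_symbols(const void *a, const void *b)
{
    const BarSymbol *sa = a;
    const BarSymbol *sb = b;
    if (sa->x != sb->x)
        return sa->x < sb->x ? -1 : 1;
    if (sa->kind != sb->kind)
        return sa->kind < sb->kind ? -1 : 1;
    return 0;
}

void sort_bar_symbols(BarSymbol *syms, size_t n)
{
    assert(syms != NULL || n == 0);
    if (n > 1)
        qsort(syms, n, sizeof *syms, compare_bar_symbols);
}
```

Keeping an index back into the original per-type lists avoids copying the clef, keysig and chord structures themselves.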
------------------------------------------------------------------------

```c
const char *readmsstartarray(int *n)
{
    const char *err = 0;
    char token[max_token_size];
    int x = 0;

    err = readmstoken(token);
    if (!err && 0 != strcmp(token, "{"))
    {
        err = "Xmssyntax";
    }
    if (!err)
    {
        err = readmstoken(token);
    }
    if (!err && 0 != strcmp(token, "nof"))
    {
        err = "Xmssyntax";
    }
    if (!err)
    {
        err = readmstoken(token);
    }
    if (!err)
    {
        x = atoi(token); /* only convert a token that was actually read */
        if (x <= 0)
        {
            err = "Xmssyntax";
        }
    }
    *n = x;
    return err;
}
```

When a known <name> is encountered which is followed by an array (list), readmsstartarray() checks the start and returns the size in *n.

------------------------------------------------------------------------

```c
/* Callback types, as implied by their use below. */
typedef void (*initialisestructFunc)(void *p);
typedef const char *(*readtokeninstructFunc)(void *p, const char *token);

const char *readmsstruct(initialisestructFunc initf, readtokeninstructFunc f, void *p)
{
    const char *err = 0;
    char token[max_token_size];

    initf(p); /* set defaults before reading */
    err = readmsstructstart();
    if (!err)
    {
        err = readmstoken(token);
    }
    while (!err && strcmp(token, "}") != 0)
    {
        err = (*f)(p, token); /* f() reads the <value> for this <name> */
        if (!err)
        {
            err = readmstoken(token);
        }
    }
    return err;
}
```

This reads the inside of an arbitrary structure. It calls initf() to initialise the structure, and f() to read each <name> <value> pair, passing the token to f().

------------------------------------------------------------------------

```c
const char *readmsbeam(void *p, const char *token)
{
    const char *err = 0;
    MScriptBeamNode *mbm = (MScriptBeamNode *)p;

    if (0 == strcmp("id", token))
    {
        err = readmsint(&mbm->ID);
    }
    else if (0 == strcmp("nofnodes", token))
    {
        err = readmsint(&mbm->nofnodes);
    }
    else if (0 == strcmp("nofleft", token))
    {
        err = readmsint(&mbm->nofleft);
    }
    else if (0 == strcmp("nofright", token))
    {
        err = readmsint(&mbm->nofright);
    }
    else
    {
        err = mscriptskipvalue(token); /* skip unknown fields */
    }
    return err;
}
```

readmsbeam() is a typical function that is called by readmsstruct(). readmsint() reads an integer value.
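Following the same pattern as readmsbeam(), a handler for the fileheader structure might look like the sketch below. It is self-contained, so it differs from SharpEye's own code in several ways. FileHeader, init_fileheader(), read_fileheader_field() and major_version() are hypothetical names. The values are read directly with fscanf() rather than readmsint(). Unknown fields are skipped only if their value is simple, whereas a full reader would call mscriptskipvalue(). The default encoding of ASCII is an assumption for files with no characterencoding field.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of fileheader {...}. */
typedef struct {
    int version;        /* eg 1000, 2000, 2011; -1 = unknown */
    char encoding[16];  /* 'ASCII', 'ISO88591' or 'UTF8' */
} FileHeader;

/* Set defaults first, then let the file overwrite them. */
void init_fileheader(void *p)
{
    FileHeader *fh = p;
    fh->version = -1;
    strcpy(fh->encoding, "ASCII");  /* assumed default if the field is absent */
}

/* Handle one <name> <value> pair inside fileheader {...}, reading the value
   from fp. Returns NULL on success or an error tag, like readmsbeam(). */
const char *read_fileheader_field(FILE *fp, void *p, const char *name)
{
    FileHeader *fh = p;
    char skipped[64];

    assert(fp != NULL && p != NULL && name != NULL);
    if (strcmp(name, "version") == 0)
        return fscanf(fp, "%d", &fh->version) == 1 ? NULL : "Xmssyntax";
    if (strcmp(name, "characterencoding") == 0)
        return fscanf(fp, "%15s", fh->encoding) == 1 ? NULL : "Xmssyntax";
    /* Unknown field: this sketch only skips a simple value; a full reader
       would call mscriptskipvalue(name) so that {...} values are skipped too. */
    return fscanf(fp, "%63s", skipped) == 1 ? NULL : "Xmssyntax";
}

/* 1000-1999 -> 1, 2000-2999 -> 2; -1 if the version is unknown. */
int major_version(int version)
{
    return version >= 1000 ? version / 1000 : -1;
}
```

Checking major_version() early lets a reader decide, for example, whether to expect tupletcount (version 1) or tupletID (version 2) in chords.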
------------------------------------------------------------------------
Reading an array is very similar to reading a structure, but a bit more complex, and you have to allocate memory.

------------------------------------------------------------------------
This is a list of strings to represent types and shapes of things.

    type          field   values
    boolean               True False
    accidental    accid   None Sharp Flat Natural DoubleSharp DoubleFlat
                          NaturalSharp NaturalFlat
    note/rest     shape   Breve Sbreve Minim Solid Breverest Sbreverest
                          Minimrest Crotchetrest Quaverrest Squaverrest
                          DSquaverrest HDSquaverrest
    clef          shape   Treble Bass Alto
    barline       shape   Single Double Leftrepeat Rightrepeat Backtobackrepeat
------------------------------------------------------------------------
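The shape names in this list differ in capitalisation from those given for the note structure ('Sbreve' and 'Breverest' here, 'SBreve' and 'BreveRest' there). A defensive reader might therefore compare shape strings case-insensitively. The sketch below maps a note/rest shape token to an enum in that way. NoteShape, parse_note_shape() and shape_is_rest() are hypothetical names, and case-insensitive matching is a design choice of this sketch, not something the format specifies. Standard C has no case-insensitive strcmp, so a small helper is included.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum {
    SHAPE_UNKNOWN = -1,
    SHAPE_BREVE, SHAPE_SBREVE, SHAPE_MINIM, SHAPE_SOLID, SHAPE_GRACE,
    SHAPE_BREVEREST, SHAPE_SBREVEREST, SHAPE_MINIMREST, SHAPE_CROTCHETREST,
    SHAPE_QUAVERREST, SHAPE_SQUAVERREST, SHAPE_DSQUAVERREST, SHAPE_HDSQUAVERREST
} NoteShape;

/* Same order as the enum above, so an index converts directly to a NoteShape. */
static const char *const shape_names[] = {
    "Breve", "SBreve", "Minim", "Solid", "Grace",
    "BreveRest", "SBreveRest", "MinimRest", "CrotchetRest",
    "QuaverRest", "SQuaverRest", "DSQuaverRest", "HDSQuaverRest"
};

/* ASCII case-insensitive string equality. */
static int equal_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both strings ended together */
}

NoteShape parse_note_shape(const char *s)
{
    size_t i;
    assert(s != NULL);
    for (i = 0; i < sizeof shape_names / sizeof shape_names[0]; i++) {
        if (equal_nocase(s, shape_names[i]))
            return (NoteShape)i;
    }
    return SHAPE_UNKNOWN;  /* eg a shape added in a later version */
}

/* All rest shapes come after the note shapes in the enum. */
int shape_is_rest(NoteShape s)
{
    return s >= SHAPE_BREVEREST;
}
```

Returning SHAPE_UNKNOWN rather than failing fits the advice to expect tokens you don't understand.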