This module applies a couple of “syntactic sugar” pre-processing steps to Kal code before it goes to the compiler. These steps would be onerous to do during the parsing stage, but are generally easier to do on a token stream. Each function in this module takes an input token stream and returns a new, possibly modified one.

Some sugar functions use the keyword list from the grammar, most notably the implicit parentheses for function calls.

grammar = require './grammar'
KEYWORDS = grammar.KEYWORDS
RVALUE_OK = grammar.RVALUE_OK

The entry point for this module is the translate_sugar function, which takes an input token stream and returns a modified token stream for use with the parser. It also takes an optional options parameter which may contain the following properties:

show_tokens - print the final token stream to the console (used for debugging).
The function also takes a tokenizer argument: a function that, given a code string, returns an array whose first element is a token array and whose second is an array of comment tokens. The Kal compiler uses the tokenize function in the lexer module for this argument. tokenizer, if present, is used to tokenize code embedded in double-quoted strings. If this argument is missing, double-quoted strings with embedded code blocks will be left as plain strings.
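As a rough usage sketch (the './lexer' and './sugar' require paths and the kal_source variable are illustrative here, not defined by this module), the wiring looks roughly like:

lexer = require './lexer'
sugar = require './sugar'
tokens = lexer.tokenize(kal_source)[0]
sweetened = sugar.translate_sugar tokens, {show_tokens: yes}, lexer.tokenize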
function translate_sugar (tokens, options, tokenizer)

The current sugar stages are:
"1 + 1 = #{1 + 1}"), this function tokenizes the code blocks and converts the string to the equivalent of "1 + 1 = " + (1 + 1).my_function 1, 2.print to console.log(a,b) -> return a + b) to standard Kal function syntax.The output is a new token stream (array).
out_tokens = coffee_style_functions print_statement noparen_function_calls multiline_statements multiline_lists clean code_in_strings tokens, tokenizer

Debug printing of the token stream is enabled with the show_tokens option.
if options?.show_tokens
debug = []
for t in out_tokens
if t.type is 'NEWLINE'
debug.push '\n'
else
debug.push t.value or t.type
console.log debug.join ' '
return out_tokens
exports.translate_sugar = translate_sugar

This function adds support for double-quoted strings with embedded code, like "x is #{x}". It uses the tokenizer argument (a function that converts a code string into a token array, like lexer.tokenize) to run the code blocks in the string through the lexer. The return value is the merged stream of tokens.
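For example (an illustrative sketch; name and greeting are made-up identifiers), a double-quoted string with two code blocks:

greeting = "hello #{name}, 1 + 1 is #{1 + 1}"

is rewritten into tokens equivalent to:

greeting = ("hello " + name + ", 1 + 1 is " + (1 + 1))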
function code_in_strings (tokens, tokenizer)

We abort if there is no tokenizer provided and just don't translate the strings.
return tokens when tokenizer doesnt exist

The output is a new token array (we don't modify the original).
out_tokens = []
for token in tokens

For double-quoted strings, we search for code blocks like "#{code}". The regex uses the non-greedy operator to avoid parsing "#{block1} #{block2}" as a single block.
if token.type is 'STRING' and token.value[0] is '"'
rv = token.value
r = /#{.*?}/g
m = r.exec rv

We generally must add parentheses around any string that gets broken up for code blocks (and it is always safe to do so). soft indicates that this was added by the sugar module, not the user. It's passed forward to the no-paren function call stage.
add_parens = yes if m otherwise no
out_tokens.push({text:'(', line:token.line, value:'(', type:'LITERAL', soft:yes}) when add_parens

For each code block match, we first add a string token to the stream for all the constant text before the block start, then a +.
while m
new_token_text = rv.slice(0,m.index) + '"'
out_tokens.push {text:new_token_text, line:token.line, value:new_token_text, type:'STRING'}
out_tokens.push {text:'+', line:token.line, value:'+', type:'LITERAL'}

Next we add the parsed version of the code block (a token array) generated by running the code through the lexer. If there is more than one token, this also needs to be in parentheses.
new_tokens = tokenizer(rv.slice(m.index+2,m.index+m[0].length-1))[0]
out_tokens.push({text:'(', line:token.line, value:'(', type:'LITERAL'}) when new_tokens.length isnt 1
out_tokens = out_tokens.concat new_tokens
out_tokens.push({text:')', line:token.line, value:')', type:'LITERAL'}) when new_tokens.length isnt 1

Next we make a string out of any remaining text after the block in case this is the last match. If the loop exits here, it gets added to the token stream; otherwise we ignore it since the next iteration will take care of it. If the remaining string is empty, we set it to blank since we don't want things like "a is #{a}" turning into ("a is " + a + "") for aesthetic reasons.
rv = '"' + rv.slice(m.index+m[0].length)
if rv is '""'
rv = ''
else
out_tokens.push {text:'+', line:token.line, value:'+', type:'LITERAL'}

Find the next code block if there is one.
r = /#{.*?}/g
m = r.exec rv

If there wasn't a next code block, add the remaining string (if any) and the close paren.
out_tokens.push({text:rv, line:token.line, value:rv, type:'STRING'}) when rv isnt ''
out_tokens.push({text:')', line:token.line, value:')', type:'LITERAL', soft:yes}) when add_parens
else

For anything other than a double-quoted string, just pass it through.
out_tokens.push token
return out_tokens

This function removes whitespace tokens. It marks tokens that were followed by whitespace so that later stages can detect the difference between things like my_function(a) -> and my_function (a) ->.
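For example (an illustrative sketch; my_list and my_function are made-up names), the trailed_by_white flag is what later lets the no-paren stage tell indexing apart from an implicit call:

first = my_list[1]        # no whitespace before [, so it stays an index
result = my_function [1]  # whitespace before [, so it becomes my_function([1])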
function clean (tokens)
out_tokens = []
for token in tokens
if token.type isnt 'WHITESPACE'
out_tokens.push token
else if out_tokens.length > 0
out_tokens[out_tokens.length - 1].trailed_by_white = yes
return out_tokens

This function removes newlines and indentation after commas, allowing long lines of code to be broken up into multiple lines. Token line numbers are preserved for error reporting.
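For example (an illustrative sketch; setup and its arguments are made-up), a call whose lines end with commas:

setup 'alpha',
  'beta',
  'gamma'

is treated as the single logical line setup 'alpha', 'beta', 'gamma'.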
function multiline_statements (tokens)
out_tokens = []
last_token = null

We keep track of whether or not we are on a continued line and how many indents we ignored.
continue_line = no
reduce_dedent = 0
for token in tokens
skip_token = no

If we see a newline after a comma, remove it from the stream and mark that we are in line continuation mode.
if last_token?.value in [','] and token.type is 'NEWLINE'
continue_line = yes
skip_token = yes

In line continuation mode, ignore indents and dedents, but keep track of them. We exit line continuation mode when we see a DEDENT that brings us back even with the original line.
else if continue_line
if token.type is 'INDENT'
skip_token = yes
reduce_dedent += 1
else if token.type is 'NEWLINE'
skip_token = yes
else if token.type is 'DEDENT'
if reduce_dedent > 0
reduce_dedent -= 1
skip_token = yes
if reduce_dedent is 0
out_tokens.push {text:'\n', line:token.line, value:'',type:'NEWLINE'}
else

When exiting line continuation mode, we have to add back in the last NEWLINE.
out_tokens.push last_token

Add the token to the new stream unless we decided to skip it.
out_tokens.push(token) unless skip_token
last_token = token
return out_tokens

This stage converts implicit function calls (my_function a, b) to explicit ones (my_function(a,b)). NOPAREN_WORDS specifies keywords that should not be considered a first argument to a function call. For example, we don't want x is a to turn into x(is(a)), but we do want x y z to become x(y(z)).
NOPAREN_WORDS = ['is','otherwise','except','else','doesnt','exist','exists','isnt','inherits',
'from','and','or','xor','in','when','instanceof','of','nor','if','unless',
'except','for','with','wait','task','fail','parallel','series','safe','but',
'bitwise','mod','second','seconds','while','until']

This function is admittedly messy and in need of a rewrite. But it's not broken, so…
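For example (an illustrative sketch using the cases described above):

my_function a, b   # becomes my_function(a, b)
x y z              # becomes x(y(z))
x is a             # left alone, since is is a no-paren word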
function noparen_function_calls (tokens)
out_tokens = []
close_paren_count = 0
last_token = null
triggers = []
closures = []
ignore_next_indent = no

We need a token counter because sometimes we look back two or three tokens.
i = 0
while i < tokens.length
token = tokens[i]

Check that the previous token is not a reserved word. This check passes if the last token is not a keyword, if two tokens ago was a . (like x.for a), or if the last token is a keyword but a valid r-value (me x).
last_token_isnt_reserved = not (last_token?.value in KEYWORDS) or tokens[i-2]?.value is '.' or (last_token?.value in RVALUE_OK)

Check if the previous token was callable. This is only true if it is an IDENTIFIER (not reserved) or a ] like x[1] a.
last_token_callable = (last_token?.type is 'IDENTIFIER' and last_token_isnt_reserved) or last_token?.value is ']'

Check that the current token isn't a no-paren word (not looking at something like x for).
token_isnt_reserved = not (token.value in NOPAREN_WORDS)

Check that the current token is not an operator literal (we don't want my_function * 2 to become my_function(* 2)).
non_literal = (token.type in ['IDENTIFIER','NUMBER','STRING','REGEX'])

There are some exceptions for callable literals, for things like f {x:1}, f [1], and ->.
callable_literal = (token.value is '{' or (token.value is '[' and last_token?.trailed_by_white) or (token.value is '-' and tokens[i+1]?.value is '>'))

Combining the previous checks, we check that this token is not an operator.
this_token_not_operator = ((non_literal or callable_literal) and token_isnt_reserved)

Check if this is a function declaration.
declaring_a_function = tokens[i-2]?.value in ['function','task','method','class'] and last_token?.type is 'IDENTIFIER'

Check if a parenthesis is soft, meaning added by the sugar module and not the user.
soft_paren = (token.value is '(' and token.soft but not declaring_a_function)

We don't want to add parentheses around bitwise left or bitwise right, but we also really don't want left and right to be no-paren words, otherwise x left would not translate to x(left). These are really useful words, so we handle this special case to avoid the issue.
bitwise_shift = (last_token?.value in ['left','right']) and tokens[i-2]?.value is 'bitwise'

If the previous token is callable and the current token is not an operator (or it's a parenthesis that the user didn't add) and we're not in the special bitwise case, then we add an open paren. We add a trigger to close the parentheses on the next NEWLINE.
if last_token_callable and (this_token_not_operator or soft_paren) but not bitwise_shift
triggers.push 'NEWLINE'
out_tokens.push {text:'(', line:token.line, value:'(', type:'LITERAL'}
closures.push ')'

If we're passing a function as an argument, we want to change the close trigger to a DEDENT and ignore the next INDENT.
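For example (an illustrative sketch; run_task, arg, and do_work are made-up names), in:

result = run_task arg, function ()
  do_work()

the open paren inserted after run_task is closed at the DEDENT that ends the function body rather than at the first NEWLINE.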
else if (token.value is 'function' or (token.value is '>' and last_token?.value is '-')) and triggers[triggers.length-1] is 'NEWLINE'
triggers[triggers.length-1] = 'DEDENT'
ignore_next_indent = yes

Keep track of indents so that streams like: x = myfunct function () NEWLINE INDENT ... DEDENT will not close out parentheses early.
else if token.type is 'INDENT'
if ignore_next_indent
ignore_next_indent = no
else
triggers.push 'DEDENT'
closures.push ''

Reset the ignore_next_indent flag if necessary.
else if token.type is 'NEWLINE' and tokens[i+1]?.type isnt 'INDENT'
ignore_next_indent = no

Check if we hit a “closure” (end of implied parentheses) when we are looking for a NEWLINE. This can happen on an actual NEWLINE or when we hit a tail conditional.
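For example (an illustrative sketch; combine and fast_mode are made-up names), a tail conditional closes the implied paren early:

return combine a, b if fast_mode

becomes return combine(a, b) if fast_mode.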
if (token.type is 'NEWLINE' or token.value in ['if','unless','when','except']) and closures.length > 0 and triggers[triggers.length - 1] is 'NEWLINE'

If so, pop all NEWLINE closures and add in the implied tokens. NEWLINEs can close out multiple parentheses (x = a b c).
while closures.length > 0 and triggers[triggers.length - 1] is 'NEWLINE'
triggers.pop()
closure = closures.pop()
out_tokens.push({text:closure, line:token.line, value:closure, type:'LITERAL'}) if closure isnt ''
out_tokens.push token

If our closure had a DEDENT trigger, pop it and add the token.
else if token.type is 'DEDENT' and closures.length > 0 and triggers[triggers.length - 1] is 'DEDENT'
out_tokens.push token
triggers.pop()
closure = closures.pop()
out_tokens.push({text:closure, line:token.line, value:closure, type:'LITERAL'}) if closure isnt ''

If no trigger was matched, just pass the token through.
else if closures.length is 0 or token.type isnt triggers[triggers.length - 1]
out_tokens.push token
last_token = token
i += 1

If we hit EOF, pop out all the remaining closures.
while closures.length > 0
closure = closures.pop()
out_tokens.push({text:closure, line:token.line, value:closure, type:'LITERAL'}) if closure isnt ''
return out_tokens

This function converts CoffeeScript-style functions (() ->) to Kal syntax.
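For example (an illustrative sketch; add is a made-up name), the definition:

add = (a, b) ->
  return a + b

is rewritten into tokens equivalent to:

add = function (a, b)
  return a + b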
function coffee_style_functions (tokens)
out_tokens = []
last_token = null

We need to track the token index since we look back several tokens in this stage.
i = 0
while i < tokens.length
token = tokens[i]Look for a ->.
if last_token?.value is '-' and token?.value is '>'

If we see the ->, that means the current token is > and we already added the - to the new stream. We have to pop the - off the stream.
out_tokens.pop()

We create a new token stream fragment for this function header.
new_tokens = []

Next we examine the last token in the stream. Since we just popped the -, this will either be a ) if the definition is in the form (args) -> or something else if it doesn't specify arguments.
t = out_tokens.pop()
if t?.value is ')'

If there are arguments here, keep popping until we hit the (, adding the argument tokens to the new_tokens stream. At the end of this loop, new_tokens will contain the arguments passed (if any) plus the closing paren.
while t?.value isnt '('
new_tokens.unshift t
t = out_tokens.pop()

Add the opening paren back onto the front of the fragment.
new_tokens.unshift t
else

If no arguments were specified, let new_tokens be ().
out_tokens.push t
new_tokens.push {text:'(', line:token.line, value:'(', type:'LITERAL'}
new_tokens.push {text:')', line:token.line, value:')', type:'LITERAL'}

Prepend the function token to new_tokens, which currently has the arguments (if any) in parentheses. Then add it to the out_tokens stream.
f_token = {text:'function', line:token.line, value:'function', type:'IDENTIFIER'}
new_tokens.unshift f_token
out_tokens = out_tokens.concat new_tokens
else

If we're not handling a CoffeeScript-style function, just pass tokens through.
out_tokens.push token
last_token = token
i += 1
return out_tokens

This function converts list definitions that span multiple lines into a single line. Tokens retain their original line numbers. This supports lists and explicit map definitions ({}).
This function is admittedly awful and needs rework.
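For example (an illustrative sketch; my_list is a made-up name), a list written across several lines:

my_list = [1,
  2,
  3]

is treated as if it were written on one line: my_list = [1, 2, 3].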
function multiline_lists (tokens)
out_tokens = []

We need to track nested lists.
list_depth = 0
last_token_was_separator = no
indent_depths = []
indent_depth = 0
leftover_indent = 0
for token in tokens
skip_this_token = no

We need to keep track of whether or not this token is eligible as a list item separator.
token_is_separator = (token.type in ['NEWLINE','INDENT', 'DEDENT'] or token.value is ',')

When we see a list start, we push to the list stack.
if token.value is '[' or token.value is '{'
list_depth += 1
indent_depths.push indent_depth
indent_depth = 0

Likewise for a list end, we pop the stack.
else if token.value is ']' or token.value is '}'
list_depth -= 1
leftover_indent = indent_depth
indent_depth = indent_depths.pop()

Keep track of the indentation level, looking for a token that returns us to the original indent. We continue to skip indents/dedents until this happens. Basically, we want to ignore indentation inside these multi-line definitions. Once back to the original indent level, we push in a NEWLINE.

Note that none of this matters unless we are inside a list definition; otherwise these flags are ignored.
else if token.type is 'INDENT'
indent_depth += 1
if leftover_indent isnt 0
leftover_indent += 1
skip_this_token = yes
out_tokens.push({text:'', line:token.line, value:'\n', type:'NEWLINE'}) if leftover_indent is 0
else if token.type is 'DEDENT'
indent_depth -= 1
if leftover_indent isnt 0
leftover_indent -= 1
out_tokens.push({text:'', line:token.line, value:'\n', type:'NEWLINE'}) if leftover_indent is 0
skip_this_token = yes

Skip newlines inside of list definitions.
else if token.type is 'NEWLINE'
if leftover_indent isnt 0
skip_this_token = yes
else
leftover_indent = 0
if list_depth > 0

The first token in a newline stretch gets turned into a comma.
if token_is_separator and not last_token_was_separator
out_tokens.push {text:',', line:token.line, value:',', type:'LITERAL'}
else
out_tokens.push token unless token_is_separator or skip_this_token
else
out_tokens.push token unless skip_this_token
last_token_was_separator = token_is_separator and (list_depth > 0)
return out_tokens

Convert print tokens to console.log tokens.
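For example (an illustrative sketch), by the time this stage runs the no-paren stage has already added the parentheses, so:

print "hello"

ends up as tokens equivalent to console.log("hello").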
function print_statement (tokens)
new_tokens = []
for token in tokens
if token.value is 'print' and token.type is 'IDENTIFIER'
new_tokens.push {text:'print', line:token.line, value:'console', type:'IDENTIFIER'}
new_tokens.push {text:'print', line:token.line, value:'.', type:'LITERAL'}
new_tokens.push {text:'print', line:token.line, value:'log', type:'IDENTIFIER'}
else
new_tokens.push token
return new_tokens