martian-computing

CS 498MC Martian Computing at the University of Illinois at Urbana–Champaign

Text Parsing

Learning Objectives

Text Molds and Values

The basic text types in Hoon are the cord, an atom with aura @t (plus derived auras such as @ta and @tas), and the tape, a list of characters.

> `@t`'Excalibur'
'Excalibur'
> `tape`"Excalibur"
"Excalibur"

There are many helpful text conversion arms:
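As a hedged illustration (the sample text is ours, not taken from the original list), a few standard-library conversion arms can be tried in the Dojo. ++crip turns a tape into a cord, ++trip turns a cord into a tape, and ++cass and ++cuss lower-case and upper-case a tape.

```hoon
::  lines starting with :: are annotations, not Dojo input
::  tape to cord, and cord to tape
> (crip "Excalibur")
'Excalibur'
> (trip 'Excalibur')
"Excalibur"
::  change the case of a tape
> (cass "Excalibur")
"excalibur"
> (cuss "Excalibur")
"EXCALIBUR"
```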

You can convert to and from atoms with particular auras as well:
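A minimal Dojo sketch, assuming the standard-library arms ++scot, ++scow, ++slav, and ++slaw (the value 54.321 is only an example). ++scot and ++scow each take a dime and render it as a knot and a tape respectively, while ++slav and ++slaw parse a knot back into an atom under a given aura.

```hoon
::  render an atom using the aura named in the dime
> (scot %ud 54.321)
~.54.321
> (scow %ud 54.321)
"54.321"
> (scot %ux 54.321)
~.0xd431
::  parse text back into an atom
::  ++slav crashes on failure; ++slaw produces a unit instead
> (slav %ud ~.54.321)
54.321
> (slaw %ud ~.54.321)
[~ 54.321]
```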

But wait: what’s a dime? A dime is a pair of an aura (as a @ta) and an atom value, i.e. [p=@ta q=@]. The aura tells the function how to render the value.

++scan applies a parsing rule to a tape and produces the parsed value, such as an atom. It crashes if the rule cannot consume the whole tape.
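For instance, here is a small sketch using two of the standard-library parsing rules, ++dem for decimal digits and ++hex for hexadecimal digits (the input tapes are illustrative):

```hoon
::  ++dem parses a run of decimal digits (no dot separators in the input)
> (scan "54321" dem)
54.321
::  ++hex parses a run of hexadecimal digits
> (scan "ff" hex)
255
```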

There are many more of these, but you get the flavor of it.

Text Structures

Many more advanced text structures are available. These contain metadata tags such as %leaf to hint to the prettyprinter and other tools how to represent and process the text data.

Recall wain and wall from earlier: a wain is a list of cords and a wall is a list of tapes.

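The primary data structure used by the prettyprinter is a tank, which is a nested tagged structure of tapes. A tank element can be tagged in one of three ways: %leaf (a single tape), %palm (a list of tanks with four delimiter tapes), or %rose (a list of tanks with three delimiter tapes).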

For instance, to make a single %leaf statement, you can type

leaf+"Rhongomyniad"
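To see how a tank is rendered, the following is a sketch that assumes the standard-library ++re core, whose ++ram arm flattens a tank into a single tape. The %rose example and its sword names are ours and are only illustrative.

```hoon
::  a %leaf is just a tagged tape
> `tank`leaf+"Rhongomyniad"
[%leaf p="Rhongomyniad"]
> ~(ram re leaf+"Rhongomyniad")
"Rhongomyniad"
::  a %rose carries [separator opener closer] and a list of tanks
> ~(ram re [%rose [", " "[" "]"] leaf+"Caliburn" leaf+"Clarent" ~])
"[Caliburn, Clarent]"
```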

You can also use the >1.000< format, which converts a value to a $tank directly (and can be used with faces/names).

A tang is a list of tanks. It is used with ++slog to produce conditional error messages.
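A minimal sketch of ++slog in the Dojo, assuming a one-element tang and a placeholder result of 42 (both are illustrative). ++slog takes a tang and produces a gate that prints the tang as a side effect and then passes its argument through unchanged.

```hoon
::  prints the message, then produces 42
> ((slog leaf+"Sword not found." ~) 42)
Sword not found.
42
```

Wrapping such a call in a ?: branch is one way to emit a message only when a condition fails.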

Tokenizing and Parsing Text

One of the most straightforward tools to use with text is ++trim, which splits a tape into two parts at a given index. You can use this together with ++find to produce a simple text tokenizer.

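::  Split a tape into space-separated tokens.
++  parse
  |=  [in-words=tape]
  =/  out-words  *(list tape)
  |-  ^+  out-words
  ::  ++find yields a unit: ~ when no space remains.
  =/  next-index  (find " " in-words)
  ?:  =(next-index ~)  (weld out-words ~[in-words])
  ::  Split just past the space; the head keeps its trailing space.
  =/  values  (trim +(+:next-index) in-words)
  ~&  values
  $(in-words +:values, out-words (weld out-words ~[-:values]))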
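The two building blocks of the tokenizer above can be checked on their own in the Dojo. This is a sketch with an illustrative input tape:

```hoon
::  ++find yields the index of the first match as a unit
> (find " " "hello world")
[~ 5]
::  ++trim splits a tape at the given index
> (trim 5 "hello world")
[p="hello" q=" world"]
::  the tokenizer trims one past the space, so the head keeps it
> (trim 6 "hello world")
[p="hello " q="world"]
```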

Sail & XML Parsing

Let’s consider abstractly manipulating XML entities. There are a number of ; mic runes which support this.

Sail is a part of Hoon used for creating and operating on nouns that represent XML nodes. With the appropriate rendering pipeline, a Sail document can be used to generate a static website.

The Sail runes are the ; mic family of macros. They operate on ++manx and ++marl (a list of manx) values. A manx is a single XML node. Since HTML documents can be expressed as XML nodes, Sail can be used to produce HTML as a function of Hoon operations.

As a noun, a manx holding a single paragraph of text looks like this:

[[%p ~] [[%$ [%$ "This is the first node."] ~] ~] ~]

Generally, one produces and manipulates marls rather than directly working with manxs.

;p: This will be rendered as an XML node.
;=  ;p:
    ;p:
    ;p:
==

Conversion between these nouns and XML text is handled by %zuse’s ++en-xml:html (manx to tape) and ++de-xml:html (cord to a unit of manx) arms.

`manx`+:(de-xml:html (crip "<element attribute=\"1\">text<!-- comment --></element>"))
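Going the other direction, here is a sketch that assumes ++en-xml:html accepts a manx and produces a tape of markup. The noun is the paragraph node shown earlier, and the output is omitted because it may vary with the %zuse version.

```hoon
::  serialize the paragraph manx back into markup text
> (en-xml:html [[%p ~] [[%$ [%$ "This is the first node."] ~] ~] ~])
```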

JSON Parsing

Similar to XML/HTML, there are a number of tools (but not runes) in the ++enjs:format and ++dejs:format cores of %zuse.

> (tape:enjs:format "hello world")
[%s p='hello world']
> (sa:dejs:format (tape:enjs:format "hello world"))
"hello world"
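The same round trip works for numbers. This is a sketch assuming the ++numb:enjs:format and ++ni:dejs:format arms (the value 42 is illustrative):

```hoon
::  JSON numbers are stored as knots
> (numb:enjs:format 42)
[%n p=~.42]
::  ++ni reads a JSON number back as an integer
> (ni:dejs:format (numb:enjs:format 42))
42
```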

Aside on Functional Tools

It is convenient when parsing (and performing many other operations) to curry a function or cork it.

To curry a function means to wrap one of its arguments inside of it so that it becomes a function not of $n$ variables but of $n-1$ variables. Use ++cury to accomplish this.

> =add-1 (cury add:rs .1)
> (add-1 .2)
.3

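To cork functions is to compose them forwards: (cork a b) produces a gate that applies a first and then b to the result. Corking a gate with itself repeatedly therefore applies it several times: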

> (:(cork dec dec dec dec dec) 1.000)
995

Use ++cork to cork a function.
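The order of composition is easier to see with two different gates. In this sketch, double is a hypothetical helper gate defined only for the example, and ++corl is the standard-library counterpart that composes in the opposite order.

```hoon
::  a hypothetical doubling gate, only for illustration
> =double |=(a=@ (mul 2 a))
::  cork: decrement first, then double
> ((cork dec double) 5)
8
::  corl: double first, then decrement
> ((corl dec double) 5)
9
```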

Art by Chris Foss.

Questions

Parsing HTML

Parse the following HTML block into Sail elements such as manxs and marls:

<p>The <a href="/wiki/Alliterative_Morte_Arthure" title="Alliterative Morte Arthure">Alliterative <i>Morte Arthure</i></a>, a <a href="/wiki/Middle_English" title="Middle English">Middle English</a> poem, mentions Clarent, a sword of peace meant for knighting and ceremonies as opposed to battle, which <a href="/wiki/Mordred" title="Mordred">Mordred</a> stole and then used to kill Arthur at Camlann. The Prose <i>Lancelot</i> of the Vulgate Cycle mentions a sword called Seure (Sequence), or Secace in some manuscripts, which belonged to Arthur but was borrowed by Lancelot.</p>

(You’ll need to right-click and Inspect Element to get the <p> tag and its contents. Markdown and Jinja aren’t playing nice with the code block.)

Morse Code

Produce an %ask generator which accepts a text value and produces the Morse code equivalent. You may use the following core as a point of departure for composition.

You may decide how to handle spaces (omit or emit a space), but you should convert the message to lower-case first.

|%
++  en-morse  !!
++  table
  %-  my
  :~  :-  'a'  '·-'     :-  'b'  '-···'   :-  'c'  '-·-·'   :-  'd'  '-··'
      :-  'e'  '·'      :-  'f'  '··-·'   :-  'g'  '--·'    :-  'h'  '····'
      :-  'i'  '··'     :-  'j'  '·---'   :-  'k'  '-·-'    :-  'l'  '·-··'
      :-  'm'  '--'     :-  'n'  '-·'     :-  'o'  '---'    :-  'p'  '·--·'
      :-  'q'  '--·-'   :-  'r'  '·-·'    :-  's'  '···'    :-  't'  '-'
      :-  'u'  '··-'    :-  'v'  '···-'   :-  'w'  '·--'    :-  'x'  '-··-'
      :-  'y'  '-·--'   :-  'z'  '--··'   :-  '0'  '-----'  :-  '1'  '·----'
      :-  '2'  '··---'  :-  '3'  '···--'  :-  '4'  '····-'  :-  '5'  '·····'
      :-  '6'  '-····'  :-  '7'  '--···'  :-  '8'  '---··'  :-  '9'  '----·'
  ==
--