Friday, May 19, 2006

Internal Type Theory

In a previous paper I introduced a general notion of simultaneous inductive-recursive definition in intuitionistic type theory. This notion subsumes various reflection principles and seems to pave the way for a natural development of what could be called "internal type theory", that is, the construction of models of (fragments of) type theory in type theory, and more generally, the formalization of the metatheory of type theory in type theory.
The present paper is a first investigation of such an internal type theory.

I am currently reading this paper (when not working on my day job), so the interpreter has to wait.

Tuesday, May 09, 2006

Open scheduler. Or open language?

I just wanted to add yet another feature to PAL - fairness for units bigger than a single continuation.
Round-robin over continuations is simple, but it does not take into account the creators of continuations (the cause/effect tree). As always, it is possible to construct a use case where the right scheduling makes a difference not only in performance but also in termination; but I will not do that.
Basically, PAL needs a construct that gives a programmer control over dividing "CPU time" between sets of continuations. This could be done, for example, by a statement (SplitCPU PerCent Statement Statement) that reduces two statements in parallel by allocating PerCent "CPU cycles" to the first one and 100-PerCent to the second one. These statements compose with other statements and with each other in an obvious way (children sharing the percentage of their parent, not the whole 100%).
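To make the idea concrete, here is a minimal sketch of such a construct, assuming we model a statement as a lazy stream of single reduction steps; all names here (splitCPU, the chunked interleaving) are mine, not actual PAL:

```haskell
-- hypothetical sketch of (SplitCPU PerCent Statement Statement):
-- interleave two step streams in chunks of perCent and (100 - perCent)
-- steps, so the first statement gets roughly perCent of every 100 steps
splitCPU :: Int -> [a] -> [a] -> [a]
splitCPU perCent = go
  where
    go as bs =
      take perCent as ++ take (100 - perCent) bs ++
      case (drop perCent as, drop (100 - perCent) bs) of
        ([] , bs') -> bs'        -- first statement finished
        (as', [] ) -> as'        -- second statement finished
        (as', bs') -> go as' bs' -- both still running
```

A real scheduler would of course interleave at a finer granularity, but the composition property is already visible: nesting splitCPU splits a parent's share, not the whole 100%.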
This looks cool, but what if I want to introduce more constructs? Already I need cancellation of a branch and scheduling of non-CPU resources ("oracles", more on this later). And I see no easy way to implement these constructs via compilation to PAL.
At this point I decided: why not make PAL user-extensible? It already supports user-extended terms, and I plan to support a user-provided scheduler, so why not user-provided statements (op-codes)?
In fact, the whole PAL can be just a composition of a set of "features", where each feature is a set of statements, terms, and scheduling rules. The current PAL can be split into three features - promises, cells, and atomic execution.
I already compose terms using a fixpoint of multiple term functors (actually I mix one functor into a base term, but there may be ways to do that for any number of functors). I hope the same can be done for statements, but composition of schedulers (reduction strategies) looks a bit less clear at the moment.
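The term-functor composition can be sketched with a functor coproduct and a fixpoint; the feature functors below (PromiseF, CellF) are illustrative placeholders, not the real PAL terms:

```haskell
{-# LANGUAGE TypeOperators, DeriveFunctor #-}

-- fixpoint of a functor: ties the recursive knot for terms
newtype Fix f = In { out :: f (Fix f) }

-- coproduct of two term functors; mixing in more features nests (:+:)
data (f :+: g) a = InL (f a) | InR (g a) deriving Functor
infixr 6 :+:

-- two illustrative "feature" functors (not the actual PAL terms)
data PromiseF a = Promise a deriving Functor
data CellF a = Cell a | Unit deriving Functor

type Term = Fix (PromiseF :+: CellF)

-- a generic fold works uniformly over any composed term type
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . out

-- example: count the constructors of a composed term
size :: Term -> Int
size = cata alg
  where
    alg (InL (Promise n)) = n + 1
    alg (InR (Cell n))    = n + 1
    alg (InR Unit)        = 1
```

Mixing in a third feature is just another (:+:); whether schedulers compose the same way is exactly the part that remains unclear.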
If this indeed works, I believe it's time to either create a SourceForge project, or write a paper, or both :)

PS: I have to remember to document my thoughts on efficient implementation of the execution trace tree in the light of composable features.

Monday, May 08, 2006

Plain interpreter. No, compiler.

I was a bit ill during the weekend, so instead of going to the seashore I played with Haskell.
I aimed for an interpreter of Plain in Haskell, but ended up with a compiler from Plain to an STM-like intermediate language (promises and atomic execution - let's call it PAL) and an interpreter for PAL in Haskell (let's call it PVM). This looks promising (pun intended).

What I need to do before going further is:

  1. Fix implementation of variables in PVM (this is currently a strange hybrid of de Bruijn indices and dynamic scoping).
  2. Express PAL statements in monadic form to simplify writing compiler(s).
  3. Express PVM in monadic form to simplify experimenting with it.
  4. Externalize scheduling (probably along the lines of A Language-Based Approach to Unifying Events and Threads). Amusingly, an atomic execution could be elegantly expressed as a strict node in the trace tree.
  5. Explore possibilities for a type system for PAL (A Functional Abstraction of Typed Contexts?).
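For item 1, a pure de Bruijn representation (with no name-based dynamic lookup) can be sketched as a minimal lambda-calculus evaluator; this is an illustration, not the actual PVM:

```haskell
-- de Bruijn indices: Var n refers to the n-th enclosing binder,
-- so variable lookup is positional and scoping is fully static
data Term = Var Int | Lam Term | App Term Term
  deriving (Eq, Show)

data Val = Closure Term [Val]
  deriving (Eq, Show)

-- call-by-value evaluation in an environment of already-computed values;
-- Var n is simply the n-th entry of the current environment
eval :: [Val] -> Term -> Val
eval env (Var n)   = env !! n
eval env (Lam b)   = Closure b env
eval env (App f a) =
  case eval env f of
    Closure b env' -> eval (eval env a : env') b
```

The key point is that the closure captures its *defining* environment (env'), which is precisely what a dynamic-scoping hybrid gets wrong.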

Friday, May 05, 2006

Typed Concurrent Programming with Logic Variables

We present a concurrent higher-order programming language called Plain and a concomitant static type system. Plain is based on logic variables and computes with possibly partial data structures. The data structures of Plain are procedures, cells, and records. Plain's type system features record-based subtyping, bounded existential polymorphism, and access modalities distinguishing between reading and writing.

You may want to compare this with The Oz Programming Model (OPM), which

... is a concurrent programming model subsuming higher-order functional and object-oriented programming as facets of a general model. This is particularly interesting for concurrent object-oriented programming, for which no comprehensive formal model existed until now. The model can be extended so that it can express encapsulated problem solvers generalizing the problem solving capabilities of constraint logic programming.

Another paper on OPM is The Operational Semantics of Oz.

In short, the model of Plain is based on that of Oz with the main differences being:

  1. Plain statically types programs using a type system with subtyping, while Oz is latently typed.
  2. Therefore Plain chooses to drop support for unification in favor of a single-assignment operation.
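The single-assignment operation of point 2 can be sketched in Haskell over an IORef. This only illustrates the "bind once, never rebind" discipline; a real logic variable would also let readers block until the variable is bound, which this sketch omits, and all names here are hypothetical:

```haskell
import Data.IORef

-- a single-assignment ("logic") variable: starts unbound and can be
-- bound exactly once; unlike unification, a second bind simply fails
newtype LVar a = LVar (IORef (Maybe a))

newLVar :: IO (LVar a)
newLVar = fmap LVar (newIORef Nothing)

-- returns False (and changes nothing) if the variable is already bound
bindLVar :: LVar a -> a -> IO Bool
bindLVar (LVar r) x = atomicModifyIORef' r $ \m ->
  case m of
    Nothing -> (Just x, True)
    Just _  -> (m, False)

readLVar :: LVar a -> IO (Maybe a)
readLVar (LVar r) = readIORef r
```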

Wednesday, May 03, 2006

Logical variables?

After all, it looks like good old logic variables may be the best solution for communication between objects.
See, for example, An introduction to AKL (a precursor of Oz).

Typed Logical Variables in Haskell looks like a good introduction to typed logical variables in Haskell (hey, that's written in the title :) ).


Well, STM's retry is, so to speak, pessimistic - it restarts the whole transaction (up to "atomically" or "orElse") instead of backtracking a single step (one "bind").

Using (simple) backtracking can be inefficient. As an example, consider again a binary join, which succeeds if both its inputs succeed:

join [([], []), ([1], []), ([1], [2]), ([1, 3], [2, 4])] = [([1],[2]),([1],[2]),([1],[4]),([3],[2]),([3],[4])]

It can be implemented for any MonadPlus, but there is some inefficiency there:

import Control.Monad (MonadPlus)

join :: MonadPlus m => m (m a, m b) -> m (m a, m b)
join sab = do
  (sa, sb) <- sab
  a <- sa
  b <- sb -- a failure here unnecessarily retries sa; it should retry sab
  return (return a, return b)

What's worse, there is also a difference in semantics. Consider a shocking example, which fails to terminate:

join [(repeat 1, []), (repeat 1, [2])] = ⊥

This would produce a single answer, and THEN fail to terminate, if it retried sab after failing to read sb:

join [(repeat 1, []), (repeat 1, [2])] = [([1],[2]), ⊥]

Note that I cannot fix join by simply moving b <- sb before a <- sa, as they are fully symmetric. What I need is a combinator which, unlike bind, would execute actions in parallel and fail AS SOON as any of them fails. Let's say I need a parallel and - pand. Can I code it in terms of MonadPlus? Let us see...
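For the list monad specifically, a pand can at least be approximated by checking both sides for emptiness before enumerating. This is list-only and not truly symmetric (it inspects the second argument eagerly), so it is a sketch of the desired behavior rather than the general MonadPlus combinator:

```haskell
-- fail as soon as either side is empty, without first enumerating
-- the (possibly infinite) other side
pand :: [a] -> [b] -> [(a, b)]
pand _  [] = []
pand [] _  = []
pand as bs = [ (a, b) | a <- as, b <- bs ]
```

With this, the shocking example terminates: pand (repeat 1) [] = [].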
Aha, a quick googling turned up A Monadic Semantics for Core Curry.

Software transactional memory = optimistic transaction isolation + backtracking?

While playing with STM, it occurred to me that it actually combines two features: optimistic transaction isolation (OTI), which allows transactions to execute seemingly independently of each other, and backtracking (BT, implemented via retry/orElse/bind in place of fail/or/and), which ensures atomicity.
If one takes care when designing a concurrent language, then OTI becomes unnecessary. For that, it is sufficient to have:
  1. NDW: non-destructive writes - the source of communication cannot undo previous actions, it can only "append" to them.
  2. LRD: localised read and destroy - access to read/destroy operations on communication media is tightly controlled and in some sense local (like a single thread).
  3. MB: monotonic behaviors - a behavior cannot fail on an input bigger than one on which it succeeded.
Join calculus enjoys all these properties. Indeed, NDW is ensured by writes appending to a multiset of messages, LRD is ensured syntactically by coupling the creation of channels with specifying reactions to them, and MB is trivial as well - join calculus behaviors can only fail (that is, block) because of a lack of a message, never because of an extra message.

Why do I think that NDW+LRD+MB obviate OTI?
Because MB ensures it is ok to schedule the producer a bit sooner, BT ensures it is ok to schedule the consumer a bit sooner, and NDW+LRD+MB ensure it is ok to run producer and consumer in parallel.
One still has to be careful at a very low level of operations - e.g., a multi-processor implementation must ensure atomicity of single writes and reads - but this is easier than full OTI.

If I am right with this conjecture, it would be cool to try to split STM into two facets, OTI and BT, and see what happens.

BTW, isn't BT just a fancy name for Control.Monad.List? :)
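The quip can be made concrete in the plain list monad, where mzero/guard plays the role of retry and mplus the role of orElse:

```haskell
import Control.Monad (guard, mplus, mzero)

-- orElse-style choice: the first branch "retries" (fails),
-- so the second branch supplies the answer
firstSuccess :: [Int]
firstSuccess = mzero `mplus` [42]

-- retry-style search: guard discards branches where the condition fails,
-- and the list monad backtracks to the next candidate
pairsSummingTo :: Int -> [(Int, Int)]
pairsSummingTo n = do
  a <- [1 .. n]
  b <- [1 .. n]
  guard (a + b == n)
  return (a, b)
```

What this facet lacks, of course, is the OTI half: nothing here runs concurrently or isolates anything.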

Tuesday, May 02, 2006