Monday, April 29, 2013

An algebraic approach for data-centric scientific workflows

An algebraic approach for data-centric scientific workflows
E. Ogasawara, D. de Oliveira, P. Valduriez, J. Dias, F. Porto, M. Mattoso
VLDB 2011

This paper argues for the use of an algebraic core language to support parallel execution of data-centric workflows, i.e., graphs that correspond to programs over bulk data.  The idea is that just as relational algebra is used inside databases, and can be evaluated in different ways (using different physical operators) or optimized (using equivalences derived from the semantics, and profiling information/statistics driving cost estimates), the algebraic operations presented here can be used to support different execution models or optimizations.

The operators include "Map", "Reduce", "Filter", and a variant of "Map" called "SplitMap"; all of these can take an arbitrary executable and run it on many inputs.  SplitMap has some additional grouping / splitting behavior that isn't explained in detail in the paper.  There are also two relational operators: SRQuery, which applies a selection/projection query to a single relation, and JoinQuery, which applies a multiple-input query to several relations.  The connections between operators are typed as tuples of base values or filenames (or possibly other nested values, but this isn't discussed further).  So this can be viewed as a generalization of the relational calculus, where some nodes of the graph correspond to whole queries, and other nodes correspond to structured user-defined operations.
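To make the operator vocabulary concrete, here is a minimal Python sketch. It is my own illustration, not the paper's definitions: relations are modelled as lists of dicts, the "executables" are plain Python functions, and the function names (`map_op`, `filter_op`, `reduce_op`, `sr_query`, `join_query`) and the sample data are hypothetical. SplitMap is left out, since its grouping/splitting behavior isn't spelled out.

```python
from collections import defaultdict

# Assumption: a relation is a list of dicts whose values are base values or filenames.

def map_op(activity, relation):
    """Map: run the activity once per input tuple, producing one output tuple each."""
    return [activity(t) for t in relation]

def filter_op(activity, relation):
    """Filter: keep only the input tuples that the activity accepts."""
    return [t for t in relation if activity(t)]

def reduce_op(activity, key, relation):
    """Reduce: group tuples by a key attribute and produce one output tuple per group."""
    groups = defaultdict(list)
    for t in relation:
        groups[t[key]].append(t)
    return [activity(k, group) for k, group in groups.items()]

def sr_query(query, relation):
    """SRQuery: apply a selection/projection query to a single relation."""
    return query(relation)

def join_query(query, *relations):
    """JoinQuery: apply a multiple-input query to several relations."""
    return query(*relations)

# Hypothetical workflow: scale each file's size, keep the large ones, total per sample.
runs = [
    {"sample": "s1", "file": "a.dat", "size": 10},
    {"sample": "s1", "file": "b.dat", "size": 30},
    {"sample": "s2", "file": "c.dat", "size": 5},
]

def scale(t):
    return {**t, "size": t["size"] * 2}

def is_large(t):
    return t["size"] >= 20

def total(sample, group):
    return {"sample": sample, "total": sum(t["size"] for t in group)}

result = reduce_op(total, "sample", filter_op(is_large, map_op(scale, runs)))
names = sr_query(lambda r: [{"file": t["file"]} for t in r], runs)
```

Because each operator only fixes how inputs are consumed and outputs are produced, an engine is free to run the per-tuple or per-group calls in parallel, which is the point of having the algebra in the first place.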


Friday, April 26, 2013

From State- to Delta-Based Bidirectional Model Transformations: the Asymmetric Case

From State- to Delta-Based Bidirectional Model Transformations: the Asymmetric Case
Zinovy Diskin, Yingfei Xiong, and Krzysztof Czarnecki
Journal of Object Technology 10(6):1-25, 2011

This paper is the journal version of an ICMT 2010 paper, which I had read a few years ago.  The journal version is sufficiently different to make it worthwhile to read even if you've already read the conference version; I had particular trouble with the small figures/examples in the conference version, and this one is much better in that respect.

The idea is to revisit the notion of "lens" or (asymmetric) bidirectional transformation, introduced in now-classic work by Foster et al. (POPL 2005, TOPLAS 2007).  Basically, a lens is a pair of functions

\[\begin{array}{rcl}
get &:& \AA \to \BB\\
put &:& \AA\times \BB \to \AA
\end{array}\]

where $get$ maps some source data $A \in \AA$ to a "view" $B \in \BB$, and $put$ takes the original $A \in \AA$ together with a (possibly updated) $B' \in \BB$ and produces a corresponding $A' \in \AA$.  These are expected to satisfy some basic consistency laws.
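As a small illustration of the state-based picture, here is a Python sketch of a lens together with the two standard round-trip laws from the Foster et al. work, GetPut ($put(A, get(A)) = A$) and PutGet ($get(put(A, B')) = B'$). The `Lens` class, the `name_lens` example, and the helper names are my own hypothetical choices, not anything taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")  # source type
B = TypeVar("B")  # view type

@dataclass(frozen=True)
class Lens(Generic[A, B]):
    get: Callable[[A], B]      # get : A -> B
    put: Callable[[A, B], A]   # put : A x B -> A

# Hypothetical example: the source is a (name, email) pair, the view is the name.
name_lens = Lens(
    get=lambda a: a[0],
    put=lambda a, b: (b, a[1]),
)

def get_put_holds(lens, a):
    """GetPut: putting back an unmodified view leaves the source unchanged."""
    return lens.put(a, lens.get(a)) == a

def put_get_holds(lens, a, b):
    """PutGet: the view of the updated source is exactly the view that was put."""
    return lens.get(lens.put(a, b)) == b

source = ("alice", "alice@example.org")
updated = name_lens.put(source, "alicia")  # the email is restored from the source
```

Note that `put` here sees only the old source and the new view state, not a description of what changed in the view; that is the "state-based" setting the paper's title refers to.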


Thursday, April 25, 2013

mathjax test

This is a test.  This is only a test.

\[\lambda f.~(\lambda x.~f~(x~x))~(\lambda x.~f~(x~x))\]

This has been a test of the emergency combinator broadcast system.  In the event of a real emergency,  you would have received further instructions.

I'm going to try using MathJax to embed math into posts (hopefully comments also).  This apparently interacts badly with modern layout/template technology, so I'll be sticking with old-school HTML+CSS, which Blogger supports if you ask nicely.

This raises an interesting question, namely whether the posts will still be readable over time as MathJax and/or Blogger evolve, which I'm not going to think about too hard.  The right way to do it would probably be for Blogger to call out to MathJax to render the LaTeX to HTML+CSS or MathML just once, when the post is saved, and preserve it in HTML5 (which now includes MathML).


This is a research blog.  I have fairly wide-ranging research interests, loosely clustered around the topics of programming languages, logic, databases, and verification, with a dash of security.  (The word dilettante comes to mind...)  The purpose of this blog is to collect thoughts about papers, systems, or recent developments involving interesting interaction between programming languages and other systems, with particular interest in databases. I've fallen out of the habit of keeping notes on my reading, and have always wished my notes were searchable anyway.  So, the purpose of this blog is to collect the thoughts that are otherwise falling on the floor, mainly as a resource for myself and possibly some colleagues, but I am making this public on the off chance that one or two other people might find some of it interesting.

The way this will work is that, from now on, when I read a (publicly available) paper I'll blog about it: at least a one-liner with an instant reaction, and a deeper review/reaction if time permits.  From time to time there might be reflective or "review" posts that collect thoughts about a larger topic or several papers on a topic.  I'm not going to start at the beginning; that is, for the time being, I'm not rereading papers I've already read just to provide context, though this will likely happen indirectly over the next few months because of new students or projects.  I am also not sure how I feel about comments; I'll leave them on unless their management becomes a problem.