NTFS Data Streams and .NET

March 8, 2008. Posted in: .NET

Several people have written in the past about accessing Alternate Data Streams in the NTFS file system from .NET code. The reason for this is that accessing streams is not natively supported in .NET. What you don't hear very often is exactly why this is so.

There are two things you might want to do with data streams:

  1. Manipulate them: create/read/write/delete them
  2. Enumerate them: list all the alternate data streams for a file

Neither of these operations is supported by System.IO. To be perfectly honest, I think not supporting (2) is an understandable option; after all, it requires exposing some fairly specific Windows APIs that are really only useful with NTFS itself (as far as I know). There's an old MSDN Magazine article on this topic by Stephen Toub, by the way.

But I was always pretty surprised that (1) wasn't natively supported. After all, you don't really need special APIs to do most of that stuff; it's built into the native Win32 APIs that .NET has to call anyway to perform the basic file and I/O operations. Heck, it is so basic that you can create a new alternate stream using "echo" in a cmd prompt!
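To make the point concrete, here is a minimal sketch of the usual workaround: calling the Win32 CreateFile function through P/Invoke and handing the resulting handle to a FileStream. The class name AlternateStreams and the helper BuildStreamPath are made up for this illustration, and the code only works on Windows with an NTFS volume.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

public static class AlternateStreams
{
    // Win32 CreateFile (resolved to CreateFileW); it accepts "file:stream" names on NTFS.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern SafeFileHandle CreateFile(
        string fileName, uint desiredAccess, uint shareMode, IntPtr securityAttributes,
        uint creationDisposition, uint flagsAndAttributes, IntPtr templateFile);

    private const uint GenericRead = 0x80000000;
    private const uint GenericWrite = 0x40000000;
    private const uint OpenAlways = 4;

    // Hypothetical helper: builds the "file:stream" name that Win32 understands.
    public static string BuildStreamPath(string filePath, string streamName)
    {
        return filePath + ":" + streamName;
    }

    // Opens (creating it if needed) the named stream and wraps it in a FileStream.
    // Windows and NTFS only.
    public static FileStream Open(string filePath, string streamName)
    {
        SafeFileHandle handle = CreateFile(
            BuildStreamPath(filePath, streamName),
            GenericRead | GenericWrite, 0, IntPtr.Zero, OpenAlways, 0, IntPtr.Zero);
        if (handle.IsInvalid)
            throw new IOException("CreateFile failed", Marshal.GetHRForLastWin32Error());
        return new FileStream(handle, FileAccess.ReadWrite);
    }
}
```

Something like AlternateStreams.Open("notes.txt", "summary") would then give you a regular stream to read and write, which is exactly why the lack of direct support feels odd.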

You can find some mentions out there about this being because of complexities introduced by file name aliases and whatnot, but I'm not sure I buy it. Seriously, number (1) should just work; if it doesn't then something smells wrong. After all, it takes work to get something that you get for free to not work at all!

I think the overall reason this doesn't happen has to do with the way paths are handled (and path canonicalization is done) with regard to FileIOPermission and such, but frankly, I'm not sure why it is such a big deal. There are certainly a bunch of other file system features in NT which .NET doesn't support either, but those are a bit more understandable (to me, at least).

Other than that, there's also the fact that System.IO.Path isn't particularly bright about how it handles paths. For example, it is hard to argue that this could be proper behavior:

[Screenshot not available: path]

Fortunately, using alternate data streams isn't so common, so this isn't such a big deal. Still, it is a curious bit.

Irony

March 5, 2008. Posted in: .NET

I've been playing a bit with Irony, an open source .NET compiler construction toolkit created by Roman Ivantsov. My interest in Irony was sparked after watching the video of Roman's presentation at Lang.NET 2008 (I'd link to it, but the videos are unavailable at the moment).

Currently, I have a little side project where I've been experimenting with the Dynamic Language Runtime, for which I was using GPPG and GPLEX to build the parser and tokenizer. The tools are OK, but they certainly require quite a bit of manual work, you need to keep regenerating the code, and, frankly, the error messages both tools produce leave a bit to be desired.

I'm aware of ANTLR, but frankly don't want to mess around with the whole Java generator thing at this time (already have way too much crap around).

Irony is pretty interesting because all the tokenizer and grammar rules are expressed directly in C# code as a simple object model built from Terminal and NonTerminal objects. The syntax is fairly intuitive because Irony overloads several operators like | and & to make the grammar definition look a lot like BNF. Here's a sample of what a simple expression grammar in Irony looks like: http://www.codeplex.com/irony/Wiki/View.aspx?title=Expression%20grammar%20sample

After testing Irony a bit, I'm very encouraged to give it a try on my pet project, so I've been playing around with my grammar (unfortunately broken at this time because of some ambiguities I haven't resolved yet), and it's looking very nice overall.

There's also a useful "Grammar Explorer" tool included with the Irony code which can be used to experiment and diagnose a compiled Irony grammar. Unfortunately right now it just has a predefined list of a few test grammars included with the project, but extending it to use others is trivial anyway, so I've been using it to test my grammar before actually committing to it.

Another useful thing in Irony is the concept of TokenFilters, which are objects that can filter the token stream as it is produced by the scanner and can either remove/modify/add new tokens as necessary or provide extra validations. Right now Irony provides two built-in filters, but you can extend it with new ones as necessary:

  • CodeOutlineFilter, which can be used in languages where whitespace is significant (indentation and/or newlines).
  • BraceMatchingFilter for matching brace characters, for example '()' in Scheme/Lisp.
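To give a feel for the idea, here is a minimal sketch of a filter that sits between the scanner and the parser and validates brace matching while passing tokens through. This is not Irony's actual API: the Token class and the BraceCheckFilter name are stand-ins invented for the illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token type; Irony's real token class is richer than this.
public sealed class Token
{
    public string Text { get; }
    public Token(string text) { Text = text; }
}

public static class BraceCheckFilter
{
    // Passes tokens through unchanged while validating that '(' and ')' balance.
    public static IEnumerable<Token> Apply(IEnumerable<Token> tokens)
    {
        int depth = 0;
        foreach (Token token in tokens)
        {
            if (token.Text == "(") depth++;
            else if (token.Text == ")" && --depth < 0)
                throw new FormatException("Unmatched ')'");
            yield return token;
        }
        if (depth != 0)
            throw new FormatException("Unmatched '('");
    }
}
```

Because the filter is an iterator, it validates lazily as the parser pulls tokens; a filter that adds or removes tokens would follow the same shape.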

DLR Notes 4

February 13, 2008. Posted in: .NET | DLR

I thought I'd use a few posts to talk a bit about some interesting ways that IronRuby does things. To do this effectively, we need to have a way to look at the code that IronRuby generates, as just browsing the code isn't going to do much good if you don't know where to start!

Fortunately, the DLR has our backs covered! As I've mentioned in the past, the DLR allows us to see the ASTs used for code generation as well as the Rules used to execute actions, and this is an invaluable tool both to debug our own language implementations and to see what other implementations are doing.

Looking at both ASTs and Rules is easy if you're using the DLR Console either to execute a script in a file or in REPL mode, using the /X:ShowASTs and /X:ShowRules switches, like this:

.\rbx /X:ShowASTs test.rb

Creating .NET Objects

One of the strengths of IronRuby is that it allows us to work against not only all the Ruby goodness, but also access a lot of the functionality already available in the .NET Framework [1].

For example, I could create an instance of System.Text.UTF8Encoding like this:

x = System::Text::UTF8Encoding.new(false)

So what happens when we do this? The current implementation of IronRuby will generate an AST that looks like this:

.return (.bound x) = (Object).action (Object) InvokeMember new(
    (RubyOps.GetConstantExplicit)(
        (RubyOps.GetConstantExplicit)(
            (RubyOps.GetGlobalConstant)(
                .context,
                (SymbolId)System,
            ),
            .context,
            (SymbolId)Text,
        ),
        .context,
        (SymbolId)UTF8Encoding,
    )
    (Boolean)False
);

There are several interesting things about this tree. As you can see, the qualified class name is translated to an object through a series of calls to methods in the RubyOps class. The first two calls return RubyModule objects corresponding to the "System" and "System::Text" namespaces, while the last one returns a RubyClass object representing the type for the UTF8Encoding class. Then an InvokeMember action is built to call the "new" method on this RubyClass object, which will return the new instance of the UTF8Encoding class.

However, if you go and look at the code for the RubyClass and RubyModule classes, you won't find a "new" method anywhere. What gives?

The trick is, of course, in the IDynamicObject implementation of RubyClass/RubyModule, which receives the InvokeMember action and generates the right set of rules for this case. The method doesn't need to exist, as long as we can resolve the call and build a StandardRule whose target involves the right action/call being executed at runtime. In our simple example, this should mean creating a new instance of UTF8Encoding, which we can see in the rule that gets generated:

//
// AST Rule.Test
//

((((.bound $arg0) == (RubyClass)Ruby.Builtins.RubyClass) && ((RubyClass)Ruby.Builtins.RubyClass.Version == 15111)) && (Boolean)True)
//
// AST Rule.Target
//

.return .comma {
    (.bound val) = .new UTF8Encoding(
        (.bound $arg1),
    )
    (.bound val)
}

The process through which IronRuby builds new instances (or rather, derives the right set of rules for a new expression) is actually fairly complex, even for this simple case. However, in the end, you can think of this process in a simplified way:

  1. The GetRule() implementation resolves the "new" method call in the RubyClass object based on a fairly complicated set of lookups around the underlying .NET type and its superclasses and their corresponding RubyClass objects. What matters to us is that, at some point, IronRuby realizes that it really means "create a new instance".
  2. RubyClass.SetRuleForCreateInstance() is invoked, which will configure the rule correctly for this case.
  3. Within this, IronRuby notes that UTF8Encoding isn't a class defined in Ruby, but actually a CLR type.
  4. The set of candidate constructors of the CLR type is evaluated to find the right constructor to use.
  5. A New Expression (Tree.Ast.New()) is provided as the target of the rule.
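Steps 4 and 5 can be approximated with plain reflection. The following is only a rough sketch under simplifying assumptions (exact assignability, no ranking of candidates, no conversions); ConstructorResolver is a name invented for the example, and IronRuby's real logic is considerably more involved.

```csharp
using System;
using System.Reflection;

public static class ConstructorResolver
{
    // Picks a public constructor whose parameters accept the runtime types of args,
    // then invokes it. Real binders also rank candidates and apply conversions.
    public static object Create(Type clrType, params object[] args)
    {
        foreach (ConstructorInfo ctor in clrType.GetConstructors())
        {
            ParameterInfo[] parameters = ctor.GetParameters();
            if (parameters.Length != args.Length) continue;

            bool match = true;
            for (int i = 0; i < parameters.Length && match; i++)
            {
                match = args[i] != null &&
                        parameters[i].ParameterType.IsAssignableFrom(args[i].GetType());
            }
            if (match) return ctor.Invoke(args);
        }
        throw new MissingMethodException(clrType.FullName, ".ctor");
    }
}
```

Calling ConstructorResolver.Create(typeof(System.Text.UTF8Encoding), false) selects the single-bool constructor, which mirrors the .new UTF8Encoding(...) target in the rule dump. The difference is that the DLR emits an expression for the constructor call instead of invoking it through reflection on every call.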

So this is how our original InvokeMember action is converted into a new expression! Actually, this is a pretty simple case as far as IronRuby goes, because UTF8Encoding doesn't have something corresponding in the Ruby language. To see a different scenario, try creating an ArrayList (or another object that implements IList) and you will see a lot more interesting things going on to make the object and its class act Ruby-like.

What about my language?

Does this mean you have to do it the same way when implementing your own language on the DLR? Certainly not. There are many ways you can choose to implement it and the DLR actually provides a few ways for that.

For example, in this particular scenario IronRuby is relying on an InvokeMember action, but the DLR also has a Create action which provides another way to represent your object construction. You still need to provide a callable object (IDynamicObject) to provide the rule, but it might not necessarily be very complicated once you resolve the typename to a CLR type.

Of course, the trick is finding out the right constructor to invoke for a given expression, and fortunately, the DLR provides some help to do this with the MethodBinder class. Here's a pretty minimal (not particularly efficient) implementation you could start with:

private void SetCreateRule(StandardRule rule, ActionBinder binder, CodeContext context, object[] args) {
   // Test: the rule applies only when the first parameter is this exact object.
   rule.AddTest(
      Tree.Ast.Equal(rule.Parameters[0], Tree.Ast.Constant(this))
      );

   // Collect the runtime types of the actual arguments (args[0] is this object).
   Type[] argTypes = new Type[args.Length-1];
   for ( int i=0; i < argTypes.Length; i++ ) {
      argTypes[i] = CompilerHelpers.GetType(args[i+1]);
   }

   // Let the MethodBinder pick the best constructor for those argument types.
   ConstructorInfo[] all = ClrType.GetConstructors();
   MethodBinder methodBinder = MethodBinder.MakeBinder(binder, ".ctor", all);
   BindingTarget target = methodBinder.MakeBindingTarget(CallType.None, argTypes);
   if ( target.Success ) {
      // Drop the first parameter (this object) before building the call.
      Tree.Expression[] actualArgs = ArrayUtils.RemoveFirst(rule.Parameters);

      rule.Target = rule.MakeReturn(
         binder,
         target.MakeExpression(rule, actualArgs)
      );
   } else {
      // No constructor matched: the rule target reports the error instead.
      rule.Target = rule.MakeError(
         Tree.Ast.Call(
            GetType().GetMethod("InvalidArgs")
         )
      );
   }
}

[1] The examples in this post assume you're 'require'-ing mscorlib.dll.

DLR Notes 3

February 7, 2008. Posted in: .NET | DLR

Last time I talked a bit about Call Actions; now let's talk a bit about another kind of action: InvokeMember actions. As their name implies, these are useful for calling instance methods on .NET objects.

I struggled a bit with InvokeMember actions because they act a little differently than Call actions. In particular, I was confused why sometimes InvokeMember actions seemed to require an extra expression in the argument list to work, when compared to Call actions. Fortunately, Dino Viehland clarified the reason for this, though it took me a lot of trial and error for his explanation to finally "click". I'll explain this in a moment.

To create a new InvokeMember action, you need the name of the method to invoke, the result type (like with a Call action, typeof(object) might suffice), the CallSignature and a list of expressions to use as arguments.

The CallSignature structure is interesting, because it can be used to provide information about how to interpret the call arguments. For example, it can be used to point out if an argument is the instance expression, if it's a named argument and so on (check the ArgumentKind enumeration for all the options). Filling in the CallSignature correctly is key to getting the correct behavior.

Here's what initially confused me about how InvokeMember works. Consider two different scenarios of how an InvokeMember action can be used to call a regular instance method:

  1. You create a simple CallSignature that does not have an argument with ArgumentKind.Instance. In this case, building an InvokeMember action is very similar to building a Call action: first the "target" expression (which evaluates to the object on which to invoke the method), followed by the expressions representing the actual arguments to the call.
  2. You build a CallSignature with an explicit Instance argument (usually the first one). Now, the call arguments need to be a bit more complex, defined as:
    1. The callable object (an expression evaluating to it, actually).
    2. The target expression evaluating to the object on which to invoke the method.
    3. Zero or more expressions representing the actual arguments to the method.

The reason this can be confusing is that this "Callable object" should not appear in the CallSignature you create to build the InvokeMember action: it really is a "hidden" parameter, so if you're calling a method requiring 3 arguments, you'd create a CallSignature listing 4 arguments (instance + actual args) but provide an expression array containing 5 expressions as the arguments to Ast.Action.InvokeMember().
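The counting in scenario (2) is easy to get wrong, so here is a tiny sketch of the two lists side by side. Nothing here is DLR API: ArgKind is a made-up stand-in for ArgumentKind, and strings stand in for expressions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the DLR's ArgumentKind; only two values are modeled.
public enum ArgKind { Instance, Simple }

public static class InvokeMemberLayout
{
    // Signature entries: the instance plus one entry per actual argument.
    public static ArgKind[] BuildSignature(int argumentCount)
    {
        var kinds = new List<ArgKind> { ArgKind.Instance };
        for (int i = 0; i < argumentCount; i++) kinds.Add(ArgKind.Simple);
        return kinds.ToArray();
    }

    // Expression list: the hidden callable object, then the target, then the arguments.
    public static string[] BuildExpressions(string callable, string target, params string[] args)
    {
        var expressions = new List<string> { callable, target };
        expressions.AddRange(args);
        return expressions.ToArray();
    }
}
```

For a three-argument method, BuildSignature(3) has four entries while BuildExpressions("callable", "target", "a", "b", "c") has five: the callable object is the one extra item that never shows up in the signature.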

But what is this Callable object? If you are thinking that it must be similar to what I called "Callable Object" when using Call Actions, you'd be right. Basically, the callable object argument can be used to provide a helper object that can help either make the actual call at runtime or, if it implements IDynamicObject, build the rule used by the DLR to invoke the actual method. [1]

Notice, however, that unlike our sample last time, your GetRule<T> implementation will need to create a rule for DynamicActionKind.InvokeMember instead. One thing that is confusing is that if you use scenario (2) above but simply provide the target object as the first expression (that is, pass it twice), the call will also work directly, matching the behavior of scenario (1) [2].

Either way, the ability to use a Callable Object provides an easy way to hook the member invocation process to tweak how the call is made, which is a very useful capability in a dynamic language.

For example, IronRuby implements IDynamicObject in its built-in module and class objects (RubyModule and RubyClass, respectively), which are used to resolve method calls on Ruby objects; you can look at how the default rule for InvokeMember actions is created inside the MakeRuleForInvokeMember() method of the RubyBinder class. If I understand the code correctly, IronRuby takes advantage of this capability to implement method_missing. You can also check how calls to real .NET methods/properties are resolved to the original methods in RubyClass.TryGetClrMember().

[1] By now you can realize this pattern applies to all types of actions, not just Call and InvokeMember.
[2] In this case, a default rule is provided by the ActionBinder.MakeRule<T>() method, which you can also override and customize in your language binder implementation.

DLR Notes 2

February 5, 2008. Posted in: .NET | DLR

This time I'm going to talk a bit about how to implement function calls in the Dynamic Language Runtime. Last time, I mentioned that I had initially implemented function definitions as simply returning a raw CodeBlockExpression. This works, but it doesn't give you many options to add custom language-specific behaviors.

A more useful approach in many cases is to wrap your function into an actual object, at runtime, which can then be used for different purposes (including actually invoking the function). So instead of simply moving the CodeBlockExpression around you create a subtree that creates a new object with the CodeBlockExpression as an argument (among other things) and return that instead (with possibly other actions around it, like putting it into a variable).

This is how both IronRuby and ToyScript do it (and I imagine, IronPython). If you look at ToyScript's Def class (the language-specific AST node for a function definition), you will see this in action:

Ast.Assign(
  tg.GetOrMakeLocal(_name),
  Ast.Call(
      typeof(ToyFunction).GetMethod("Create"),
      Ast.Constant(_name),
      Ast.NewArray(typeof(string[]), names),
      Ast.CodeBlockExpression(block, false)
  )
)

As you can see, it will generate a Call expression to ToyFunction.Create(), which is a static method that returns an instance of the ToyFunction class. At runtime, ToyFunction.Create() will receive the name of the function, the names of the method parameters and a Delegate instance that can be used to actually invoke the code later on. That last part is the interesting bit.

So later on, you want to invoke your function. You already have a variable somewhere that evaluates to a Function object at runtime (in the case of ToyScript, a ToyFunction object as we mentioned). In more general terms, what you will have is an expression that will evaluate to the function object; it really could be anything (like another function call that returns another function). But how do you actually invoke it?

Turns out there are several ways you can implement this, with varying performance characteristics and requirements.

Generating Calls

In the case of invoking a standalone (local or global) function you already defined in your language, the most obvious choice is to use a direct Call expression, via Ast.Call(). We already saw an example in the short snippet above, but for a different use.

Ast.Call() takes a System.Reflection.MethodInfo object (the method to execute) and a params array of expressions to use as arguments to the call (there are other overloads available). If the MethodInfo object points to a non-static method, then the first expression in the arguments list will be the instance [target] object on which to actually make the call.
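The DLR tree API described here is not something you can run outside the DLR sources, but System.Linq.Expressions has a very similar Expression.Call() that illustrates the same idea. Treat the following as an analogy rather than the DLR's own API; one difference is that Expression.Call() takes the instance as a separate leading parameter instead of as the first item of the argument list. CallExpressionDemo is a name made up for the example.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class CallExpressionDemo
{
    // Static method: only the argument expressions are needed.
    public static int CallStaticMax(int a, int b)
    {
        MethodInfo max = typeof(Math).GetMethod("Max", new[] { typeof(int), typeof(int) });
        Expression call = Expression.Call(max, Expression.Constant(a), Expression.Constant(b));
        return Expression.Lambda<Func<int>>(call).Compile()();
    }

    // Instance method: the target object is supplied as the instance expression.
    public static string CallInstanceToUpper(string text)
    {
        MethodInfo toUpper = typeof(string).GetMethod("ToUpperInvariant", Type.EmptyTypes);
        Expression call = Expression.Call(Expression.Constant(text), toUpper);
        return Expression.Lambda<Func<string>>(call).Compile()();
    }
}
```

In both cases the MethodInfo has to be known when the tree is built, which is exactly the limitation discussed next.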

This is pretty useful if you want to call methods you're aware of at compile/parse time, such as a built-in function like ToyFunction.Create() above, but it's not useful at all to do what we want: Invoking a CodeBlock we created somewhere else.

For this we need to turn to generating a Call action, using Ast.Action.Call():

public static ActionExpression Call(Type result, params Expression[] arguments);

The first parameter is the type of the return value of the call, which can be typeof(object) unless you have more advanced information about what the call will return. The second argument is the list of parameter values you want to pass to the function.

...So how do you tell Call() what to actually invoke?

Yep, that tripped me up as well. Turns out that a CallAction expression expects that the first expression in the arguments list evaluates to a callable object. At least that's how I think of it. But what's a callable object? As I understand it, it means that it references an object that either:

  1. Implements IDynamicObject, or
  2. Has a public instance method with the following signature:
    object Call(CodeContext context, params object[] arguments);

The current ToyScript incarnation follows path 2. Here's the implementation of the Call() method:

[SpecialName]
public object Call(CodeContext context, params object[] arguments) {
   ParameterInfo[] parameters = _target.Method.GetParameters();
   if (parameters.Length > arguments.Length) {
       if ((parameters.Length > 0 && parameters[0].ParameterType == typeof(CodeContext)) ||
           (_target.Target != null && _target.Method.IsStatic && parameters.Length > 1 && parameters[1].ParameterType == typeof(CodeContext))) {
           arguments = ArrayUtils.Insert<object>(context, arguments);
       }
   }
   return ReflectionUtils.InvokeDelegate(_target, arguments);
}

The code can look a bit convoluted, but it's actually quite simple: it simply checks the actual arguments supplied to the call to see if the CodeContext has already been included explicitly in it; otherwise it adds it as the first argument, and then uses ReflectionUtils.InvokeDelegate() to actually invoke the ToyScript function represented by this instance of ToyFunction (_target is a field containing our Delegate instance). It's easy, but I think it doesn't have the best performance.
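If you want to play with the same idea outside of the DLR, here is a simplified, self-contained sketch. FakeContext stands in for CodeContext and CallableFunction for ToyFunction (both names are invented), Delegate.DynamicInvoke() replaces ReflectionUtils.InvokeDelegate(), and the check covers only the simple case of a leading context parameter.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for the DLR's CodeContext.
public sealed class FakeContext { }

public sealed class CallableFunction
{
    private readonly Delegate _target;
    public CallableFunction(Delegate target) { _target = target; }

    public object Call(FakeContext context, params object[] arguments)
    {
        ParameterInfo[] parameters = _target.Method.GetParameters();
        // If the target expects one more leading FakeContext parameter, prepend it.
        if (parameters.Length > arguments.Length &&
            parameters[0].ParameterType == typeof(FakeContext))
        {
            object[] withContext = new object[arguments.Length + 1];
            withContext[0] = context;
            Array.Copy(arguments, 0, withContext, 1, arguments.Length);
            arguments = withContext;
        }
        // Late-bound invocation: simple, but slower than a direct call.
        return _target.DynamicInvoke(arguments);
    }
}
```

The late-bound invoke at the end is where the cost comes from: every call goes through reflection, with nothing cached between calls.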

IDynamicObject

A slightly more complex implementation would instead implement IDynamicObject. This is what IronRuby's Proc class does, and it's what I currently have working in my own language implementation. The core of the IDynamicObject idea is that, unlike with the special Call() method above, our function object is no longer responsible for actually doing the call on the target function. Instead, we're merely responsible for providing the DLR with a rule it can use to invoke it. This is exactly what you do inside your GetRule<T>() implementation.

In our case, what we really want is to respond with a new rule for the Call dynamic action, which is the only one we're interested in. My current (simplistic) implementation works like this:

public StandardRule<T> GetRule<T>(DynamicAction action, CodeContext context, object[] args) {
   switch ( action.Kind ) {
      case DynamicActionKind.Call:
         StandardRule<T> result = new StandardRule<T>();
         ActionBinder binder = LanguageContext.Binder;
         SetCallRule(result, binder, result.Parameters);
         return result;
      default:
         return null;
   }
}

private void SetCallRule(StandardRule result, ActionBinder binder, IList<Tree.Expression> args) {
   // args[0] == this
   result.AddTest(
      Tree.Ast.Equal(
         args[0],
         Tree.Ast.Constant(this)
      ));

   Tree.Expression[] expr = new Tree.Expression[args.Count-1];
   for ( int i = 0; i < expr.Length; i++ )
      expr[i] = args[i+1];
   result.Target = result.MakeReturn(binder, Tree.Ast.ComplexCallHelper(Target, expr));
}

As you can see, we create a new StandardRule but a very simple one:

  • The test part of the rule simply checks that the target object on which to invoke the call is the same as our instance. This is what tells the DLR if the call should or should not be made. Here I'm simply doing an identity test, which seems to work fine in my unit tests. IronRuby, for example, instead uses a unique ID assigned to each Proc instance for the call test.
  • The second part will create a new array with the actual arguments to the call (i.e. without the first 'this' argument) and use that to build the call using Ast.ComplexCallHelper().

Once we return the rule, the DLR can cache that information to make following calls more efficient, for example.
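The test/target split and the caching can be modeled in a few lines. This is only a conceptual sketch: SimpleRule and SimpleCallSite are invented names, plain delegates stand in for expression trees, and the real DLR machinery is far more elaborate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a rule: a test deciding applicability and a target doing the work.
public sealed class SimpleRule
{
    public Func<object[], bool> Test { get; }
    public Func<object[], object> Target { get; }
    public SimpleRule(Func<object[], bool> test, Func<object[], object> target)
    {
        Test = test;
        Target = target;
    }
}

public sealed class SimpleCallSite
{
    private readonly List<SimpleRule> _cache = new List<SimpleRule>();
    private readonly Func<object[], SimpleRule> _makeRule;
    public int RulesCreated { get; private set; }

    public SimpleCallSite(Func<object[], SimpleRule> makeRule) { _makeRule = makeRule; }

    public object Invoke(params object[] args)
    {
        // Reuse the first cached rule whose test still holds for these arguments.
        foreach (SimpleRule rule in _cache)
            if (rule.Test(args)) return rule.Target(args);

        // Cache miss: ask for a new rule, remember it, and run it.
        SimpleRule created = _makeRule(args);
        RulesCreated++;
        _cache.Add(created);
        return created.Target(args);
    }
}
```

With an identity test like the one above, a second call on the same function object hits the cached rule and never asks for a new one. It also hints at why a test that fails right after the rule is created is a problem: the site would keep asking for new rules on every call.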

Something to watch out for here, and that really confused me at first: if you read the above code carefully, you'll notice that the DLR actually gives us an object[] in GetRule<T>(). This will contain the actual values of arguments to the call (not expressions that evaluate to them), so you can examine them to decide how best to create your rule. Notice, however, that I don't actually use them. Instead I use the already existing expressions in StandardRule<T>.Parameters, which will be already populated with the expressions for your call arguments.

The reason for this, as it turns out, is that if you used the original values to build your own Expressions to create the test (and possibly parameters) for the call, you will run into an issue that can cause the DLR to go into an infinite recursion of nested DynamicSite.Invoke() calls. Not fun to debug, and looking at the rules or AST dumps won't tell you why it's failing. I think the relevant issue is mentioned in a comment in the DLR source code that reads "The test should be such that it does not become invalid immediately. Otherwise, ActionBinder.UpdateSiteAndExecute can potentially loop infinitely."

Anyway, once you realize how Ast.Action.Call() works and what it expects from you, it all starts making a lot more sense.

ActionBinders

It is also important to note that already in these simple scenarios your language's ActionBinder implementation starts to kick in. In particular, you need to ensure that it implements the required type conversions so that the DLR can make any conversions necessary between the actual argument values to a call and the types of the parameters as declared in the CodeBlock (through ActionBinder.ConvertExpression()).

Next time I'll talk a bit about InvokeMember actions.

About

Tomas Restrepo is a software developer located in Colombia, South America. His interests include .NET, Connected Systems, PowerShell and lately dynamic programming languages.

Copyright © 2002-2008, Tomas Restrepo.
