Add support for F# as a language type

Open mwinkle opened this issue 9 years ago • 12 comments

This came in as a feature request from UserVoice; I'm opening this issue to track discussion and refine the suggestion. This is the text from UserVoice:

It would be great to see some F# support for USQL. I understand that you can already do UDFs in any .NET language (provided you inherit from the required base class or interface etc.) but having F# inline with USQL would be excellent. F#'s lightweight, expression-based syntax would be a natural fit with the SQL section of USQL, and I think it would provide a more seamless experience when switching between SQL and .NET code than SQL and C# does.

Pushing this further one could envisage an F# / USQL type provider along the lines of the SQL Client one (http://fsprojects.github.io/FSharp.Data.SqlClient/) which could allow you to consume USQL from within F# directly.

mwinkle avatar Nov 11 '15 17:11 mwinkle

Just so I don't lose it, here is an example of a UDO (in this case a Processor), written in F#. The only thing required is to ensure the FSharp.Core.dll is deployed and referenced as well as your assembly.

mwinkle avatar Nov 11 '15 18:11 mwinkle

Cool. Thanks for opening this. I guess there's three ways you can look at this: -

  1. Integration of F# directly within the USQL syntax, as an alternative to C# syntax. I need to look a bit more into the way C# ties in with USQL currently but F# feels a very natural fit for this - expression based, lightweight syntax (no curly braces etc.), and a powerful type inference system.
  2. Making the API for UDOs a little more F# friendly which would cut down on the amount of boilerplate needed when implementing them by taking advantage of some F# features. I can comment on the gist with an example or two to illustrate this.
  3. Explore how we can consume USQL from within e.g. F# scripts. The SQL type provider is an example of how you can blend F# and another language and have a type provider handle the plumbing between them, making the transition between them seamless.

Cheers

isaacabraham avatar Nov 11 '15 19:11 isaacabraham

As a start, a quick win would be to ensure that FSharp.Core is deployed by default. Hopefully this would be easy to rectify.

isaacabraham avatar Nov 11 '15 19:11 isaacabraham

I've commented on the gist with some (hypothetical) examples of how things might look - I just put them together now, but maybe they give you an idea of where I'm coming from.

Some things I've noticed - and I'd be interested in your thoughts on these points (if you think this should go in another issue etc., just let me know): -

  1. Why are the IRow, IUpdateableRow etc. actually abstract classes? Surely then call them Row or RowBase etc.?
  2. Do these really need to be classes? Why not just single-method interfaces, or (as per my gist), just a single function signature?
  3. Is there a reason why Output is supplied as an argument into the Process function - why not just let the developer of the function create an instance of it themselves within the function? Then the function would be simply i.e. IRow -> IRow.
  4. Any possibility of having a strongly typed version of IProcessor i.e. IProcessor<TInput, TOutput> which would then have Process(row:IRow<TInput>) : output:IRow<TOutput> - again, this is basically just the prototypical map function in many collection libraries (or select in LINQ).
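
To make points 3 and 4 concrete, here is a minimal sketch of a processor as a plain typed function. Everything in it (`Row<'T>`, `Processor`, `processAll`, and the `Person` / `Greeting` records) is a hypothetical stand-in invented for illustration, not the actual U-SQL `IRow` / `IProcessor` API:

```fsharp
// Hypothetical stand-in for a strongly typed row; NOT the real U-SQL IRow.
type Row<'T> = { Value : 'T }

// Point 3: a processor is just a function from an input row to an output row,
// with no output argument passed in.
type Processor<'TInput, 'TOutput> = Row<'TInput> -> Row<'TOutput>

// Illustrative payload types (assumptions, not part of any real schema).
type Person = { Name : string; Age : int }
type Greeting = { Text : string }

// A processor written as a plain function: no base class, no boilerplate.
let greet : Processor<Person, Greeting> =
    fun row -> { Value = { Text = sprintf "Hello, %s" row.Value.Name } }

// Point 4: running a processor over a rowset is the ordinary map operation.
let processAll (processor : Processor<'a, 'b>) (rows : seq<Row<'a>>) : seq<Row<'b>> =
    Seq.map processor rows

let results =
    [ { Value = { Name = "Ada"; Age = 36 } } ]
    |> processAll greet
    |> Seq.toList
```

Here the function creates its own output row, which is what makes it composable with `Seq.map` and the other collection functions.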

isaacabraham avatar Nov 12 '15 02:11 isaacabraham

I would separate these into three different issues, as they will be easier to stage. Thanks for the gist comments as well; my version mostly reflects the massively long time since I wrote any F#, so I appreciate something that feels more idiomatic.

Question on the type provider: do you feel that type providers are useful for batch processing systems? I think we had a provider for Hive at one point, which was super cool, but the latency of query time didn't seem to make it too useful, IMO.

mwinkle avatar Nov 12 '15 02:11 mwinkle

I remember seeing that Hive TP - I think someone even managed to get it working outside of the Silverlight TryFSharp website!

TPs themselves are not inherently "slow" - they simply allow you to wrap around calls that you would normally do anyway. In this case, if that TP was generated once per row, that might have an overhead (as it would need to parse that file every time) but if it stayed alive across calls to the UDO within the AppDomain, and it was simply being used for safe property accesses, then it shouldn't really be much slower at all than a custom class that you had defined to wrap around IRow that did that anyway.
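
As a rough illustration of the "custom class that wraps a row" comparison, here is a sketch of the kind of hand-written typed wrapper that a type provider would save you from writing. `UntypedRow` and `PersonRow` are hypothetical names, and the column names are made up; none of this is the real U-SQL API:

```fsharp
// Hypothetical untyped row keyed by column name; NOT the real U-SQL IRow.
type UntypedRow(values : Map<string, obj>) =
    // Looks up a column and casts it; fails at runtime if the name or type is wrong.
    member this.Get<'T>(column : string) : 'T = unbox<'T> values.[column]

// Hand-written wrapper giving safe, named property access over the untyped row.
// A type provider would generate the equivalent of this from the schema.
type PersonRow(row : UntypedRow) =
    member this.Name : string = row.Get<string>("name")
    member this.Age : int = row.Get<int>("age")

let raw = UntypedRow(Map.ofList [ "name", box "Ada"; "age", box 36 ])
let person = PersonRow(raw)
```

At runtime each property is just a lookup and a cast on the underlying row, so the wrapper itself adds very little per-call cost.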

As another example of where a TP can be used within cloud data processing, look at the Azure Storage TP (http://fsprojects.github.io/AzureStorageTypeProvider/) which does all sorts of IO when dotting into containers but at runtime it's not really slower than just raw calls to the .NET SDK (which is what the TP is a wrapper around).

isaacabraham avatar Nov 12 '15 02:11 isaacabraham

Done re: separate issues.

isaacabraham avatar Nov 12 '15 02:11 isaacabraham

Thanks for filing the separate issues.

As to making the expression language F#: the expression language is an intrinsic part of U-SQL, so changing it to F# would make U-SQL a different language (which has its own pros and cons).

As to loading additional assemblies: We currently limit the built-in set for a couple of reasons:

  1. a lot of the core C# operators and methods are actually implemented natively in U-SQL.
  2. loading assemblies and "runtimes" by default costs both setup time and memory that would otherwise be available to the end user.

Did you try to just add it with REFERENCE SYSTEM?

MikeRys avatar Dec 07 '15 22:12 MikeRys

Hey @MikeRys. Appreciate your point about loading assemblies - without knowing the internals of USQL etc., I can't comment on exactly how things work. I'm not actually suggesting loading FSharp.Core into e.g. the app domain eagerly. But it's not even deployed wherever USQL runs, which is more my point. This is something that to this day still dogs Cloud Services - FSharp.Core isn't deployed on them by default (or wasn't when I last checked).

I'm not sure what you're referring to by runtimes though - FSharp.Core is just a .NET assembly at the end of the day. Can you clarify?

Cheers

isaacabraham avatar Dec 09 '15 15:12 isaacabraham

Thanks Isaac. Since FSharp.Core is not part of .NET 4.5's distribution, you have to load the assembly yourself, which may seem a bit more hassle than it's worth.

We could try to have it included in the internal machine builds (which is done by another team entirely), but it has a high risk of quickly getting out of alignment with the version you would want to use.

Thus, having to register the assembly yourself in your account (once) gives you the flexibility to be the master of your own version and destiny :).

The runtime comment was just a more generic comment about including other languages (e.g., Python) and the added cost of having them preloaded without user-specification.

MikeRys avatar Dec 11 '15 01:12 MikeRys

That makes sense - I guess having FSharp.Core outside of .NET is a separate issue really, and at any rate the way that CoreCLR is going, separating things out via e.g. NuGet and supplying them yourself seems to be the way to go anyway.

isaacabraham avatar Dec 11 '15 01:12 isaacabraham

Really remarkable to see the efforts in establishing and defining a whole new language in U-SQL when you could have saved a ton of money and corresponding pain by simply leveraging the existing .NET framework and all the existing tooling around it.

Mike-E-angelo avatar Nov 11 '17 06:11 Mike-E-angelo