Hello! :)
I met Shelby on the bitcointalk forum. I'm developing a framework and script engine for building distributed applications based on blockchain and HTML5/JS; he mentioned this GitHub and zenscript, and I've been reading it for a while. There are lots of interesting discussions here related to where I want to go :)
To explain briefly, my idea stems from developing web browser plugins and learning ATL/COM/ActiveX, XPCOM, and the DOM. ATL and XPCOM use IDL files to define an interface that can be compiled to different languages; objects in the DOM tree are then defined as components implementing the interface declared in the IDL and bound to HTML entities as JavaScript objects, so that you can 'embed' an ATL component in an HTML document and call the functions of its C++ interface from JavaScript through the browser engine.
What I found extremely frustrating with those systems is that, despite all the high-level interface definitions, there is not much cross-platform binary compatibility: a combination of proprietary pieces in COM, and, for XPCOM on Unix, an ideological obsession with open source and no care for binary compatibility with the C runtime, means there is no truly cross-platform, binary-compatible version of this kind of component.
And there is always the problem with vanilla C/C++ that pointers are very crude: you can't get any meta-information about them, such as whether they come from the heap or the stack, how much memory is allocated behind them, or what kind of data they point to (if you don't know the type from a compile-time definition), which is bothersome on many levels.
Charm ( https://en.wikipedia.org/wiki/Charm_(programming_language) ) is a good example of what I'm aiming at, with its concept of objects encapsulated as modules importable from each other, a simple syntax close to BASIC, and still easily compiled to assembler.
Node.js has similar objectives, but to me it uses too many resources, isn't good enough with binary data and/or linear algebra, SIMD, and threading, and the garbage collector is also annoying to me.
The other model is languages such as AS3, or Flex Builder and the Android SDK based on Eclipse, with their XML UI/application/service definitions; I find this kind of application definition, based on XML entities mapped to objects dynamically at runtime, very useful and clean.
My original idea was to develop a framework or script engine that is 100% event-oriented and, if possible, stackless, based on an asynchronous message queue, so that each function can be called regardless of context whenever a certain request or event needs to be processed. Instead of a linear execution flow based on a main loop and stack, it posts requests and dispatches the processing and result/error handling asynchronously, like AS3 with its green-threaded event listeners that force asynchronicity on everything.
Even the Android SDK tends to work like this, and JavaScript too, forcing a certain number of things to run as background tasks; in AS3 it's mostly forced because all functions are asynchronous, which keeps the main UI loop low-latency.
And that's one of the ideas I'm aiming at: keeping a low-latency main event-dispatching loop, posting asynchronous requests using dynamic types with objects instantiable at runtime from JSON or XML, while still staying close to the CPU and compiling to dynamically linked binary executables.
Along the way I developed my own ABI with position-independent code and support for dynamic linking, and made a tool to convert .so and .dll files to this format. Complemented with the dynamic typing system, and banning all calls to the compiler-specific C runtime and libc, it allows operating-system-agnostic binary modules that export APIs as functions taking reference pointers to dynamic objects as arguments, which makes for perfectly portable APIs. With native support for JSON object definitions, it also makes it easy to implement JSON-RPC interfaces usable from JavaScript, which is also useful for programming blockchain nodes.
To summarize, my plan is to have:
- Portable binary modules that support interfaces using dynamic types as arguments; those dynamic objects can be instantiated from JSON definitions, and JSON definitions can be produced from them, along with other forms of serialization.
- Loopless/lockless/stackless function definitions encapsulated as binary module exports or script routines.
- Safe memory with an internal allocator, lockless reference counters, memory-leak detection, explicit dynamic typing with runtime access checks, etc.
- Transparent lockless multi-threading as much as possible (the techniques are explained on the 1024cores site that Shelby posted here before; I took most of the design for the lockless list of object references from there).
- Network protocol message handlers, defined and integrated into the event-based framework as component definitions.
- If possible, something usable to boot a bare-metal microkernel, for Pi/ARM devices.
The problem is that I haven't found a language that really allows all of this yet, so the first part was doing the ugly low-level work in C. I took C because most kernels, drivers, and operating systems are made in C anyway, so any language that has to use kernel APIs, hardware, interrupts, and such needs some kind of glue code in C; I started from there and developed the system of dynamic objects.
I'm not very familiar with Haskell; I've tried to get into it a bit but I don't really get it yet. From what I can understand, though, I think my idea is close to the idea of monads in Haskell, which are the base placeholder for what I call a 'node': something that can be assigned a type, a name, and data, plus a list of children that are pointers to references of this same monad with their own type/name/data.
All access to the node/monad data from C is 'semi-monomorphized' (semi because only the type of the variable that needs to be read or written is monomorphized), and the tree system can already convert most simple non-composed types (strings/ints/hashes/floats/vec3/mat3, etc.) to each other transparently, and convert them to JSON. Nodes/monads can be created with specific composed types if they contain a certain predefined collection of child nodes.
From the C compiler's standpoint, the type of those 'monads/nodes' is completely opaque; the compiler just manipulates reference pointers. But all the 'leaf data' of a node is associated with an explicit type, and there are monomorphized functions to read/write its value as a desired type, with automatic conversion from the stored type to the destination type (again, only for simple non-composed types).
The interface for this tree is defined here:
https://github.com/NodixBlockchain/nodix/blob/master/libbase/include/tree.h
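To give a feel for the idea, here is a minimal toy sketch in C (the names and layout are invented for illustration; this is not the actual tree.h API): an opaque node carries a runtime type tag, and a monomorphized read function converts whatever is stored into the requested type, reporting success or failure.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy node: a runtime type tag plus a value. Real nodes also carry a
   name, a reference count, and a list of child node references. */
enum ntype { NT_INT, NT_FLOAT, NT_STR };

struct node {
    enum ntype type;
    union { long i; double f; char s[32]; } v;
};

/* Monomorphized read: convert whatever is stored into a long.
   Returns 1 on success, 0 on failure; *out is untouched on failure. */
int node_read_int(const struct node *n, long *out)
{
    switch (n->type) {
    case NT_INT:   *out = n->v.i;       return 1;
    case NT_FLOAT: *out = (long)n->v.f; return 1;
    case NT_STR: {
        char *end;
        long r = strtol(n->v.s, &end, 10);
        if (end == n->v.s) return 0;    /* stored string is not a number */
        *out = r;
        return 1;
    }
    }
    return 0;
}

int main(void)
{
    struct node a = { NT_STR, { 0 } };
    strcpy(a.v.s, "42");

    long x;
    if (node_read_int(&a, &x))          /* every access is checked */
        printf("read %ld from a string-typed node\n", x);
    return 0;
}
```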
Such a 'monad tree' can be instantiated from JSON, and I added the possibility to attach an explicit type to a JSON object or key definition, which will be used by the node/monad instantiated from it.
The script language looks like this for the moment:
https://github.com/NodixBlockchain/nodix/blob/master/export/nodix.node
I'm also using the script to define a 'website' as a collection of methods that can be called via 'http://xx.com/script.site/method/param1/param2', à la CodeIgniter, and can be used to generate HTML and embed JavaScript variables into the page from the dynamically typed nodes/monads.
The script to generate a web page looks like this:
https://github.com/NodixBlockchain/nodix/blob/master/export/web/nodix.site
The objective is to have a standalone portable binary that replaces a whole stack of software such as Apache/PHP/MySQL/blockchain nodes/Node.js with a single node that can generate dynamic pages from blockchain data, use JavaScript crypto & signatures via the web browser, and also implement a JSON-RPC API for more complex interactive HTML5 applications.
The script language is completely stackless; parameters are instantiated as a local object associated with the function. It's meant to define endpoints for event handling on dynamically typed objects, whether they come from the binary P2P blockchain protocol or from HTTP/JSON requests.
Well, this is just to introduce 'quickly' where I want to get to, but I often see that the discussions on this git revolve around the same kinds of issues I'm trying to solve too: dynamic typing, cross-platform / cross-language module definitions, good support for scaling, and still full integration with the DOM/JS.
I'm still quite early in the design. Well, I already have a lot of stuff working and well developed on the low-level side, but it's the discussions with Shelby on btctalk and reading the material here that sparked me to get started on the script engine itself, as it's also a much better way to introduce the framework than low-level C code.
The high-level language is still very simple and lacks a lot of things, and I don't have that much experience with high-level languages like Haskell or Rust, or with all the problems that can be involved in component/module interface definitions based on dynamic typing, or how to schedule execution flow based on event handlers, etc.
Well, I'm still crunching stuff and debugging for the moment, and I'm also making a website to explain things and provide more documentation and news. I should come up with it in the next week, let's say.
Well, I hope it's not too long :D
I have been digging more into Haskell since yesterday, and I think my objective is very similar to its typeclass/monad system.
The principle is to have generic code that only requires the variables/objects it manipulates to be convertible to the types required in the code, through monomorphized functions.
The node system I've been working on does the same with C code: it allows writing generic C that doesn't need to know the specialized type of the objects it manipulates, and with the reference counter there is no pointer ownership, so nodes are automatically freed wherever the last reference to them is released.
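As a rough illustration of that last point, here is a minimal C11 sketch of lockless atomic reference counting (invented names, not the actual allocator code): any thread can acquire or release a reference, and the object is freed wherever the last release happens.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal atomically reference-counted object: there is no pointer
   ownership, the object is freed wherever the last reference goes away. */
struct ref_obj {
    atomic_int refs;
    int payload;
};

struct ref_obj *obj_new(int payload)
{
    struct ref_obj *o = malloc(sizeof *o);
    atomic_init(&o->refs, 1);           /* creator holds one reference */
    o->payload = payload;
    return o;
}

void obj_acquire(struct ref_obj *o)
{
    atomic_fetch_add(&o->refs, 1);      /* lockless, safe across threads */
}

void obj_release(struct ref_obj *o)
{
    if (atomic_fetch_sub(&o->refs, 1) == 1) {   /* we held the last one */
        printf("freeing object %d\n", o->payload);
        free(o);
    }
}

int main(void)
{
    struct ref_obj *o = obj_new(7);
    obj_acquire(o);     /* e.g. a second thread takes a reference */
    obj_release(o);     /* first owner is done */
    obj_release(o);     /* last release frees the object */
    return 0;
}
```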
I've been looking at the code in your repository Haskell-Webservice-Framework; the way they program server-side page generation is very similar to my script actually, lol.
https://github.com/keean/Haskell-Webservice-Framework/blob/master/Modules/Admin/App.hs
```haskell
handle :: RequestHandler
handle req response = do
    _ <- case lookupFM ((\(UriParameters a) -> a) (uriParameters $ requestURI req)) "RESTART" of
        Just "on" -> fail "server restarted..."
        _         -> return ()
    htmlText "HTTP/1.0 200 OK\n"
    htmlHeaders [MkAttribute ("Set-Cookie","test=1; path=/;")]
    htmlText "\n"
    htmlHead $ htmlTitle "HyperServer/Admin"
    htmlBody $ do
        htmlH1 $ htmlText "HyperServer Admin Page"
        write response
        -- ioThreadDelay 3000000
        htmlForm $ do
            htmlCheckbox "RESTART"
            htmlText "Restart Server"
            htmlSubmit "Submit"
        htmlSmall (showHeaders $ fmToList $ (\(HttpHeaders a) -> a) (requestHeaders req))
        write response
```
My script :)
https://github.com/NodixBlockchain/nodix/blob/master/export/web/nodix.site
```
let NODE_MODULE_DEF node_adx = {"file" : "modz/node_adx.tpo"}
let NODE_MODULE_DEF wallet = {"file" : "modz/wallet.tpo"}
page index =
push scripts,"/assets/js/blocks.js"
node_adx.node_get_script_modules(node_modules);
node_adx.node_get_mem_pool (SelfNode.mempool);
html_head "NodiX INFOS"
html_block "templates/menu.html"
html_block "templates/node.html"
html_scripts
html_var SelfNode;
html_var node_modules;
html_var SelfNode.mempool;
html_js
$(document).ready(function ()
{
site_base_url = '/nodix.site';
api_base_url ='';
lang = 'en';
$('#node_name').html(SelfNode.user_agent);
$('#node_version').html(SelfNode.version);
$('#node_bheight').html(SelfNode.block_height);
$('#node_port').html(SelfNode.p2p_addr.port);
$('#node_addr').html(SelfNode.p2p_addr.addr);
$('#lastblock').html(SelfNode.last_block);
$('#lastblock').attr('href','nodix.site/block/'+SelfNode.last_block);
$('#lastblock').attr('data-target','#blockmodal');
$('#lastblock').attr('data-toggle','modal');
make_node_html ('node_div',SelfNode);
for(var n=0;n<SelfNode.peer_nodes.length;n++)
{
make_node_html ('peer_nodes_div',SelfNode.peer_nodes[n]);
}
make_modules_html("node_modules",node_modules);
update_mempool_txs(mempool,'mempool');
get_node_lag(SelfNode)
});
end_js
html_block "templates/footer.html"
success
```
Almost exactly the same keywords and overall structure :)
But in fact, here is where I want to get to, also from reading the discussions on this GitHub; the concept is very similar to what I can see of this Haskell webserver framework:
At the moment I have a system to define blockchain P2P services where the node first reads the packet header, then a function in the protocol module associates a dictionary with each type of message, identified by its header signature; it can then deserialize the binary data into this runtime-defined dictionary and call the scripted handler routine with a reference to the deserialized object for that message.
The code for this looks like this:
https://github.com/NodixBlockchain/nodix/blob/master/protocol_adx/protocol.c#L1429
```c
if (!strncmp_c(&data->str[4], "version", 7))
    make_string(&pack_str, "{(\"payload\",0x0B000010) (0x02)\"proto_ver\" : 0,\"services\" : 0, \"timestamp\" : 0, (0x0B000040)\"their_addr\":\"\", (0x0B000040)\"my_addr\":\"\",\"nonce\":0,(0x0B000100)\"user_agent\":\"\", (0x02)\"last_blk\":0}");
else if (!strncmp_c(&data->str[4], "ping", 4))
    make_string(&pack_str, "{(\"payload\",0x0B000010) \"nonce\":0}");
else if (!strncmp_c(&data->str[4], "pong", 4))
    make_string(&pack_str, "{(\"payload\",0x0B000010) \"nonce\":0}");
```
The structure of each message is constant and known at compile time for the moment, but it could be defined in a script as a dictionary associating a message header signature with a type definition; if the size of every leaf member type is known, the object can be deserialized automatically from the binary data and the instance sent to the message handler associated with that header's dictionary.
This would allow a high-level definition of a network protocol service: a dictionary of serialized objects associated with each protocol message, each associated with a handler function taking that object type as input.
For most binary protocols this fits well, because most of them have a header signature and a sort of first application layer in the network protocol describing the packet data, so it would allow quickly defining scripted handlers for binary protocols.
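A minimal sketch of what such a dictionary-driven deserializer could look like in C (hypothetical names and layout, assuming fixed-size leaf fields): generic code walks the field dictionary, validates the payload length, and only then accepts the message for the handler.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical field dictionary: each protocol message maps its header
   signature to a list of (name, fixed size) leaf fields, so generic code
   can deserialize the payload without any compile-time struct. */
struct field { const char *name; unsigned size; };

static const struct field ping_fields[] = { { "nonce", 8 }, { NULL, 0 } };

/* Generic deserializer: walks the dictionary and checks that the payload
   length matches the sum of the leaf sizes before accepting the message.
   Returns 1 on success, 0 if the data does not fit the dictionary. */
static int deserialize(const struct field *dict, const uint8_t *payload, unsigned len)
{
    unsigned off = 0;
    (void)payload;  /* a real version would copy payload + off into each leaf */
    for (const struct field *f = dict; f->name; f++) {
        if (off + f->size > len)
            return 0;                    /* size mismatch: reject packet */
        printf("%s at offset %u (%u bytes)\n", f->name, off, f->size);
        off += f->size;
    }
    return off == len;                   /* trailing garbage also rejected */
}

int main(void)
{
    uint8_t payload[8] = { 0 };
    if (deserialize(ping_fields, payload, sizeof payload))
        printf("ping message accepted\n");
    return 0;
}
```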
The web service side is different: HTTP needs special handling to make it easy to write page-generation scripts or an RPC server, to handle query variables, POST data, cookies/headers, and all the web-related things.
But I guess something could still be thought up to get better synergy between the web server and CGI scripts than the very loose coupling of Apache + PHP. The web server could have a conception of the application it's running and more easily allow persistent data and sessions, more in the way Tomcat does, but without Java or a virtual machine. That would allow 'smart' generic functions in the web server based on the script definition, the way Tomcat has its encapsulation system for servlets, and built-in handling of HTTP protocol requests in the script definition of the application, not necessarily only at function-level granularity. For example, it would be easy to generate a sitemap or other things automatically from the web server based on the application definition, and potentially to generate jQuery plugin script files automatically from module definitions with RPC bindings.
For the moment, though, I would be interested in finding the best formulation for defining objects or typeclasses to handle binary network protocol requests: a dictionary of objects associated with the protocol message header, instantiated automatically/blindly by the node framework with generic (typeclass-like) code and associated with a handler routine taking this object as a parameter, and eventually associated with a template system to display the object's data in HTML, with specialized functions for each type of object or array of objects.
Then network services can be defined as aggregations of such objects, defined potentially at runtime from dictionaries and typeclass-like generic code. Integrity can still be checked at both ends: the node can check whether the binary data fits the dictionary via the message header and size, and the script handler can know the type of its argument and test at runtime for the presence of the members/properties it needs.
But I will probably look more into Haskell; it seems to have a lot of interesting ideas. I'm only mildly interested in using it, because I'm always a bit wary with a new language like this: when I need crypto, a database system, and a lot of underlying functionality, I'm never sure whether the whole package contains bugs or is broken. Sometimes there are bugs in their modules and they don't even really say so; you need to dig into some git issue and then find out you're stuck for six months until they solve it, lol.
But I think my objective is very similar in principle, except all the core is made of C code and portable position-independent binary modules, and I still want to retain the possibility of easy bare-metal booting on ARM/Pi =)
Scripts can call a module's exported functions with pointers to the references of the nodes/objects as parameters, as long as the function in the C module takes generic reference pointers as parameters.
The presence of an exported function can be detected at runtime. It still lacks metadata to define the number of arguments, but all the arguments are identical from the C compiler's point of view, and it would not be too hard to add a manual definition in the module giving the number of arguments of the exported functions, to detect problems at runtime.
As you know I have been very busy on other matters, so I have not been able to fully digest your posts yet.
Just glancing at your OP, I am not sure if you tied this in sufficiently with @keean’s Zenscript PL design goals. Perhaps you need to explain more the relevancy or parallels between your experimentation and @keean’s. But as I said, I have not yet read your posts carefully. I am rushed.
Also I just learned yesterday that @keean is in fact very busy with his software company’s daily workload. I think he is probably most motivated by discussions which further his aims as expressed in the other discussion focused Issue threads.
I have highlighted parallels more in the second post :)
Yeah, no rush. I posted this now while I'm into it; he will answer when he has time :)
For the moment I'm taking a bit of a step back from core coding and will focus more on the website / docs / explanations, etc.
I have a bit more time for chatting in the coming days, less hard coding normally :) Some conceptual thinking a bit :) Not necessarily for the short term, but if I find a good way to proceed with what I explain in the second post, it's something that would be desirable to integrate.
@keean replied to me in private and wrote one sentence only that he will take a look when he has time. I know he was on a business trip this week also.
I will try to answer some points in the other issues too, about things I saw that I think I've solved :)
My solution is a bit of Gordian-knot solving, but normally it should still have a good level of security, etc.
I think the main difference in approach is that I don't even attempt to make the compiler check anything; everything is (or can be) checked at runtime.
The actual types are only resolved at runtime and converted to the required type live from the node instance.
All functions that access nodes/monads have a success/failure return state, so every access can be tested for success; if it returns failure, the output value is not altered and should not be used before being initialized.
And I limit multi-thread interaction to simple cases with an asynchronous event framework, which can use green threads or heavy threads with a lockless message list.
But yeah, no rush :) I will try to make the points I see as relevant in the other issues; I posted in the concurrency thread about the issue you mentioned in PM on BCT.
Hi NodixBlockchain, you are taking a very different approach to types than I am. What I am interested in is proofs about programs, and generic algorithms. These roughly correspond, in maths, to proofs about algebra and to algebra itself, respectively.
The idea with static types is that if you can prove variable 'x' only ever has an integer type assigned to it or read from it, then you can omit all the runtime checks, making the program faster. Obviously we would like to go on to prove other things about programs, to allow more runtime checks to be omitted.
Dynamic languages like JavaScript and Python do exactly what you suggest and defer all typing and type checking to runtime. You may also be interested to know that a JavaScript Promise is a monad (with bind = then and unit = Promise.resolve). Promises not only allow chaining asynchronous operations, but also have a success and failure return state (actually a continuation, but the same effect). This works well with an event model for asynchronous IO.
Unlike JavaScript, there is still a concept of typed objects: it can distinguish different types of objects in an array, and it's not completely type-free like JS.
And there is no garbage collector, but manual reference counting, so memory always remains clean (and strictly bounded); it doesn't focus only on green threading like JS and doesn't require a virtual machine, as it's compiled to a binary executable.
I think I'm more interested in emergent properties and allowing more flexibility in writing programs and object interactions, even if the actual outcome can't be predicted :)
Unlike JavaScript, there is still a concept of typed objects: it can distinguish different types of objects in an array, and it's not completely type-free like JS.
JavaScript has "TypedArrays" to allow fast unboxed types in arrays. TypeScript can restrict the types in normal JS arrays too.
And there is no garbage collector, but manual reference counting, so memory always remains clean
Reference counting is garbage collection (http://onlinelibrary.wiley.com/doi/10.1002/spe.4380140602/abstract).
The kind of multi-generation mark-sweep GC used in JavaScript is faster than a reference counting garbage collector.
Reference counting is garbage collection (http://onlinelibrary.wiley.com/doi/10.1002/spe.4380140602/abstract). The kind of multi-generation mark-sweep GC used in JavaScript is faster than a reference counting garbage collector.
Yes, GC is faster, but it tends to use more memory and the memory bounds are very loose; a JS VM can quickly eat up a lot of RAM and there is not much you can do about it. Here the memory is available again as soon as it's no longer referenced.
I agree GCs are faster and all, but they also tend to not actually free the memory very often, especially with complex dynamic applications.
For me, GC is optimal where there is a sweet spot to flush it, like in a video game when a level is unloaded, or in a browser when a page is closed.
But for applications like servers that need to run 24/7, there is no particularly obvious sweet spot to flush the GC.
And I don't like the Java GC's idea of filling up memory before starting to collect garbage; it's a bit dangerous and often leads to app crashes on Android phones without much memory. Then you need to clean up memory manually by clicking the clean-up button because it's full.
If the idea is to fill up the memory until it's full and then the application crashes, then yeah, okay, it's faster than reference counting, I can easily understand why :p
Especially if it runs on a system without virtual memory, where there is no easy sweeping under the carpet by swapping garbage memory out and keeping 2 GB of virtual memory in the swap.
JavaScript has "TypedArrays" to allow fast unboxed types in arrays. TypeScript can restrict the types in normal JS arrays too.
Also, key members of objects can be hard-typed, not only arrays.
Essentially, hashmaps are the same as arrays in my system, all entries can be typed, and there are automatic safe serialization routines. And serialization/hashing of complex objects (blocks -> txs -> inputs/outputs -> script addresses, etc.) is kind of the heart of blockchain operations :)
Having safe serialization/hashing functions for objects is a must, as is building Merkle trees out of object arrays, and that's not so easy to get with JS.
The idea is that you can perform operations on an anonymous object, for example to get any member of the object that is a message list, and then operate on that message list, without knowing anything about the object at all.
Any object can be considered an array of typed keys, and all of an object's and array's keys can be accessed by type, etc.
The GC in V8 JavaScript can mark-and-sweep incrementally to avoid pauses.
Using unboxed ArrayBuffer can lower the load on the GC. This is one of the features I wanted to support as a high priority.
The idea is that you can perform operations on an anonymous object, for example to get any member of the object that is a message list, and then operate on that message list, without knowing anything about the object at all.
This sounds like a security hole. Static typing is important for security.
Designing and implementing a dynamic “framework” (as I hesitate to call it a language) is, I think, more expedient than designing and implementing a statically typed language with sufficient higher-order polymorphism.
The GC in V8 JavaScript can mark-and-sweep incrementally to avoid pauses.
Firefox's GC does not seem to be so clever; I get frequent pauses if I do not try really hard to reduce the amount of garbage generated.
This sounds like a security hole. Static typing is important for security.
Are you sure? It sounds a lot like existential types to me, where we know we have a list of objects that all implement the message interface, but we do not know anything about the type of each object (and in fact with an existential type, we cannot find out anything more about that type either).
Haskell syntax:
```haskell
data MessageList = MessageList (forall a . Message a => [a])
```
Our syntax (not agreed):
```
data MessageList = messageList(forall A . List[A] requires Message[A])
```
The GC in V8 JavaScript can mark-and-sweep incrementally to avoid pauses.
Firefox's GC does not seem to be so clever; I get frequent pauses if I do not try really hard to reduce the amount of garbage generated.
I was referring to Chrome’s V8.
I presume you can saturate it of course. I presume it tries to do incremental if that is plausible. But yeah, I presume we need to minimize the load, which is why I wrote:
Using unboxed ArrayBuffer can lower the load on the GC. This is one of the features I wanted to support as a high priority.
One can even compile C code employing Emscripten, which mallocs within an ArrayBuffer heap.
Are you sure? It sounds a lot like existential types to me, where we know we have a list of objects that all implement the message interface, but we do not know anything about the type of each object (and in fact with an existential type, we cannot find out anything more about that type either).
It depends on whether that blackbox we are calling has been constrained to not access APIs which we do not want it to. Without static compilation, we do not have a fine-grained sandbox (as I proposed by limiting which imports we give the compiled code).
So 100% dynamic defaults to the overall host language sandbox, which is far too permissive to do any capabilities security.
GC like this can work well when memory is compartmentalized and there is a good sense of pointer ownership.
With JS it's easy, because you can easily know when all the variables from a script are going to be unused, and there is no threading.
When objects can be passed around and referenced in different threads without a strong sense of ownership, I'm not sure this kind of GC is that efficient.
This sounds like a security hole. Static typing is important for security. Designing and implementing a dynamic “framework” (as I hesitate to call it a language) is, I think, more expedient than designing and implementing a statically typed language with sufficient higher-order polymorphism.
Static typing is important for security if memory access depends on that static typing.
As all memory accesses are made knowing the dynamic type, there can be no security hole at all.
For me the idea is a bit similar to the principle of Gödel incompleteness: you can never have a single language that is sound and consistent by itself.
The consistency of the program comes from the mind of the programmer, not from the compiler.
If programmers want to screw up memory and write crappy programs that are inefficient and crash, they can always do it in any language.
The thing is, I'm not really sure it's supposed to be considered a high-level language.
The fact that it's made in C is irrelevant, because the C compiler is not supposed to understand the whole high-level logic of the program.
It's somewhere in between low level and high level: low-level code that implements high-level constructions.
The only function of the low-level code here is to provide abstractions for memory, threading, I/O, object hierarchy, and atomic operations on objects in a parallel multi-threaded system.
Until a few weeks ago I didn't really try to build a high-level language to represent those high-level concepts, but the code is already very layered, and the C compiler can't really understand much of what is going on at the top application level; it's mostly message handlers and dynamic object manipulation, and it relies on abstractions of high-level concepts. Even if it's C code, the whole logic of the program doesn't rest only on C's level of abstraction.
It doesn't use one bit of the C runtime.
For the moment it just uses stdio on Linux, because I'm too lazy to write the code to use the kernel-level unistd functions directly, but that's about it. On Windows it uses the kernel API (CreateFile, etc.) directly, and I use kernel-level APIs for sockets on both Linux and Windows.
And that's the very low-level part; 95% of the program will never see a file handle or anything system-specific, and it doesn't use any function of libc or the C runtime anywhere.
Anything beyond libcon and the launcher relies entirely on high-level abstractions, even if they are written in C, because I also want to keep the low-level / CPU / memory side fully in check for safe multi-threading, and certain things are better done in assembler (even if that's 1% of the program, and the end user never has to deal with it directly at all).
For me, most of what high-level languages do today is restrict program expression to simple cases that avoid dealing with complex issues, and they are not even that good at it.
And in my idea I really want to have a monad-like concept, with objects and operations that do not depend on context, where operations can be performed on any object regardless of who allocated it or who is using it, and where all objects are exactly identical to each other regardless of their high-level definition.
The goal is not necessarily to encourage 'bad design'; it's more that if people want to do good design, they can take the trouble to study all the variables and the sharing to optimize threading and parallelism, but if someone wants to be lazy and just scale functions with some shared objects, they shouldn't have to bother about it, even if there can be a performance downside.
If they want maximum performance, they can always inline SIMD assembler or whatever and deal with all the memory and such themselves. It's still C in the end.
If they want lazy programming, they can just use the high-level concepts of nodes and objects, and they don't have to think about memory or threading at all (the only thing that can affect a thread is an atomic operation about one instruction wide).
It depends on whether that blackbox we are calling has been constrained to not access APIs which we do not want it to. Without static compilation, we do not have a fine-grained sandbox (as I proposed by limiting which imports we give the compiled code). So 100% dynamic defaults to the overall host language sandbox, which is far too permissive to do any capabilities security.
With my system of binary modules, all the imports can be checked, and a module can only import symbols exported by other modules.
There is no way a binary module can get a direct pointer to a system function, or to a function exported by some DLL anywhere on the system.
And it wouldn't be too hard to add restrictions on the functions a module can import, to get sandboxing at the binary level.
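To sketch what that import restriction could look like (a toy example with invented names; the real loader resolves imports against module export tables): each symbol a module asks for is resolved only if the host has whitelisted it for that module, so the module physically cannot see anything else.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-module import whitelist: the loader resolves a
   requested symbol only if the host grants it; anything else fails
   to link, so the module never gets a pointer to it. */
static const char *whitelist[] = { "node_read_int", "node_write_int", NULL };

static void *resolve_import(const char *symbol)
{
    for (const char **w = whitelist; *w; w++)
        if (strcmp(*w, symbol) == 0)
            return (void *)1;   /* stand-in for the real export address */
    return NULL;                /* denied: import is not linked */
}

int main(void)
{
    printf("node_read_int: %s\n", resolve_import("node_read_int") ? "granted" : "denied");
    printf("CreateFileA:   %s\n", resolve_import("CreateFileA")   ? "granted" : "denied");
    return 0;
}
```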
For scripts, everything is sandboxed; a script can only access variables declared in its globals or in the local function.
There is just a reference to the node object added to all scripts, to access node variables directly and inject them as JS vars into the HTML page; this can save some RPC/AJAX calls to get dynamic data from the node into the page, through direct access to the global variable.
Every script variable access can be controlled, including calls to the module APIs.
Capability-based security can be expressed with the script language's abstractions, and having dynamic types doesn't mean you can't specify types at compile time.
If you create a node with a compile-time type, it will act like a compile-time type, and the reference will always have this type.
Constants can be added at compile time, or in variables outside of the script's scope that can be checked by the underlying functions.
Scripts can't be modified at runtime, and a script can only access variables defined in its own file.
With the tree system, it's very easy to limit the scope of access of a script function to only the children of a particular object or node.
From the C code, I guess a screw-up can happen, but then it would be like running a production Apache server with binaries you found on some random website: if there are glitches in the binary, it will screw things up anyway. If you want security, you can only run trusted binaries, but that is the case for any compiled language.
With my system of binary modules, the same binary module can be used on Linux and Windows, so it's easy to just copy binaries from a trusted source and check that they are the right modules.
Granted, if you run any binary module you find on the internet, it can probably screw things up, but I think my system would still be globally safer than a system of DLLs, because at least modules can't directly import any function from outside the module system, and only libcon contains access to the system. So just by preventing a module from importing functions from the system, it should be totally sandboxed from the system, and with position-independent code, all memory locations can be randomized easily.
But in the end, most of the time, the point of capability permissions is checking the authority to modify some data, and if that data is ultimately stored on a blockchain, the integrity of and authority over the information can be checked via the blockchain protocol, rather than relying only on capability checking at program-execution level.
I described a system of capability-based permissions quickly on the devos forum some years ago, but that was before I got into blockchain; with blockchain I think this issue of data-access security can be simplified a lot in the broad picture.
And for me, static typing for code executed as a binary is very weak with regard to security.
At the CPU level there is only process-level granularity: any binary code loaded into the process memory can access all the memory of the process, no matter whether it's defined as constant, static, or private; the CPU doesn't care.
And if you use only static typing compiled as a DLL, it will always be loaded at the same virtual address, and all the static variables will always have exactly the same memory location.
It's very easy to inject a dependency into the process to make the system load some binary code into the process space, and then this binary code can access all the variables, classes, and types at their static locations.
And I've been doing this with many commercial apps; most of them use C or C++, hence static types. It's very easy to inject a DLL into the process and access all the variables live (see the Diablo hack for an example of this).
All the high-level definitions are only useful for checking the security of programs written with them; from the moment you compile and run them as binary code, the whole memory-space management is handled by the kernel, and all the memory structure for the statically typed static variables is defined at compile time and will always be loaded at the same virtual address.
And the high-level definitions can check nothing about programs that are not written with them. If you want to have some kind of RPC and expose an interface to client applications, the high-level language can check nothing about the client code, or about the format of the parameters it will share with the host program.
If you want to mix the compiled application with untrusted binaries, the high-level definitions don't matter at all; all those abstractions are gone once the code is compiled to assembler, and the only granularity you have at binary level is process level.
Saying that having dynamic types is a weakness for security is like saying that having mutable variables is weak for security. Having dynamic variables in a program doesn't mean all data has to be dynamic, and it's the same for types.
If you run only trusted code manipulating an object as a specific type, the object will never have another type.
If you want to run untrusted binaries in the same address space, then with static typing it's virtually impossible to prevent that binary code from accessing anything in the process; even without giving it the definition of the class or a pointer to anything, all the static variables are at static locations, with a static type anyway.
And it's very easy to know the effect of overflowing this or that buffer, because all the variables are at static locations in memory, so overflowing a buffer at a static location will always affect the same variable at the adjacent static position.
With the system of multi-threaded double-buffered monads, dynamic types, and position-independent code, it's already much harder to figure out the location of any variable at runtime, even without explicit randomization of memory access.
As all the loading of the binary code is position-independent, and the relocation/export/import etc. is done manually, it would be very easy to randomize the location of every single variable in the whole program at runtime.
If you use dynamic arrays or complex hierarchies of objects with dynamic types, it's very unlikely that a variable will be loaded at the same location every time, which makes buffer overflows much harder. Added to that, all memory accesses are programmed using the tree system on dynamic types, even at a very low level, so all the code using this system is already quite resistant to buffer overflows, and it would be very hard to figure out which variable sits next to which in this kind of context, even without explicit memory randomization.
Actually, static types are always the security issue; if everything is dynamic, there is never any overflow: nothing is expected from the data at runtime, and all operations, even on objects, are strictly memory-bounded, whether they are the functions that allocate the data, read it, or write it.
An access to a dynamic variable will never overflow into another variable.
@NodixBlockchain wrote:
When objects can be passed around and referenced in different threads without a strong sense of ownership, I'm not sure this kind of GC is that efficient.
I do not want to get off on a long tangential discussion right now, but efficiency comes in many flavors. GC is more efficient in aiding rapid programming (not talking about runtime performance). Generally anything that escapes the generational GC is less efficient than had it not. RAII stack frame allocation and deallocation is probably more efficient than generational GC. But in general reference counting is not more efficient in every way than GC and it does still suffer domino effects causing high-latency stalls. However, reference counting deallocation is prompt and more deterministic than GC mark-and-sweep (although I rebutted there, “But please note that reference counting can’t break cyclical references but GC can. And reference counting can cause a cascade/domino effect pause that would not be limited to a maximum pause as hypothetically V8’s incremental mark-and-sweep GC.”). Incremental GC schemes when not overloaded with allocation that escapes generational GC, decrease high-latency stalls. For 100% real-time performance (i.e. no stalls), then hand-tuning of memory allocation and deallocation is probably needed. Memory leaks can occur with any of those techniques, but GC eliminates some cases of memory leaks. Here are other posts that discussed reference counting:
https://github.com/keean/zenscript/issues/17#issuecomment-313808940 https://github.com/keean/zenscript/issues/34#issuecomment-309801174
Afaik, threading has nothing to do with making reference ownership (for the purposes of deallocation, not controlling shared access restriction) less deterministic. Perhaps you are thinking about, for example, browser integration with Java and Flash applets, which may not be well integrated with a single GC instance.
@NodixBlockchain wrote:
Static typing is important for security if memory access depends on that static typing.
As all memory accesses are made knowing the dynamic type, there can be no security hole at all.
You do not appear to even be considering, for example, that security includes restricting access to certain APIs, as I pointed out:
@shelby3 wrote:
It depends on whether that blackbox we are calling has been constrained to not access APIs which we do not want it to. Without static compilation, we do not have a fine-grained sandbox (as I proposed by limiting which imports we give the compiled code).
So 100% dynamic defaults to the overall host language sandbox, which is far too permissive to do any capabilities security.
Assuming our dynamic language prevents access to global variables, we could restrict APIs by passing them as input arguments. But then we have no static checking on what those input arguments of the caller contain. We end up instead with some dynamic soup that can only be checked with unit tests. Unit tests are not security.
Capability-based security can be expressed with the script language's abstractions, and having dynamic types doesn't mean you can't specify types at compile time.
Is that like being only a little bit pregnant?
And for me, static typing for code executed as a binary is very weak with regard to security.
At the CPU level there is only process-level granularity: any binary code loaded into the process memory can access all the memory of the process, no matter whether it's defined as constant, static, or private; the CPU doesn't care.
A sandbox can be much higher-level than that.
For me, most of what high-level languages do today is restrict program expression to simple cases that avoid dealing with complex issues, and they are not even that good at it.
Sometimes a static type checker does get in the way of expressing complex algorithms. @keean is quite knowledgeable on PL theory and has a lot of experience implementing algorithms in different languages. We have had in-depth discussions about typeclasses, HRT, HKT, and modules, for example. I have learned a lot and contributed my slant/insights as I learn.
The goal is not necessarily to encourage 'bad design'; it's more that if people want to do good design, they can take the trouble to study all the variables and the sharing to optimize threading and parallelism, but if someone wants to be lazy and just scale functions with some shared objects, they shouldn't have to bother about it, even if there can be a performance downside.
If they want maximum performance, they can always inline SIMD assembler or whatever and deal with all the memory and such themselves. It's still C in the end.
If they want lazy programming, they can just use the high-level concepts of nodes and objects, and they don't have to think about memory or threading at all (the only thing that can affect a thread is an atomic operation about one instruction wide).
I will not speak for @keean’s opinion, but I know he and I have mentioned several times our agreement with the general principle to try not to have multiple paradigms in the same language, i.e. not multiple ways to do the same thing, if possible to avoid. Because readability of open source is a very high priority these days and the complexity budget is finite.
You do not appear to even be considering, for example, that security includes restricting access to certain APIs, as I pointed out: Assuming our dynamic language prevents access to global variables, we could restrict APIs by passing them as input arguments. But then we have no static checking on what those input arguments of the caller contain. We end up instead with some dynamic soup that can only be checked with unit tests. Unit tests are not security.
I don't see why dynamic typing prevents restricting access to any API.
You could define, in the definition of the node, the methods or modules a certain script can call, and then restrict script execution to those.
Same for variables.
If the API needs to check credentials, there are in-browser signatures or cookie-based sessions, and then access to certain functions can be restricted by checking credentials.
The function can check whether the input data contains the data it needs, with the types it needs; this doesn't remove any security compared to static typing.
I think you are quite confused between the concepts of data format (like a network protocol), types, and interfaces/APIs =)
The goal of interfaces/APIs is to provide methods to access conceptual properties of an object without knowing its type or internal data format.
The goal of a network protocol / data format, as in serialization, is to ensure objects can be transmitted over a network protocol between two programs, even if the representations of the object they hold are different, or they don't even use the same fields of the network data.
The goal of objects is to provide an abstraction for data localization in a computer program.
Data from the network gets deserialized into objects that are exposed via interfaces.
The implementer side of the interface can check whether the parameters fit what it expects, but ultimately what matters for the high-level concept of security is not the format of the data but the information it contains; that's why interfaces are useful to abstract the data format away from the conceptual type manipulated in the program.
With JSON-RPC, all objects coming from JavaScript are already dynamic anyway, so there is necessarily a step of checking the type of the input data.
With anything connected to the internet, you can never really assume much about any type or data coming from the network; everything needs to be checked anyway. That's especially true with blockchain, where every packet could be anything from spam/DoS to a valid block, with orphan blocks and all the mess in between. Dynamic types allow more flexibility in what can be safely accepted, without removing any security.
PHP uses 100% dynamic typing; I don't see why it would cause any security problem.
To break it down at a fundamental level, credential checking is a three-item thing:
- the admin's definition of what is allowed or not;
- the user's definition of what he wants to run;
- a kind of script to check whether what the client requires matches what the host allows.
When you think about it, that's exactly how bitcore script transaction checking works. So you could easily put data on the blockchain storing what a particular object is supposed to be able to do, or some kind of permission template, then a bitcore-like script to match against an input credential, and the script can return true or false depending on whether the user has the required access with the provided credentials.
The only reason you would want capability-based security, to me, is to be able to run untrusted code.
Like something at the operating-system level to control access to local resources in an environment where untrusted code can run on the system.
In the trusted-code scenario it's very simple: if you don't want an API to have a certain access, you don't program that function into it, period =)
If you don't want the server to be able to perform certain kinds of actions, don't expose interfaces to objects that do; that's an admin's job.
It's the same with any server-side software: if you want to keep it safe, only install code on it that exposes the functions you want to allow.
If you want to install any kind of untrusted code on the server within the same domain etc., you're going to end up in trouble.
It's the same with scripts: if the script is trusted code, all the type manipulation can be made static too, and in that case most of the types are defined at compile time. If the JSON-like string used to create the object is a constant in the C program or in the script, the object will always have the same type.
And static typing doesn't change anything here: in the context of a web server, it's not your program that generates the requests, and they will most likely be created by languages that already have dynamic typing.
The only substantial difference between dynamic and static typing is that with static typing, if the input doesn't exactly match the object definition, it will fail; with dynamic typing, it will try to instantiate the whole thing no matter what (until it runs out of memory) and then check whether it fits what the input handler needs.
The protocol module in my system is what creates the object template from the definition of a network message; the object is instantiated from the data and passed to the message handler (whether it's in C or script).
The base layer can only check whether the size of the data matches the object instance size; higher layers can check whether the object's data matches what they need.
It allows for unordered named parameters, like AS3/JS and JSON-RPC.
And the handler only needs to check that the object has the right named properties, convertible to the types it needs for its operation (in this I think it's close to the Haskell principle of monomorphized functions).
I will not speak for @keean’s opinion, but I know he and I have mentioned several times our agreement with the general principle to try not to have multiple paradigms in the same language, i.e. not multiple ways to do the same thing, if possible to avoid. Because readability of open source is a very high priority these days and the complexity budget is finite.
Normally people should only be doing things with the node/monad or the framework system, but the C is already there, and well documented, so it can be used too.
But the idea is that, in the future, most high-level things should be done via the script / high-level language, and then everything is unified around the principles of message lists / event handlers and dynamic objects.
Script variable resolution is always limited to a particular root node exclusive to the script, where all the script's global variables live, and to the variables in the function definition (i.e. mostly the input parameters, or the output buffer and HTTP info for page scripts).
All the variables a script can access need to be children of the script's root node or children of the function definition node.
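A toy sketch of that scoped resolution in C (invented names and layout, just to illustrate the rule): a name is looked up under the function's scope node first, then under the script's root node, and nowhere else, so there is no global namespace to escape into.

```c
#include <stdio.h>
#include <string.h>

/* Toy tree node: just a name and some children. */
struct node {
    const char *name;
    struct node *child[4];
    int nchild;
};

static struct node *find_child(struct node *parent, const char *name)
{
    for (int i = 0; i < parent->nchild; i++)
        if (strcmp(parent->child[i]->name, name) == 0)
            return parent->child[i];
    return NULL;
}

/* Resolution is confined to the function scope and the script root. */
static struct node *resolve(struct node *root, struct node *scope, const char *name)
{
    struct node *n = find_child(scope, name);   /* function locals first */
    return n ? n : find_child(root, name);      /* then script globals */
}

int main(void)
{
    struct node g = { "SelfNode", { 0 }, 0 };
    struct node p = { "param1",   { 0 }, 0 };
    struct node root  = { "script", { &g }, 1 };
    struct node scope = { "fn",     { &p }, 1 };

    printf("SelfNode: %s\n", resolve(&root, &scope, "SelfNode") ? "found" : "denied");
    printf("system:   %s\n", resolve(&root, &scope, "system")   ? "found" : "denied");
    return 0;
}
```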
Afaik, threading has nothing to do with making reference ownership (for the purposes of deallocation, not controlling shared access restriction) less deterministic. Perhaps you are thinking about, for example, browser integration with Java and Flash applets, which may not be well integrated with a single GC instance.
Allocation/deallocation is (non-atomic) shared access to the memory pool/heap.
Hence why you often need to be careful with allocation in interrupts and such, because it can trigger deadlocks.
But as I want to keep my system as lockless as possible, it means a GC could only free memory allocated in the same thread it's running in.
In a multi-threaded environment, a GC flush would probably need a synchronization primitive across all threads when it needs to flush all the references and memory, checking in real time which memory is used or not.
It's much simpler with a JavaScript-like single-threaded language, because you know the whole application is stopped while the GC is flushing, and all memory not referenced at that particular point can be freed safely, since no other code can be manipulating those references at the same time.
But the trade-off with my system is that it's based on a lockless internal allocator with a free-area stack, so allocating and freeing memory is very fast, and all references can be shared between threads with lockless atomic reference counting.
The only thing that can't be done without a lock for the moment is a thread freeing memory allocated by another thread.
I have the code for doing this; there are comments in the memory allocation code to acquire/release a semaphore primitive, so in theory you just need to uncomment those, or replace them with synchronization primitives like semaphores, and the memory allocation becomes thread-safe (with a semaphore).
But other than this, allocating and freeing memory is lightning fast and completely lockless.
There is only one call to system memory allocation per thread, at initialization, and that's it; after that, all allocation is lockless, based on the free stack, and super fast.
It could even do live memory defragmentation: as applications only manipulate references, all the instances can be relocated transparently to the application.
And there are very useful features to track memory leaks, such as displaying all memory allocated since the last checkpoint; it can track all objects and memory newly allocated from some point on, so when objects are leaked you can know which object it is and the data it contains, which helps track down most memory leaks in minutes.
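A sketch of why relocation is transparent when applications only hold references (a toy example with invented names; the real allocator manages this inside its own pool): the reference is a handle pointing to a slot, and the allocator can move the underlying storage and update the slot without the application noticing.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy handle: applications keep a pointer to the slot, never to the
   storage itself, so the allocator can relocate the storage freely. */
struct slot { void *storage; };

static struct slot *alloc_obj(size_t size)
{
    struct slot *s = malloc(sizeof *s);
    s->storage = calloc(1, size);
    return s;
}

/* Defragmentation step: move the data, update the slot. Applications
   holding the slot pointer are unaffected. */
static void relocate(struct slot *s, size_t size)
{
    void *dst = malloc(size);
    memcpy(dst, s->storage, size);
    free(s->storage);
    s->storage = dst;
}

int main(void)
{
    struct slot *obj = alloc_obj(16);
    strcpy(obj->storage, "hello");
    relocate(obj, 16);                       /* transparent to the app */
    printf("%s\n", (char *)obj->storage);    /* still "hello" */
    free(obj->storage);
    free(obj);
    return 0;
}
```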
But I think there is still something similar in principle to the mark & sweep idea. I use it for multi-threaded message passing: as a thread can't free memory allocated by another thread, a thread can't just push a message to the list and forget about it.
So when threads are done processing a message, they mark the message as 'done', and the thread that pushes messages into the queue periodically flushes the list of messages with the done flag; if it holds the last reference to a message, the message is freed locklessly by the thread that created it. So I guess in principle it's not far from mark & sweep.
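Roughly, the flush could look like this (a single-threaded toy sketch of the pattern with invented names; the real version uses a lockless list and atomic reference counts):

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy message list: consumers set 'done' when they finish processing,
   and the thread that pushed the messages periodically sweeps the list
   and frees everything flagged done (mark & sweep in spirit). */
struct msg {
    struct msg *next;
    int done;       /* set by the consumer when processing is finished */
    int id;
};

static void flush_done(struct msg **head)
{
    while (*head) {
        if ((*head)->done) {
            struct msg *m = *head;
            *head = m->next;            /* unlink */
            printf("freeing message %d\n", m->id);
            free(m);
        } else {
            head = &(*head)->next;      /* keep messages still in flight */
        }
    }
}

int main(void)
{
    struct msg *list = NULL;
    for (int i = 0; i < 3; i++) {
        struct msg *m = malloc(sizeof *m);
        m->next = list;
        m->done = (i != 1);             /* pretend message 1 is still busy */
        m->id   = i;
        list = m;
    }
    flush_done(&list);                  /* frees 0 and 2, keeps 1 */
    return 0;
}
```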
It's also used in block synchronization, because block packets can arrive in any order; sometimes a block can't be processed before later blocks that arrived in the wrong order relative to the blockchain order, so there is a system to keep such messages in the processing list, and all messages beyond a certain age are regularly wiped from the list.
In the absolute, it's not hard to implement a mark & sweep kind of algorithm with the tree system.
I don't see why dynamic typing prevents restricting access to any API.
It is as if you did not comprehend what I wrote about unit tests. Dynamic typing makes no assurances until runtime. And due to the Halting theorem, we cannot prove that all runtime scenarios have been accounted for.
I think you are quite confused
Nope.
The goal of interfaces/APIs is to provide methods to access conceptual properties of an object without knowing its type or internal data format.
:-1:
You do not seem to understand that modularity is all about static types. But I am not going to have this sort of religious discussion. You are free to believe what you want to believe.
You may be conflating encapsulation and typing, which are separate concerns.
With anything connected to the internet, you can never really assume much about any type or data coming from the network; everything needs to be checked anyway
Secure deserialization of typed objects is not a valid argument against static typing. The static types are enforced by the deserialization, which enables guarantees that are not plausible with unit tests.
Allocation/deallocation is shared access to the memory pool/heap.
Irrelevant to the context we were discussing.
Irrelevant to the context we were discussing.
If you are discussing why threading affects whether a GC can free memory or not, then it's relevant.
It is as if you did not comprehend what I wrote about unit tests. Dynamic typing makes no assurances until runtime. And due to the Halting theorem, we can not prove all runtime scenarios have been accounted for.
They can make the assurance that the program will not crash, and always return meaningful result based on dynamic data.
All the polymorphic scenarios, at the end of the day, come down to monomorphized functions that are known at compile time, so the whole code to access a dynamic type could be inlined in the program, to make sure the conditions are met by the input data.
If the input data is invalid, then it will return an error; there is not much else to do.
If the 'monomorphized interface' to the input data can succeed at getting the value in the type required by the code, then the input data is valid, and the operation is made.
If the monomorphized function fails to convert the input data to the type required by the code, then the function fails and returns an error.
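To make this concrete, here is a minimal sketch (hypothetical node layout, not the actual engine code) of such a monomorphized accessor: the target type (int64 here) is fixed at compile time, and the only runtime question is whether the dynamic value can be converted, with 1 meaning success and 0 meaning failure, never crashing on bad input.

```c
#include <stdint.h>
#include <stdlib.h>

typedef enum { T_INT, T_FLOAT, T_STR } node_type;

typedef struct {
    node_type type;
    union { int64_t i; double f; const char *s; } v;
} dyn_node;

static int node_get_int(const dyn_node *n, int64_t *out)
{
    if (!n) return 0;                 /* unresolved: nothing to read */
    switch (n->type) {
    case T_INT:   *out = n->v.i;          return 1;
    case T_FLOAT: *out = (int64_t)n->v.f; return 1;
    case T_STR: {                     /* convert, reject garbage */
        char *end;
        long long v = strtoll(n->v.s, &end, 10);
        if (end == n->v.s || *end != '\0') return 0;
        *out = (int64_t)v;
        return 1;
    }
    }
    return 0;
}
```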
I don't see what kind of assurance you can have at compile time about the data contained in packets coming from the internet.
You do not seem to understand that modularity is all about static types. But I am not going to have this sort of religious discussion. You are free to believe what you want to believe. You may be conflating encapsulation and typing, which are separate concerns.
You don't seem to understand that having dynamic typing doesn't mean you can't have static typing.
It's like saying that having non-constant variables in a program makes it insecure because you can't know all the values of the variables at compile time.
If the type is defined statically at compile time, then it's static and known at compile time.
If an object is allocated from a runtime type definition, then yes, the type definition can't be known at compile time. Which is the case for all JSON/RPC based requests.
If you want conditions that can be checked at compile time, then don't allocate objects with a type that uses a runtime definition.
But then you can't parse JSON/RPC requests based on javascript dynamic objects, as you don't know the definition of those objects at compile time: they are generated at runtime by javascript, or python, or AS3, which use dynamic types.
And in that case, static typing is just a burden from the perspective of the client language, which already formats its messages based on dynamic object definitions.
The only real use case I see for 100% runtime-defined dynamic types, outside of JSON/RPC communication with dynamically typed languages, is for example to quickly design a network protocol based on a script definition.
In the process of designing the protocol, you just need to update the type definition in the script; all the serialization/deserialization, as well as the script code, is based on this type definition, so everything can still be checked for consistency without unit tests or anything.
But the idea is more to be able to edit the type definition from a script without having to recompile anything: you just copy this definition to all nodes, and the whole network protocol is updated without recompiling a thing.
And it's still in the spirit of being able to check that the code based on the type definition is sound.
Even if the actual object will be created based on a runtime variable, the definition of the object can still be known before it runs, and all the accesses to object members can be matched against the type definition.
In the case of objects that will be manipulated by trusted code, i.e. via generic code using the script definition, it can check at each point that the data fits the expected definition, and that the accesses to the object in the script code correspond with the type.
From the C point of view, it's dynamic typing, but from the script / node high-level point of view, it's static typing, as the type definition won't change for the whole lifetime of the application, and neither will the serialization/deserialization routines or the code that manipulates those objects.
But the whole definition can be changed via simple text editing, then translated to JSON format for the web, manipulated as a tree of dynamically typed objects from C, and serialized to a binary format according to the network protocol specification if the serialization is more complex than just concatenating all the object members to a binary stream.
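In sketch form (illustrative field table, not the real definition format): the type definition is just data, parsed from text/JSON at startup, and the same table drives member access, consistency checks, and the simple concatenate-the-members serialization.

```c
#include <stddef.h>
#include <string.h>

typedef enum { F_U32, F_U64, F_HASH32 } field_type;

typedef struct { const char *name; field_type type; } field_def;

/* e.g. loaded from: { "height":"u64", "time":"u32", "prev":"hash32" } */
static const field_def block_def[] = {
    { "height", F_U64 }, { "time", F_U32 }, { "prev", F_HASH32 },
};

/* Simple serialization: concatenate the members in definition order. */
static size_t serialize(const void **values, const field_def *def,
                        size_t nfields, unsigned char *out)
{
    size_t off = 0;
    for (size_t i = 0; i < nfields; i++) {
        size_t sz = def[i].type == F_U32 ? 4 :
                    def[i].type == F_U64 ? 8 : 32;
        memcpy(out + off, values[i], sz);
        off += sz;
    }
    return off;
}
```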
What you are describing is a parser. However we can statically type the output of a parser by requiring it to conform to a known interface. As such parsers produce existentially quantified types as their output.
Already, I think we need to distinguish 'leaf' types, which are nodes containing actual data that can be used in an algorithm, from nodes that don't contain data but a list of named/typed child nodes which contain the actual data.
It's similar in concept to XML nodes, which can contain either text or other children, except the data can be typed and not only text.
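As a rough illustration (the real node layout surely differs), the leaf/container distinction could look like this: a node is either a leaf holding typed data, or a container holding a list of named/typed children, like an XML element with text or child nodes.

```c
#include <stdint.h>

typedef enum { N_INT, N_FLOAT, N_STR, N_OBJ } ntype;

typedef struct node {
    const char *name;            /* member name within the parent   */
    ntype       type;
    union {
        int64_t     i;           /* leaf payloads                   */
        double      f;
        const char *s;
        struct {                 /* container: named/typed children */
            struct node **children;
            int           nchildren;
        } obj;
    } v;
} node;
```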
In the case of a leaf node, the type shouldn't matter, as the functions to access leaf node data are already monomorphized based on the type that needs to be used in the code.
So the compiler doesn't need to know the type of the leaf node to know whether the code will be valid, because no matter what the node type is, it will be able to output the data as an integer, string, float, hash, or whatever is needed by the algorithm.
The type that needs to be used in a particular function is always known at compile time, so that function's accessor for the leaf node can automatically convert the data to the required type.
It just adds a new possible state for a variable access, like javascript's 'undefined', to indicate that the variable name cannot resolve to an existing object or leaf node.
It has to be taken into account in the code that manipulates objects, if that code is to be considered 100% safe with dynamic types.
All functions to access node data/children return 0 or 1; if they return 0, it means the call failed, and whatever value they were supposed to return is not initialized.
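Continuing the hypothetical node layout from the sketch above (condensed here so the snippet stands alone), the 0/1 convention and the 'undefined' state could look like this:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

typedef enum { N_INT, N_OBJ } ntype;
typedef struct node {
    const char *name;
    ntype       type;
    union {
        int64_t i;
        struct { struct node **children; int nchildren; } obj;
    } v;
} node;

/* NULL when the name does not resolve: the 'undefined' state. */
static const node *node_child(const node *obj, const char *name)
{
    if (!obj || obj->type != N_OBJ) return NULL;
    for (int i = 0; i < obj->v.obj.nchildren; i++)
        if (strcmp(obj->v.obj.children[i]->name, name) == 0)
            return obj->v.obj.children[i];
    return NULL;
}

/* 1 = success, 0 = failure; on failure *out is left uninitialized,
 * so the caller must check the return value before using it. */
static int node_get_member_int(const node *obj, const char *name,
                               int64_t *out)
{
    const node *c = node_child(obj, name);
    if (!c || c->type != N_INT) return 0;
    *out = c->v.i;
    return 1;
}
```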
It's also why I don't much like C++ operators: they can't really be checked for failure based on a return state, only with the exception mechanism, and they can't easily deal with the case of an operator being called on uninitialized or non-allocated instance pointers.
Object instances are mostly a collection of leaf nodes, and exist mostly so that certain functions can be executed, or certain objects in a list filtered, based on their type.
They are mostly existential types for messages or objects contained in a list that need to be passed automatically to the right handler, which will know what to do with the whole object.
As I made a system to evaluate nodes based on the values of their child members, like
eval(myObject,"height<8");
I can easily have the syntax to register a handler on a message list that will be triggered for any message whose selection expression evaluates to true. I didn't write the operator to evaluate the type of the object, but it should not be hard to do. I already have the operator to evaluate an object's length if it's an array.
So I can have a map-like syntax to process a list of objects with the function being selected based on dynamic expression evaluation (do this with C++ :D).
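Only eval() above is from the actual system; everything else below is invented to sketch the registration/dispatch around it: each handler carries a selection expression, and dispatch applies the handlers whose expression evaluates to true on the message.

```c
#include <stddef.h>

typedef struct node node;                     /* dynamic object     */
extern int eval(node *obj, const char *expr); /* 1 if expr matches  */

typedef void (*handler_fn)(node *msg);

typedef struct handler {
    struct handler *next;
    const char     *expr;    /* e.g. "height<8" */
    handler_fn      fn;
} handler;

static handler *handlers = NULL;

static void register_handler(handler *h, const char *expr, handler_fn fn)
{
    h->expr = expr;
    h->fn   = fn;
    h->next = handlers;
    handlers = h;
}

/* map-like processing: the function applied to each message is chosen
 * by dynamic expression evaluation, not by a static type switch. */
static void dispatch(node **list, int n)
{
    for (int i = 0; i < n; i++)
        for (handler *h = handlers; h; h = h->next)
            if (eval(list[i], h->expr))
                h->fn(list[i]);
}
```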
But not all values are valid. Take Unicode strings, some values are not allowed. Also you may want a non-zero number (to avoid division by zero). In the general case you have to check every value and the relationship between them when they enter the system from an untrusted source. My choice is to always treat communication with other processes as a system edge, and parse the data.
There is already UTF-8 decoding in the JSON parser.
But the thing is, since it was originally made with the idea of running on a bare-metal microkernel, the global architecture is still oriented around the concept of 'rings', with different modules conceptually belonging to a ring, and data needing to be checked before it crosses a ring boundary.
I had this kind of discussion on devos; there is always a certain point where the code needs to assume the input data is in a certain format (e.g. input to kernel modules), and you can't have data/type checking in every single function call, so functions inside a given ring are already supposed to check the data they send to other functions in the same or a lower ring.
In this idea there are more or less three levels.
The libcon is the equivalent of the kernel level; those functions don't check anything on their own, but they are not supposed to be called from a high-level interface without parameter checks.
The application modules are the equivalent of system functions; they make the calls to the libcon, checking the parameters they send to the libcon / system.
The top-level modules are the ones exposed to the HTTP interface for RPC/CGI; those check the user input and then call the application modules with the checked input, and the application modules call the libcon / system, checking their input parameters.
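A minimal sketch of where the checks live in each ring (all names are invented for illustration, not the real API):

```c
#include <stdint.h>

/* ring 0: libcon / kernel level -- trusts its caller, checks nothing */
static void libcon_store_block(const uint8_t *data, uint32_t len)
{
    (void)data; (void)len;
    /* writes len bytes, assumes data != NULL and len is sane */
}

/* ring 1: application module -- checks what it hands down to ring 0 */
static int app_add_block(const uint8_t *data, uint32_t len)
{
    if (!data || len == 0 || len > 1024u * 1024u) return 0;
    libcon_store_block(data, len);
    return 1;
}

/* ring 2: RPC/CGI entry point -- checks the untrusted user input */
static int rpc_submit_block(const uint8_t *payload, uint32_t len)
{
    if (!payload) return 0;
    /* ... parse/verify the payload (format, signature, etc.) ... */
    return app_add_block(payload, len);
}
```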
There is no strict rule to enforce this; it's more conceptual. But as far as I know, there is not really a way to express this concept of 'rings', and of modules belonging to a certain ring, in order to know where the data checking must be done when a module in one ring calls a module in a lower ring (with more permissions).
I guess some kind of automatic filtering of function parameters, or of any data a function is supposed to operate on, could be done based on the type definitions of the object and the function, but it seems a bit tedious to have to define this for every function.
But I guess that is where compile-time checks can be useful: automatically detecting whether certain function code has implicit restrictions on type or range due to the kind of operations it is doing, and automatically inserting the checking / filtering at ring boundaries based on this API definition.
But in general, if I know the code can see values that could trigger an exception or such, I will generally do the check statically at the function level, unless it's a test that can take long and that you only want to do once; subsequent calls must then only be made with valid data, and modules beyond a certain ring will only expect already-valid data.
Yes, so you need a parser/runtime type check whenever you cross a Ring boundary (which includes sending data to a different computer). All other type checking can be done statically.
To facilitate this, inter-ring communication can take place over 'channels' that implement 'protocols'.
The type system to correctly type a protocol is much more sophisticated than most language type systems. Effectively you have to specify the types in every message (a data struct) and then specify which messages can be sent in which states of the protocol state machine. There is a separate state machine for each computer taking part in the protocol, and we need to cope with them going out of sync, missing messages etc.
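As a rough illustration of this idea (states and messages below are invented), a protocol state machine can be encoded as a transition table that rejects messages that are illegal in the current state, which is also where an out-of-sync peer or a missing message shows up as a clean error:

```c
typedef enum { ST_IDLE, ST_SYNCING, ST_READY } proto_state;
typedef enum { MSG_HELLO, MSG_GET_BLOCKS, MSG_BLOCK, MSG_DONE } msg_type;

/* allowed[state][message] == 1 when the message is legal in the state */
static const int allowed[3][4] = {
    /* ST_IDLE    */ { 1, 0, 0, 0 },  /* only HELLO              */
    /* ST_SYNCING */ { 0, 1, 1, 1 },  /* block traffic and DONE  */
    /* ST_READY   */ { 0, 1, 0, 0 },  /* may request more blocks */
};

/* Apply one message; 0 means "illegal in this state", which is where
 * missing or out-of-order messages are detected. */
static int proto_step(proto_state *st, msg_type m)
{
    if (!allowed[*st][m]) return 0;
    switch (m) {
    case MSG_HELLO:      *st = ST_SYNCING; break;
    case MSG_GET_BLOCKS: *st = ST_SYNCING; break;
    case MSG_DONE:       *st = ST_READY;   break;
    default:             /* MSG_BLOCK keeps the state */ break;
    }
    return 1;
}
```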