[Feature] Dictionary Data Type

Open TheMayoras opened this issue 1 year ago • 24 comments

The data-types in Amber could be even better if they were extended to include a dictionary/map type. Something like [Text, Num].

TheMayoras avatar May 21 '24 18:05 TheMayoras

That is definitely something worth exploring. It would be even better to make it conform to JSON format - this way we wouldn't need to do conversions of the type... although I'm not sure if this would be the best idea performance wise....

Ph0enixKM avatar May 21 '24 18:05 Ph0enixKM

That is definitely something worth exploring. It would be even better to make it conform to JSON format - this way we wouldn't need to do conversions of the type... although I'm not sure if this would be the best idea performance wise....

perhaps consider some js-like syntax?

let obj = {
    foo: "bar"
}
echo "foo is " + obj[foo]
echo "0 is " + obj[0]

then i guess it should compile into something kind of like this

obj_k=( foo )
obj_v=( bar )
# look up a value by scanning the key array for a match
get_obj_by_key() {
	for i in "${!obj_k[@]}"; do
		if [[ ${obj_k[$i]} == "$1" ]]; then
			echo "${obj_v[$i]}"
			return
		fi
	done
}
# look up a value by its position in the value array
get_obj_by_index() {
	for i in "${!obj_v[@]}"; do
		if [[ "$i" == "$1" ]]; then
			echo "${obj_v[$i]}"
			return
		fi
	done
}

echo "foo is $(get_obj_by_key foo)"
echo "0 is $(get_obj_by_index 0)"

not sure how to handle nested objects, though

b1ek avatar May 22 '24 02:05 b1ek

perhaps consider some js-like syntax?

I think that JS-like syntax here is spot on. Having to support more complex data could be harder in bash. We can discuss it on the community Discord server.

Ph0enixKM avatar May 22 '24 10:05 Ph0enixKM

i feel like we should decide on the syntax we want to adopt.

imo there are 3 ways it could go:

  • create a dynamic, JS-like structure
    • create a wrapper around jq? im not sure its a good idea tbh, as it is not cross platform by any means and probably slower than native bash implementation
    • use bash's linked arrays? that would still be very portable, but wouldn't support VERY old systems (that shouldnt be supported tbh, updating bash is not at all complicated)
  • object-ify amber by adding custom classes, types, etc
    • not sure how this will even work
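For reference, a minimal sketch of the "linked arrays" option, assuming that term means what Bash calls associative arrays (`declare -A`, which needs Bash 4.0 or newer). The dotted keys are a hypothetical flattening convention to imitate nesting, not an Amber feature:

```shell
#!/usr/bin/env bash
# Assumption: Bash 4.0+; older shells do not have `declare -A`.
declare -A obj=(
    [foo]="bar"
    [baz]="quz"
    [fruits.orange]=1   # hypothetical "parent.child" key to fake nesting
    [fruits.apple]=3
)

echo "foo is ${obj[foo]}"
echo "apple is ${obj[fruits.apple]}"

# Iterate over all keys (the order is not guaranteed).
for key in "${!obj[@]}"; do
    echo "$key=${obj[$key]}"
done
```

Lookups stay inside the shell with no subprocess, but real nesting is still not possible: values can only be strings.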

also another topic for discussion is how we are going to handle serialization to/from JSON and other types. maybe create an external package like serde for rust?

b1ek avatar May 30 '24 05:05 b1ek

Bash arrays have a drawback of slowing down as their size increases. Using files, we could implement something like this.

// Amber
let obj = {
    foo: "bar",
    baz: "quz",
    fruits: {
      orange: 1,
      apple: 3
    }
}
echo "foo is " + obj["foo"]
echo "0 is " + obj[0]
echo "apple is " + obj["fruits"]["apple"]

#!/bin/sh

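get_obj_by_key() {
	# $1 = data file, $2 = key; match the key exactly, not as a substring
	grep "^$2=" "$1" |
	sed -n '$s/^'"$2"'="*\(.*[^"]\)"*/\1/p'
}

get_obj_by_index() {
	# $1 = data file, $2 = zero-based index; sed line numbers start at 1
	num=$2
	line=$((num + 1))
	sed -n "${line}"'s/^[^=][^=]*="*\(.*[^"]\)"*$/\1/p' "$1"
}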

obj=$(mktemp)
echo "foo=\"bar\"" >> "${obj}"
echo "baz=\"quz\"" >> "${obj}"
echo "fruits_orange=1" >> "${obj}"
echo "fruits_apple=3" >> "${obj}"

echo "foo is $(get_obj_by_key "${obj}" foo)"
echo "0 is $(get_obj_by_index "${obj}" 0)"
echo "apple is $(get_obj_by_key "${obj}" fruits_apple)"

arapower avatar May 30 '24 09:05 arapower

I agree that adding jq as another dependency is hard to swallow and would be taking the easy route. The solution using linked bash lists is pretty cool and more performant, but it could be more challenging to implement. I think the alternative of using temporary files is also really cool, as it keeps backwards compatibility. We will have to write some functions in bash hardcoded into the header and plan a way to store data of the Object type in the files, though.

I think that I like the mktemp version a little bit more. What do you think @b1ek @arapower @boushley @TheMayoras?

Ph0enixKM avatar May 30 '24 18:05 Ph0enixKM

About the mktemp version... couldn't that also be a non-temp file, thus giving us a persistent key-value datastore, akin to Python's shelve? (Though for small amounts of data.) And secondarily, would it be conceivable then to support TOML as the format, which would seem to allow multiple sets of key-value pairs in the same file? (Maybe that's going too far but it seems like being able to read TOML files would be good, and could dovetail with the other key-value functionality.)

garyrob avatar May 30 '24 18:05 garyrob

mktemp

are you sure you want to rely on temporary files? after all, the script's user might not have the permissions to do that, and it is awful from a security perspective - a third program can easily modify the script's memory

what if we used bash's variables instead of files? that seems pretty much doable
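a rough sketch of what the variables idea could look like, assuming keys are restricted to characters that are valid in variable names. `obj_set` and `obj_get` are hypothetical helper names, and the `__` separator is just an illustrative convention for flattening nested keys:

```shell
#!/usr/bin/env bash
# Assumption: Bash 3.1+ for `printf -v` and `+=`; keys must be valid identifier parts.

# obj_set <object> <key>... <value>: store a value under a flattened variable name.
obj_set() {
    local name=$1; shift
    while (( $# > 1 )); do name+="__$1"; shift; done
    printf -v "$name" '%s' "$1"
}

# obj_get <object> <key>...: print the stored value (empty if unset).
obj_get() {
    local name=$1; shift
    while (( $# > 0 )); do name+="__$1"; shift; done
    printf '%s\n' "${!name}"
}

obj_set obj foo bar
obj_set obj fruits apple 3

echo "foo is $(obj_get obj foo)"
echo "apple is $(obj_get obj fruits apple)"
```

nested objects become plain variables such as `obj__fruits__apple`, so nothing here can enumerate the keys of an object without extra bookkeeping.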

b1ek avatar May 30 '24 23:05 b1ek

the script's user might not have the permissions to do that

It cannot be denied. However, this also applies to commands like sed and bc that Amber already depends on.

it is awful from a security perspective

I consider the security of the permissions for files created by the mktemp command, which are set to 600, to be high.
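As a small illustration, the following sketch creates the backing file, registers cleanup, and prints the mode so the owner-only permissions can be checked on a given system. The variable name `obj_file` is an assumption for this example, and the result presumes a typical umask:

```shell
#!/bin/sh
# Create the backing file; mktemp picks an unpredictable name.
obj_file=$(mktemp) || exit 1

# Remove the file when the script exits so data does not linger.
trap 'rm -f "$obj_file"' EXIT

# The first ten characters of `ls -l` are the file type and mode bits.
perms=$(ls -l "$obj_file" | cut -c1-10)
echo "permissions: $perms"
```

Owner-only read/write shows up as `-rw-------`, which corresponds to mode 600.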

a third program can easily modify the script's memory

This risk is about the same for general programs or shell scripts that create temporary files.

what if we used bash's variables instead of files? that seems pretty much doable

Your previous post mentioned the following:

not sure how to handle nested objects, though

If you have any ideas for a clever implementation using variables, an example would be greatly appreciated.

Handling large amounts of data without using temporary files can also be difficult. I think implementing a solution that makes appropriate use of external commands to enhance compatibility with sh and similar shells would be easier than relying solely on Bash features.

In the future, there may be cases where temporary files (or directories) are used when implementing other features. So, it would be beneficial to consider now the proper handling of temporary files.

Since mktemp is not included in POSIX, there may be environments where it does not exist. In such cases, you can refer to implementations like the following:

  • https://github.com/ShellShoccar-jpn/misc-tools/blob/master/mktemp

arapower avatar May 31 '24 11:05 arapower

i've spent this weekend implementing different approaches to objects in bash, trying to get as close as possible to something like objects in actual dynamically typed languages.

i dont think that we could do much with implementing this thing. like, we are pretty much limited by bash. maintaining our own file specification is overkill, not to even mention how we are going to handle escaped strings and nested objects, and how is this going to affect code readability + emitted program size.

someone has mentioned linked arrays in bash. they do not exist in bash that comes with all macos's and cannot be nested, or passed to a function.

like, the best we could do is to depend on jq and store it in string variables or temp files. anything else is either awfully unportable and very limited, or will take an incomprehensible amount of effort to implement.

b1ek avatar Jun 02 '24 09:06 b1ek

I think it would be good to implement it with jq command. Shell variables would be fine for data retention. If we implement the various Amber functions that manipulate data in JSON format, the functionality we need will naturally become clear. Then we may decide to implement new functions or possibly reduce the jq dependency.
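A minimal sketch of that combination, assuming `jq` is installed: the JSON text lives in an ordinary shell variable and `jq` does the reading and updating. The sample object and variable names are illustrative only:

```shell
#!/bin/sh
if ! command -v jq >/dev/null 2>&1; then
    echo "jq is not installed; skipping the example" >&2
else
    # The whole object is held as JSON text in one variable.
    obj='{"foo":"bar","fruits":{"orange":1,"apple":3}}'

    # Read a top-level and a nested value (-r prints raw strings without quotes).
    foo=$(printf '%s' "$obj" | jq -r '.foo')
    apple=$(printf '%s' "$obj" | jq -r '.fruits.apple')

    # Update: pass the new value with --arg so it is escaped safely.
    obj=$(printf '%s' "$obj" | jq -c --arg v quz '.baz = $v')

    echo "foo is $foo"
    echo "apple is $apple"
    echo "$obj"
fi
```

Every access spawns a `jq` process, so this trades speed for correct handling of nesting and escaping.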

arapower avatar Jun 02 '24 13:06 arapower

I think it would be good to implement it with jq command.

just to make sure, we are going with this?

also we might consider this: https://github.com/kristopolous/TickTick

@Ph0enixKM @boushley @brumik what do you think

b1ek avatar Jun 08 '24 14:06 b1ek

We have a couple of routes at this point:

jq route

This is the easiest one. We'd just use jq and call it a day. This adds a requirement for the user to also install jq in order to do operations on collections and dictionaries.

Using Bash's 2.0 native structures

We'd have to use hacky ways to get around some limitations. Multidimensional arrays could be handled by using an array whose elements link to other variables

# Amber: let arr = [[1, 2, 3], [4, 5, 6]]
arr0=(1 2 3)
arr1=(4 5 6)

arr=(arr0 arr1)

# Amber: echo arr[0][1]
eval echo \${${arr[0]}[1]}

Idk if this example breaks. It probably does. We'd have to test this thoroughly.
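One possible way to avoid `eval` here, sketched under the assumption that the target Bash supports indirect expansion (`${!ref}`) with a subscript in the referenced name:

```shell
#!/usr/bin/env bash
# Amber: let arr = [[1, 2, 3], [4, 5, 6]]
arr0=(1 2 3)
arr1=(4 5 6)
arr=(arr0 arr1)

# Amber: echo arr[0][1]
# Build the string "arr0[1]" and let ${!ref} dereference it.
ref="${arr[0]}[1]"
echo "${!ref}"   # prints 2
```

This only covers reads. Writing to `arr[0][1]` would still need something like `printf -v` or `eval`.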

Bumping requirement for Bash to 4.0

This is a bummer since this way we drop the support for macOS (unless we make it work with zsh) and some distros.

zsh uses a pretty similar syntax, although I wouldn't go for it since this would introduce ambiguity for some packages that people create with Amber.

Building our own implementation of storing object in some way

@arapower had an idea to use temporary files to store data. How about using just variables and keeping the data relatively easy to parse? This way we can keep things fast and also maintain backwards compatibility. We could store not only objects but also lists (and perhaps some other data types as well).
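A rough sketch of what "easy to parse" could mean, assuming an object is serialized as newline-separated `key=value` records in a single variable. `obj_set` and `obj_get` are hypothetical helper names, and keys containing `=` or values containing newlines are not handled:

```shell
#!/bin/sh
# obj_set <object> <key> <value>: print the object with one record appended.
obj_set() {
    if [ -n "$1" ]; then printf '%s\n' "$1"; fi
    printf '%s=%s\n' "$2" "$3"
}

# obj_get <object> <key>: print the value of the last matching record.
# Returns non-zero when the key is absent.
obj_get() {
    _found=1
    _value=
    while IFS= read -r _line; do
        case "$_line" in
            "$2="*) _value=${_line#*=}; _found=0 ;;
        esac
    done <<EOF
$1
EOF
    [ "$_found" -eq 0 ] && printf '%s\n' "$_value"
}

obj=$(obj_set "" foo bar)
obj=$(obj_set "$obj" baz quz)
obj=$(obj_set "$obj" fruits.apple 3)
obj=$(obj_set "$obj" foo updated)   # later records override earlier ones

echo "foo is $(obj_get "$obj" foo)"
echo "apple is $(obj_get "$obj" fruits.apple)"
```

The same record format could hold lists by using numeric keys, though each call here still costs a subshell because of the command substitution.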

Ph0enixKM avatar Jun 08 '24 16:06 Ph0enixKM

@b1ek Sorry to come back late. I had to do some small research on my own.

Here are my two cents: I think Amber has to decide what it wants to be. So far it has been a wrapper around bash for people who already do scripting in bash, for systems that support bash.

With this in mind I do not think it is absolutely necessary to have nested arrays, objects and things that are impossible (by default at least) in bash.

I can also see performance issues with creating a temp file (and I completely agree with @b1ek about the security, permission and immutable system problems).

If I were to suggest something, it would probably be not to support a dictionary type, only arrays (which can be done with simple constants). This may seem like an issue for some, but overall it would be healthier for the project and its expectations. I really think it is important for users and developers to define the scope of the language. For cross-platform programming (not scripting) there are other languages (like C and Rust, for example).

brumik avatar Jun 09 '24 06:06 brumik

Thank you @brumik for your insight. 👍 As we've been discussing this issue, another idea arose: https://github.com/Ph0enixKM/Amber/issues/161. We could build a runtime that gets fetched (if not already present) and would extend Amber with more functionality. I think that would let Amber stay a shell language while letting users add an extension when they need one. Perhaps they have already built something with Amber and then need just that one little thing that is not really supported by Bash but is pretty common in other programming languages.

But honestly I think that ultimately the best way would be to just utilize Bash's features as well as possible and perhaps provide some other functionality in the form of a library.

Ph0enixKM avatar Jun 09 '24 10:06 Ph0enixKM

@Ph0enixKM Wouldn't it be better to go beyond data types and discuss the direction of language design?

arapower avatar Jun 09 '24 11:06 arapower

@Ph0enixKM Wouldn't it be better to go beyond data types and discuss the direction of language design?

Yes. But not in the scope of this issue. Let's create a discussion for that.

Ph0enixKM avatar Jun 09 '24 12:06 Ph0enixKM

Just a minor comment, possibly moot, about the jq possibility: for some people, it's nontrivial to install on MacOS: https://stackoverflow.com/questions/71406984/how-to-instal-jq-without-homebrew

garyrob avatar Jun 22 '24 08:06 garyrob

For jq as a dependency, we have a bash project by @b1ek that we have to integrate, so it is a completely different issue.

Mte90 avatar Jun 24 '24 09:06 Mte90

For jq as a dependency, we have a bash project by @b1ek that we have to integrate, so it is a completely different issue.

the problem remains though: it is not available on all systems

b1ek avatar Jun 25 '24 07:06 b1ek

I think that for any tool we use in the generated Bash, whether that is jq or curl, this dependency checker checks on every run whether the required commands exist and reports an error if they don't.

After all, this way it is the same feature that other scripting languages have, only that in the Bash case maybe fewer things/commands are already available.
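A minimal sketch of such a check, assuming it runs at the top of the generated script. `require_commands` is a hypothetical helper name, not the actual checker from the project:

```shell
#!/bin/sh
# require_commands <cmd>...: report every missing command, fail if any is absent.
require_commands() {
    _missing=0
    for _cmd in "$@"; do
        if ! command -v "$_cmd" >/dev/null 2>&1; then
            echo "error: required command not found: $_cmd" >&2
            _missing=1
        fi
    done
    return "$_missing"
}

# Example: abort early unless the shell itself is resolvable.
require_commands sh || exit 1
```

A script that needs jq or curl would list them in the same call, for example `require_commands jq curl || exit 1`.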

Mte90 avatar Jun 25 '24 08:06 Mte90

After all, this way it is the same feature that other scripting languages have, only that in the Bash case maybe fewer things/commands are already available.

I don't understand what you are trying to say here. I think that Amber should not depend on jq as it adds more dependencies. We could later on implement some standard library function to parse the JSON format. But that's just an idea.

Ph0enixKM avatar Jun 30 '24 17:06 Ph0enixKM

Parsing JSON in pure bash is something that I don't like at all. I think that we should use tools if they are there; otherwise there is an error saying that the script can't run there. There are various jq alternatives anyway, and like we did in my PR with the new commands for downloading, we can wrap them.

Mte90 avatar Jul 01 '24 08:07 Mte90

https://github.com/h4l/json.bash

It is a project to manipulate JSON in pure bash, so we should start thinking about creating a system to embed pure bash libraries. Maybe in the future those libraries will be migrated to Amber.

Mte90 avatar Jul 04 '24 09:07 Mte90