Vyxal Corpus

Open ghost opened this issue 3 years ago • 3 comments

Similar to this issue opened in the Jelly repo and this issue opened in the Husk repo, I decided to run Lynn's method on Vyxal answers (with a few modifications).

SEDE query used to get all Vyxal answers on CGCC. Code used to analyze the data. Final results

ghost avatar Apr 09 '21 13:04 ghost

Results as of January 29, 2023:

2-graphs:
  41 \|
  27 (n
  27 \_
  25 \\
  24 +,
  24 =;
  23 ,)
  22   
  22 ka
  22 ` 
  22 ;f
  21 *\
  20 :£
  20 ?(
  19 øm
  19 `\
  19 ;h
  19 ++
  18 =[
  18 `|
  18 \/
  17 , 
  17 2ẇ
  17 vL
  17 2l
  16 *+
  16 ÞT
  16 12
  16 ƛ?
  16 ‛ 

3-graphs:
  11 \_*
  10 `:q
   9 *\|
   8 :qp
   7 ĠvL
   7 +,)
   7 \|+
   6 \\`
   6 \|:
   6   #
   6  ma
   6 2lv
   5 ;ȯt
   5 »₄τ
   5 ++,
   5 ⁰nε
   5 ⇧\_
   5 ð*\
   5 \\꘍
   5 \++
   5 `| 
   5 ?(:
   5  # 
   5 # m
   5 mai
   5 ain
   5 in 
   5 n p
   5  pr
   5 pro

4-graphs:
   8 `:qp
   5 ⇧\_*
   5   # 
   5  # m
   5 # ma
   5  mai
   5 main
   5 ain 
   5 in p
   5 n pr
   5  pro
   5 prog
   5 rogr
   5 ogra
   5 gram
   5 2lvƒ
   4 ʀʁɾɽ
   4 :q$+
   4 *\|+
   4 \|:„
   4 |:„+
   4 :„+p
   4 „+p,
   4 *ðp,
   4 :qpq
   3 ‛, j
   3 t's 
   3  I'm
   3 I'm 
   3 1234

Notes:

  • All the single characters are ASCII-art related
  • "+,", "p," and "++" are each two common operations joined together
  • øm and ÞT are deprecated in favour of 1-byte alternatives
  • ð* is now I
  • v∑ is now
  • $i is rarely required anymore as i now indexes the number into the list/string
  • (n, ?( and =[ are just generally useful operations

  • \|, \\ and \/ look like useful enough nilads
  • ⁰nε is the absolute difference between the input and the context variable, or string joining. Why?

lyxal avatar Jun 23 '21 06:06 lyxal

import csv
import collections

digraphs = collections.Counter()
trigraphs = collections.Counter()
quadgraphs = collections.Counter()

with open("QueryResults.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        if row[0] == "Post Link":
            continue
        code = row[1]
        if "<pre><code>" not in code:
            continue

        # Extract the first bit of code
        vyxal = (
            code.partition("<pre><code>")[2]
            .partition("</code></pre>")[0]
            .strip()
        )
        vyxal = vyxal.replace("&quot;", '"')
        vyxal = vyxal.replace("&gt;", ">").replace("&lt;", "<")
        vyxal = vyxal.replace("&amp;", "&")

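        # Skip answers in which any single character occurs 10 or more times
        if any(vyxal.count(c) >= 10 for c in vyxal):
            continue
        # Skip answers longer than 100 characters
        if len(vyxal) > 100:
            continue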

        for line in vyxal.split("\n"):
            for (a, b) in zip(line, line[1:]):
                digraphs[a, b] += 1
            for (a, b, c) in zip(line, line[1:], line[2:]):
                trigraphs[a, b, c] += 1
            for (a, b, c, d) in zip(line, line[1:], line[2:], line[3:]):
                quadgraphs[a, b, c, d] += 1

with open("most-common.txt", "w", encoding="utf-8") as f:
    f.write("2-graphs:\n")
    for d, n in digraphs.most_common(30):
        f.write("%4d %s\n" % (n, "".join(d)))

    f.write("\n3-graphs:\n")
    for d, n in trigraphs.most_common(30):
        f.write("%4d %s\n" % (n, "".join(d)))

    f.write("\n4-graphs:\n")
    for d, n in quadgraphs.most_common(30):
        f.write("%4d %s\n" % (n, "".join(d)))
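The manual entity replacement above covers only four entities. A possible alternative, assuming answer bodies may also contain other entities such as &#39;, is the standard library's html.unescape. This is only a sketch; the helper name extract_first_code_block is hypothetical and not part of the original script.

```python
import html


def extract_first_code_block(body):
    """Return the first <pre><code> block of an answer body, unescaped.

    Hypothetical helper: returns None when the body has no code block.
    """
    if "<pre><code>" not in body:
        return None
    raw = (
        body.partition("<pre><code>")[2]
        .partition("</code></pre>")[0]
        .strip()
    )
    # html.unescape handles named and numeric character references alike
    return html.unescape(raw)
```

The length and repeated-character filters would then run on the returned string exactly as before.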

lyxal avatar Nov 21 '21 12:11 lyxal

Modified corpus that uses the lexer to get least/most common elements. Results:

Least common:
   1 X
   1 ¢
   2 ↑
   3 @
   3 Ẇ
   3 z
   3 ė
   3 ↓
   4 P
   4 ⟩
   4 ⟨
   4 ¼
   5 ƈ
   5 ḃ
   5 ⟇
   5 Ż
   6 ŀ
   6 §
   6 ∪
   6 ṙ
   7 □
   7 Ṁ
   7 ǒ
   7 Ǒ
   7 Ȯ
   8 }
   9 ∇
   9 ɖ
   9 ⊍
  10 ⋎
  10 ‟
  10 ₆
  10 ⋏
  10 H
  11 !
  11 ≥
  11 M
  11 ⟑
  11 ₈
  12 ḋ
  12 ¶
  12 ≤
  12 ṫ
  12 Ċ
  13 ↲
  13 ǎ
  13 ṁ
  13 „
  13 ^
  13 ∵
  14 ċ
  14 Ǎ
  14 ꜝ
  14 ₇
  14 q
  14 ∴
  14 ₅
  15 Ǐ
  15 Ȧ
  16 Ǔ
  16 ₄
  16 B
  17 Ḟ
  17 ¤
  17 ±
  17 ∨
  18 ¾
  18 F
  18 †
  18 €
  18 Y
  19 &
  19 ∧
  20 ₂
  20 ¡
  20 ⌐
  20 ɽ
  20 ġ
  20 m
  21 ↳
  21 β
  21 ǔ
  21 x
  21 ß
  22 ↵
  22 _
  23 o
  23 ∞
  23 ₁
  24 ǐ
  24 j
  24 ≠
  25 ṗ
  25 ∷
  25 ḣ
  26 •
  26 r
  26 æ
  26 √
  26 ṡ
  27 Z
  27 w
  28 ż
  28 E
  28 ⅛
  29 V
  29 µ
  29 ȧ
  29 Ė
  29 y
  29 ₃
  30 ẏ
  30 ƒ
  30 ḭ
  31 Ŀ
  31 ≬
  31 ʀ
  31 >
  31 ⇩
  31 O
  31 ×
  31 ₍
  32 <
  33 ∩
  33 u
  34 ḟ
  34 Ġ
  34 K
  35 a
  35 Ḋ
  35 T
  35 ẋ
  35 …
  35 Ẋ
  35 {
  36 ⁽
  37 D
  37 ‡
  38 S
  38 g
  38 ¬
  38 Ḃ
  38 ʁ
  38 ¦
  38 G
  39 ⁼
  39 W
  40 ⌈
  40 ≈
  40 ₌
  41 ]
  42 ~
  42 ₴
  42 I
  43 ¯
  43 Ṡ
  44 Ṗ
  44 ¹
  44 ℅
  45 l
  45 ε
  45 ²
  46 R
  46 İ
  47 ȯ
  48 ⇧
  48 Ṫ
  49 ₀
  49 b
  50 ẇ
  50 ⁋
  50 ↔
  50 N
  50 £
  51 Ẏ
  52 Π
  53  
  56 ꘍
  57 A
  57 τ
  57 Ṅ
  58 ⌊
  59 U
  61 "
  62 Ḣ
  62 %
  62 λ
  62 ð
  63 e
  63 c
  63 ½
  71 i
  73 ÷
  74 s
  74 ¥
  75 /
  78 C
  82 )
  87 |
  88 d
 102 p
 104 ɾ
 104 ‹
 105 t
 108 ⁰
 109 ṅ
 111 [
 111 -
 118 Ṙ
 119 h
 123 '
 126 ,
 126 J
 134 =
 140 (
 142 L
 147 ›
 161 n
 179 f
 209 ƛ
 217 ∑
 224 $
 231 ?
 246 +
 269 *
 278 ;
 281 :
 302 v
Least common digraphs:
   1 øḋ
   1 øp
   1 ∆L
   1 ∆ǐ
   1 ¨^
   1 Þ¾
   1 k¹
   1 øR
   1 ∆q
   1 k1
   1 Þ:
   1 ∆/
   1 ∆e
   1 ø↳
   1 kḭ
   1 Þm
   1 ∆ṁ
   1 ∆p
   1 ¨*
   1 Þe
   1 øḞ
   1 ÞḊ
   1 ∆T
   1 Þ∪
   1 ÞḞ
   1 Þ∴
   1 Þ∵
   1 øC
   1 ÞI
   1 Þż
   1 ∆F
   1 k½
   1 ki
   1 kg
   1 ød
   1 øÞ
   1 ¨»
   1 ∆₌
   1 Þ/
   1 ∆ė
   1 Þ↑
   1 kj
   1 Þ*
   1 kp
   1 Þo
   1 øβ
   1 ∆Ċ
   1 kV
   1 kW
   1 k<
   1 kḂ
   1 k[
   1 kɽ
   1 ø`
   1 k§
   1 k\
   1 øT
   1 k…
   1 ∆*
   1 ∆%
   1 ¨U
   1 k¦
   1 kṠ
   1 kṀ
   1 kḢ
   1 kτ
   1 kε
   1 ∆ƈ
   1 ÞU
   1 ∆ṗ
   1 Þ!
   1 kD
   1 Þ…
   2 ∆M
   2 k□
   2 ¨²
   2 ÞȮ
   2 kð
   2 ÞR
   2 ÞM
   2 øĖ
   2 k₁
   2 Þḋ
   2 øɽ
   2 k-
   2 øŀ
   2 ∆I
   2 øṗ
   2 øl
   2 ∆Ė
   2 k^
   2 ∆i
   2 øM
   2 øB
   2 Þẇ
   2 ∆o
   2 ÞZ
   2 k4
   2 Þṁ
   2 k+
   2 øb
   2 kṡ
   2 Þ₀
   2 ø∆
   2 kL
   2 ∆d
   2 ∆Ŀ
   2 øo
   2 øṀ
   2 kz
   2 ¨…
   2 øe
   3 ¨V
   3 ÞG
   3 Þj
   3 Þx
   3 ÞK
   3 Þ⇧
   3 Þ□
   3 Þg
   3 ø^
   3 k⁰
   3 Þ℅
   3 Þǔ
   3 k≈
   3 Þ×
   3 k•
   3 ÞṪ
   3 k×
   3 ∆Ṙ
   3 ∆τ
   3 ∆Q
   3 kF
   4 Þ⊍
   4 ¨p
   4 ø∧
   4 ¨2
   4 ÞṠ
   4 Þp
   4 øĊ
   4 ¨M
   4 k(
   4 ÞD
   4 kl
   5 ¨£
   5 ¨=
   5 Þṡ
   5 ∆f
   5 øṘ
   5 Þ•
   5 kv
   5 k6
   5 kB
   6 øA
   6 ÞẊ
   6 øṖ
   6 k2
   6 kh
   6 k∨
   6 k/
   7 ∆ċ
   8 ∆Z
   8 øṙ
   8 kr
   8 ÞF
   9 kP
   9 ∆²
   9 ∆K
  10 Þu
  10 kd
  10 kH
  11 Þ∞
  12 ÞS
  13 kA
  15 Þf
  15 ÞT
  19 øm
  22 ka
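
For anyone wanting to reproduce element-level counts like these without the real lexer, here is a rough sketch. It assumes that k, ø, ∆, Þ and ¨ each start a two-character element (as the digraph list above suggests) and ignores strings, numbers and structures, so it is not the Vyxal lexer and its counts will differ. The names rough_elements and count_elements are made up for illustration.

```python
import collections

# Assumption: these characters prefix two-character elements.
# The real Vyxal lexer also handles strings, numbers and structures.
DIGRAPH_PREFIXES = set("kø∆Þ¨")


def rough_elements(code):
    """Split code into one- or two-character elements (simplified)."""
    i = 0
    while i < len(code):
        # A prefix at the very end of the code has nothing to pair with
        width = 2 if code[i] in DIGRAPH_PREFIXES and i + 1 < len(code) else 1
        yield code[i:i + width]
        i += width


def count_elements(programs):
    """Tally elements across an iterable of program strings."""
    counts = collections.Counter()
    for program in programs:
        counts.update(rough_elements(program))
    return counts


# Illustrative inputs only, not taken from the corpus
counts = count_elements(["kaL", "ÞTvL", "vL"])
# Least common first, matching the ordering of the results above
for element, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print("%4d %s" % (n, element))
```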

chunkybanana avatar Feb 17 '22 07:02 chunkybanana

Hey, this feels automatable! :p

gingershaped avatar Jan 06 '23 20:01 gingershaped

@GingerIndustries Apparently, SEDE doesn't exactly have an API

ysthakur avatar Feb 01 '23 05:02 ysthakur

It looks like v and : are extremely common characters, so it would be helpful to add elements that shorten common operations involving them.

Judging from the 2023 results, v is commonly used in vL, so it would be helpful to introduce a 1-byte element that takes a vectorized length. :£ is quite common as well, so a command could be added that sets the register to a value without using up the item on the stack.
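
One way to check what v is most often paired with is to tally the character that follows it. This is only a sketch: the function name followers is made up, and the sample programs are illustrative strings built from fragments in the results, not real corpus entries.

```python
import collections


def followers(programs, target):
    """Count which character immediately follows `target` on each line."""
    counts = collections.Counter()
    for program in programs:
        for line in program.split("\n"):
            for a, b in zip(line, line[1:]):
                if a == target:
                    counts[b] += 1
    return counts


# Illustrative inputs only
sample = ["ĠvL", "2lvƒ", "vLvL"]
print(followers(sample, "v").most_common())
# prints [('L', 3), ('ƒ', 1)]
```

Running the same tally with ":" as the target would show which operations most often follow a duplicate.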

; is quite common even though there are flags, so it would be nice to merge certain commands with the end of a loop structure. ;f, =;, and ;h are the most common such pairs.

;f (end the loop structure and flatten) sounds pretty useful to me, along with ;h, so it would be very nice to have 1-byte shortenings for both. (I'll do an analysis on these structures to see whether they happen at the end of a program. This will determine whether making flags would be helpful.)
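
A rough sketch of that end-of-program analysis, assuming the programs are already available as plain strings. The function name end_position_stats and the sample programs are hypothetical, and this works on raw characters rather than lexed elements.

```python
def end_position_stats(programs, patterns):
    """For each pattern, count uses at the very end of a program vs. elsewhere."""
    stats = {p: {"at_end": 0, "elsewhere": 0} for p in patterns}
    for program in programs:
        code = program.rstrip("\n")
        for p in patterns:
            # str.count counts non-overlapping occurrences
            total = code.count(p)
            at_end = 1 if code.endswith(p) else 0
            stats[p]["at_end"] += at_end
            stats[p]["elsewhere"] += total - at_end
    return stats


# Illustrative inputs only, not taken from the corpus
sample = ["ƛ2*;f", "ƛ;hL", "ƛd;f+ƛn;f"]
stats = end_position_stats(sample, [";f", ";h", "=;"])
for pattern, s in stats.items():
    print(pattern, s["at_end"], s["elsewhere"])
```

If a pattern shows up mostly in the at_end column, a flag would cover it; if it is mostly elsewhere, a 1-byte element would help more.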

Not sure about =;, do you recall the cases where you had to use =;?

jfioasd avatar Mar 25 '23 05:03 jfioasd