julius
Is it possible to configure Julius in a way such that only phoneme recognition happens without tokenizing into words?
I'm working on an application and need to simply recognize a stream of sounds, even if they are not tokenized into words. I expect to get a list of all recognized phonemes. Is it possible with Julius?
I think you will need the output from Julius running in server mode, that is, using the -module option. See this help page, in particular -module and -outcode (phone sequence).
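A minimal sketch of reading the phone sequence from module mode, in Python. It rests on several assumptions not stated in this thread: that Julius was started with -module and listens on localhost port 10500, that each message ends with a line holding a single ".", and that each WHYPO element carries a PHONE attribute (which is what -outcode controls). The helper names are made up for illustration. A regular expression is used instead of an XML parser because the output is not guaranteed to be well-formed XML.

```python
import re
import socket

# Each recognized unit is assumed to arrive as <WHYPO ... PHONE="..." .../>.
PHONE_ATTR = re.compile(r'\bPHONE="([^"]*)"')


def phones_from_message(message):
    """Return all phone labels found in one module-mode message, in order."""
    phones = []
    for value in PHONE_ATTR.findall(message):
        # A PHONE attribute may hold several space-separated phones.
        phones.extend(value.split())
    return phones


def iter_messages(host="localhost", port=10500):
    """Yield messages from a running Julius module server (assumed address)."""
    terminator = b"\n.\n"  # assumed end-of-message marker
    with socket.create_connection((host, port)) as sock:
        buffer = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break  # server closed the connection
            buffer += chunk
            while terminator in buffer:
                raw, buffer = buffer.split(terminator, 1)
                yield raw.decode("utf-8", errors="replace")


def print_phone_stream():
    """Print the phone sequence of every message that contains one."""
    for message in iter_messages():
        phones = phones_from_message(message)
        if phones:
            print(" ".join(phones))
```

Calling print_phone_stream() while Julius is running would print one line of phones per recognition result. Adjust the host, port, and character encoding to your setup.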
Yes, you can do it; I did it with HTK and Julius. Picture a word dictionary in which every "word" consists of exactly one phoneme, such as a single vowel.
@Estalhun Sounds promising, thanks. Can you expand just a little bit?
Something like this:
Sample "sentences" for training (Hungarian):
<s> T A N UU S II T V AA NY O K N A K T A R G O N<4> C A K E Z E L OEE !BREATH T A R T A L O M T AA R A KK A L !BREATH T A K SZ I S O F OEE R T E H N O L OO G I J<4> A T E H N O L OO G I A J<4> I !BREATH T E L E P H E JJ E L </s>
<s> T E R V E Z OEE A SZ T A L T OO L !BREATH T U L A J D O N UU T AA FF E L UE GY E L E T T EE R II T EE S M E N<4> T E S !BREATH T II Z M I LL I OO J<4> I T !BREATH T OE B R OEE L </s>
<s> T OE R T EE N EE S E J<4> I N E K !BREATH T UU L M E N OEE E N !BREATH V A D<2> NY U G A T O N !BREATH V A D AA SZ A TT OO L !BREATH V E H E SS E N E K !BREATH V E R S E NY<1> B E N !BREATH V E V OEE K I G !BREATH V I LL A M O S M UEE V E K V I L AA G M EE R E T UEE !BREATH V AA L A SZ I D E J E !BREATH V AA L A SZ I D OEE T </s>
<s> !BREATH V AA LL A L K O Z OO J<4> I V EE TT EE K H U SZ O N E GGY E D I K !BREATH Z AA SZ L OO S H A J OO J A K EE N<4> T AA T J AA R J A AA TT OE R EE S EE L E T UU T !BREATH EE R D E KK OE R B E OE K O SZ I SZ T EE M A OE K O SZ I SZ T EE M AA B A N OE N<3> M A G AA N A K !BREATH OE N<4> T OE R V EE NY UEE !BREATH </s>
<s> UE TY F EE L SZ O L G AA L A T O S !BREATH UE GY I N<4> T EE Z EE SS E L UE T E MM E L </s>
Dictionary:
!ANIMALS [!ANIMALS] !ANIMALS
!BREATH [!BREATH] !BREATH
!COUGH [!COUGH] !COUGH
!HNOISE [!HNOISE] !HNOISE
!MUSIC [!MUSIC] !MUSIC
!NOISE [!NOISE] !NOISE
!OOO [!OOO] !OOO
!PHONE [!PHONE] !PHONE
</s> [] sil
<s> [] sil
SENT-END [] sil
SENT-START [] sil
A [A] A
AA [AA] AA
E [E] E
EE [EE] EE
I [I] I
II [II] II
O [O] O
OO [OO] OO
OE [OE] OE
OEE [OEE] OEE
U [U] U
UU [UU] UU
UE [UE] UE
UEE [UEE] UEE
B [B] B
BB [BB] BB
P [P] P
PP [PP] PP
D [D] D
DD [DD] DD
D<2> [D<2>] D<2>
D<1> [D<1>] D<1>
T [T] T
TT [TT] TT
GY<1> [GY<1>] GY<1>
GY [GY] GY
GGY [GGY] GGY
T<2> [T<2>] T<2>
TY [TY] TY
TTY [TTY] TTY
G [G] G
GG [GG] GG
T<4> [T<4>] T<4>
K [K] K
KK [KK] KK
V [V] V
VV [VV] VV
F [F] F
FF [FF] FF
Z [Z] Z
ZZ [ZZ] ZZ
SZ [SZ] SZ
SSZ [SSZ] SSZ
SZS [SZS] SZS
ZS [ZS] ZS
ZZS [ZZS] ZZS
S [S] S
SS [SS] SS
H [H] H
HH [HH] HH
H<1> [H<1>] H<1>
J<1> [J<1>] J<1>
J<3> [J<3>] J<3>
J<4> [J<4>] J<4>
DZ [DZ] DZ
DDZ [DDZ] DDZ
C [C] C
CC [CC] CC
C<1> [C<1>] C<1>
DZS [DZS] DZS
DDZS [DDZS] DDZS
T<3> [T<3>] T<3>
CS [CS] CS
CCS [CCS] CCS
L [L] L
LL [LL] LL
L<1> [L<1>] L<1>
J [J] J
JJ [JJ] JJ
R [R] R
RR [RR] RR
R<3> [R<3>] R<3>
RR<3> [RR<3>] RR<3>
M [M] M
MM [MM] MM
M<4> [M<4>] M<4>
M<5> [M<5>] M<5>
N<3> [N<3>] N<3>
N [N] N
NN [NN] NN
N<4> [N<4>] N<4>
NY<1> [NY<1>] NY<1>
NY<3> [NY<3>] NY<3>
N<5> [N<5>] N<5>
NY [NY] NY
NNY [NNY] NNY
But phoneme-based (monophone) recognition, without triphones and a language model (LM), is absolutely the worst ever.
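Every entry in the dictionary above follows one pattern: the unit name, the same name in brackets as the output symbol, and the same name again as its pronunciation, with the sentence markers mapped to sil and an empty output. A minimal Python sketch that builds such a list from phoneme-level transcripts could look like this. The function names are hypothetical, and the sketch assumes the phoneme labels match the names of your acoustic models.

```python
def units_from_transcripts(lines, markers=("<s>", "</s>")):
    """Collect the distinct units used in the transcripts, in first-seen order."""
    seen = []
    for line in lines:
        for token in line.split():
            if token not in markers and token not in seen:
                seen.append(token)
    return seen


def make_phoneme_dictionary(units,
                            silence_words=("</s>", "<s>", "SENT-END", "SENT-START")):
    """Return dictionary lines in which each unit is its own one-phoneme word."""
    # Sentence markers print nothing and are pronounced as silence.
    lines = [f"{word} [] sil" for word in silence_words]
    # Every other unit (phoneme or noise label) maps to itself.
    lines.extend(f"{unit} [{unit}] {unit}" for unit in units)
    return lines


if __name__ == "__main__":
    transcripts = ["<s> T A N UU S !BREATH T E </s>"]
    for entry in make_phoneme_dictionary(units_from_transcripts(transcripts)):
        print(entry)
```

Noise labels such as !BREATH go through the same path as phonemes, because their entries have the same shape in the dictionary above.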