emuR
Modifying session variable breaks get_trackdata()
I imagine there is some good reason for this, but I don't understand this behaviour:
> create_emuRdemoData(dir = tempdir())
> demo_data_dir = file.path(tempdir(), "emuR_demoData")
> tg_col_dir = file.path(demo_data_dir, "TextGrid_collection")
> path2directory = file.path(tempdir(), "my-first_emuDB")
> convert_TextGridCollection(dir = tg_col_dir,
dbName = "my-first",
targetDir = tempdir(),
tierNames = c("Word", "Syllable", "Phoneme", "Phonetic"))
> db_handle = load_emuDB(path2directory, verbose = FALSE)
Now create a seglist:
> sl_vowels = query(db_handle, "Phonetic == @")
> sl_vowels
# A tibble: 28 × 16
labels start end db_uuid session bundle start_item_id end_item_id level attribute start_item_seq_… end_item_seq_idx type sample_start sample_end
<chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr> <chr> <int> <int> <chr> <int> <int>
1 @ 1506. 1548. af022ed… 0000 msajc… 103 103 Phon… Phonetic 18 18 SEGM… 30124 30968
2 @ 1715. 1741. af022ed… 0000 msajc… 107 107 Phon… Phonetic 22 22 SEGM… 34309 34828
3 @ 1967. 2034. af022ed… 0000 msajc… 112 112 Phon… Phonetic 27 27 SEGM… 39334 40673
4 @ 2303. 2362. af022ed… 0000 msajc… 117 117 Phon… Phonetic 32 32 SEGM… 46059 47238
5 @ 2447. 2506. af022ed… 0000 msajc… 119 119 Phon… Phonetic 34 34 SEGM… 48949 50125
6 @ 1917. 1958. af022ed… 0000 msajc… 118 118 Phon… Phonetic 26 26 SEGM… 38340 39155
7 @ 2022. 2078. af022ed… 0000 msajc… 120 120 Phon… Phonetic 28 28 SEGM… 40439 41569
8 @ 2382. 2431. af022ed… 0000 msajc… 126 126 Phon… Phonetic 34 34 SEGM… 47650 48619
9 @ 330. 380. af022ed… 0000 msajc… 91 91 Phon… Phonetic 3 3 SEGM… 6609 7590
10 @ 1472. 1490. af022ed… 0000 msajc… 108 108 Phon… Phonetic 20 20 SEGM… 29441 29808
# … with 18 more rows, and 1 more variable: sample_rate <int>
Get trackdata:
> td_vowels = get_trackdata(db_handle,
seglist = sl_vowels,
onTheFlyFunctionName = "forest",
verbose = F)
> td_vowels
# A tibble: 287 × 24
sl_rowIdx labels start end db_uuid session bundle start_item_id end_item_id level attribute start_item_seq_… end_item_seq_idx type sample_start
<int> <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr> <chr> <int> <int> <chr> <int>
1 1 @ 1506. 1548. af022edb… 0000 msajc… 103 103 Phon… Phonetic 18 18 SEGM… 30124
2 1 @ 1506. 1548. af022edb… 0000 msajc… 103 103 Phon… Phonetic 18 18 SEGM… 30124
3 1 @ 1506. 1548. af022edb… 0000 msajc… 103 103 Phon… Phonetic 18 18 SEGM… 30124
4 1 @ 1506. 1548. af022edb… 0000 msajc… 103 103 Phon… Phonetic 18 18 SEGM… 30124
5 1 @ 1506. 1548. af022edb… 0000 msajc… 103 103 Phon… Phonetic 18 18 SEGM… 30124
6 1 @ 1506. 1548. af022edb… 0000 msajc… 103 103 Phon… Phonetic 18 18 SEGM… 30124
7 1 @ 1506. 1548. af022edb… 0000 msajc… 103 103 Phon… Phonetic 18 18 SEGM… 30124
8 1 @ 1506. 1548. af022edb… 0000 msajc… 103 103 Phon… Phonetic 18 18 SEGM… 30124
9 1 @ 1506. 1548. af022edb… 0000 msajc… 103 103 Phon… Phonetic 18 18 SEGM… 30124
10 2 @ 1715. 1741. af022edb… 0000 msajc… 107 107 Phon… Phonetic 22 22 SEGM… 34309
# … with 277 more rows, and 9 more variables: sample_end <int>, sample_rate <int>, times_orig <dbl>, times_rel <dbl>, times_norm <dbl>, T1 <int>,
# T2 <int>, T3 <int>, T4 <int>
So far, so good. Now I want session to have some other value:
> sl_vowels$session <- dplyr::recode(sl_vowels$session, `0000` = "F1")
> sl_vowels
# A tibble: 28 × 16
labels start end db_uuid session bundle start_item_id end_item_id level attribute start_item_seq_… end_item_seq_idx type sample_start sample_end
<chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <chr> <chr> <int> <int> <chr> <int> <int>
1 @ 1506. 1548. af022ed… F1 msajc… 103 103 Phon… Phonetic 18 18 SEGM… 30124 30968
2 @ 1715. 1741. af022ed… F1 msajc… 107 107 Phon… Phonetic 22 22 SEGM… 34309 34828
3 @ 1967. 2034. af022ed… F1 msajc… 112 112 Phon… Phonetic 27 27 SEGM… 39334 40673
4 @ 2303. 2362. af022ed… F1 msajc… 117 117 Phon… Phonetic 32 32 SEGM… 46059 47238
5 @ 2447. 2506. af022ed… F1 msajc… 119 119 Phon… Phonetic 34 34 SEGM… 48949 50125
6 @ 1917. 1958. af022ed… F1 msajc… 118 118 Phon… Phonetic 26 26 SEGM… 38340 39155
7 @ 2022. 2078. af022ed… F1 msajc… 120 120 Phon… Phonetic 28 28 SEGM… 40439 41569
8 @ 2382. 2431. af022ed… F1 msajc… 126 126 Phon… Phonetic 34 34 SEGM… 47650 48619
9 @ 330. 380. af022ed… F1 msajc… 91 91 Phon… Phonetic 3 3 SEGM… 6609 7590
10 @ 1472. 1490. af022ed… F1 msajc… 108 108 Phon… Phonetic 20 20 SEGM… 29441 29808
# … with 18 more rows, and 1 more variable: sample_rate <int>
Note that session is still a <chr>. But now:
> td_vowels = get_trackdata(db_handle,
+ seglist = sl_vowels,
+ onTheFlyFunctionName = "forest",
+ verbose = F)
Error in get_trackdata(db_handle, seglist = sl_vowels, onTheFlyFunctionName = "forest", :
Following utts entry not found:
In addition: Warning messages:
1: In get_trackdata(db_handle, seglist = sl_vowels, onTheFlyFunctionName = "forest", :
The emusegs/emuRsegs object passed in refers to bundles with in-homogeneous sampling rates in their audio files! Here is a list of all refered to bundles incl. their sampling rate:
[1] session bundle media_file sample_rate md5_annot_json
<0 rows> (or 0-length row.names)
2: Unknown or uninitialised column: `utts`.
I am willing to accept that part of the solution here is "don't mess with session", but at the very least, this seems like the wrong error message: nothing has touched the audio files or their sample rates.
I would consider database query results (or at least their session and bundle names) as immutable objects. If you change the session name in the query result, the database will not be updated automagically (by renaming the session). The results are just tibble objects. With the second call to get_trackdata you query bundles of a non-existent session 'F1' and that fails.
I agree that the error message is misleading.
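Until the message improves, a quick sanity check before calling get_trackdata can make this kind of mismatch obvious. The following is only a rough sketch: find_unknown_sessions is a hypothetical helper, not part of emuR, and the data frame and bundle names are made-up stand-ins for a real seglist. With a loaded database, the known session names could presumably be taken from list_sessions(db_handle)$name.

```r
# Stand-in for a seglist whose session column was partly edited by hand.
# The bundle names are invented for this illustration.
seglist <- data.frame(
  labels  = c("@", "@", "@"),
  session = c("F1", "F1", "0000"),
  bundle  = c("bndl_a", "bndl_b", "bndl_c"),
  stringsAsFactors = FALSE
)

# Stand-in for the session names the database actually contains.
# Assumption: with emuR this would be list_sessions(db_handle)$name.
known_sessions <- c("0000")

# Hypothetical helper: session names used in the seglist
# that the database does not know about.
find_unknown_sessions <- function(seglist, known_sessions) {
  setdiff(unique(seglist$session), known_sessions)
}

unknown <- find_unknown_sessions(seglist, known_sessions)
if (length(unknown) > 0) {
  message("Sessions not in the database: ", paste(unknown, collapse = ", "))
}
```

If the check reports anything, the seglist no longer matches the database and the track lookup should be expected to fail.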
Do you just want to rename the session from '0000' to 'F1'?
We were just using session as a stand-in for "speaker"; it's simpler to just create a new column. But yes, the idea was just to rename the session(s).
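For anyone landing here with the same need, the new-column approach might look roughly like this. It is a minimal sketch in base R on a made-up stand-in for the query result; the speaker column name and the lookup vector are assumptions for illustration, and the same mapping could just as well be applied to the tibble returned by get_trackdata. The point is that session keeps its original value, so it still matches the database.

```r
# Stand-in for a query result; the bundle names are invented for this illustration.
seglist <- data.frame(
  labels  = c("@", "@", "@"),
  session = c("0000", "0000", "0000"),
  bundle  = c("bndl_a", "bndl_b", "bndl_c"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from session name to speaker label.
speaker_by_session <- c("0000" = "F1")

# Add the speaker label as a new column and leave session untouched.
seglist$speaker <- unname(speaker_by_session[seglist$session])

print(seglist)
```

Sessions missing from the lookup vector end up as NA in speaker, which makes gaps in the mapping easy to spot.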