Enable transcription by default in some circumstances
This is more a suggestion than an issue.
The transcription is not enabled by default. But in some circumstances it could be enabled or disabled by default. I see two cases:
- The transcription could be enabled by default when processing audio files attached to chats
- The transcription could be skipped by default when the audio file is known in the hashes database.
These behaviors could be changed using a property like
# Values: all, unknown, chats, none
itemsToTranscript = chats
Or using separate properties
enableTranscriptionOnlyForChats = true
disableTranscriptionForKnownFiles = false
Just ideas....
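To make the proposal a bit more concrete, here is a minimal sketch of how the four values could be interpreted. It is only an illustration: the `AudioItem` class, its fields and the `should_transcribe` function are hypothetical names, and reading `unknown` as "files not found in the hashes database" is my assumption.

```python
from dataclasses import dataclass

# Values of the proposed (not yet existing) itemsToTranscript property.
VALID_MODES = {"all", "unknown", "chats", "none"}


@dataclass
class AudioItem:
    """Illustrative stand-in for an audio item seen during processing."""
    name: str
    in_chat: bool = False     # assumed flag: item is attached to a parsed chat
    known_hash: bool = False  # assumed flag: hash was found in the hashes database


def should_transcribe(item: AudioItem, mode: str) -> bool:
    """Decide whether one audio item is transcribed under the given mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported itemsToTranscript value: {mode!r}")
    if mode == "all":
        return True
    if mode == "unknown":
        # Assumption: "unknown" means the file is NOT in the hashes database.
        return not item.known_hash
    if mode == "chats":
        return item.in_chat
    return False  # mode == "none"
```

With `itemsToTranscript = chats`, `should_transcribe(AudioItem("voice.opus", in_chat=True), "chats")` would be true, while a music file outside any chat would be skipped.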
Good idea. Maybe the transcription could be executed on the audio files returned by a configurable query, so the configuration parameter would contain this query.
(Sent by email in reply to André Berenguel's post of Wed., Jun 22, 2022, 22:57.)
Hi @aberenguel.
- The transcription could be enabled by default when processing audio files attached to chats
This was already proposed informally by other users, thanks for opening this discussion. The challenge is that, when an audio file is processed, we don't know yet whether it is attached to some chat. Only when the chat is processed, in a 2nd or 3rd processing queue, do we discover the attached audios, but by then they were already processed/indexed. We may re-process them at that point (thanks to #1062, updating indexed content), maybe using a thread pool, like in the new WhatsApp attachments download feature, to avoid running single threaded. This would need some interface to update items in the index (it does not exist right now, but I'm planning to add one, I'll open a separate issue), and the processing pipeline may become a bit more complicated. I'm not sure how to call the transcription module from the parsing module right now (eventually from a separate parsing process, but this doesn't happen today for items being processed in queues > 0, like chats), so this idea needs to be refined... Maybe the desired #24 could help, running different processing pipelines one after the other, but that will require a reasonable effort...
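Just to illustrate the thread pool part of the idea, a rough sketch follows. It is not how IPED works today: `transcribe` is a placeholder for the real transcription call, and `update_index` stands for the index update interface that does not exist yet.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe(item_id: str) -> str:
    """Placeholder for the real (slow) transcription of one audio item."""
    return f"transcript of {item_id}"


def reprocess_attachments(item_ids, update_index, workers: int = 4) -> None:
    """Transcribe audios discovered while parsing chats, several at a time.

    update_index is a hypothetical callback (item_id, text) that would store
    the transcript in the already indexed item.
    """
    ids = list(item_ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the submitted ids.
        for item_id, text in zip(ids, pool.map(transcribe, ids)):
            # Index updates happen here, in the calling thread only.
            update_index(item_id, text)
```

Keeping the index updates in the calling thread is a deliberate simplification, so the hypothetical update interface would not need to be thread safe.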
2. The transcription could be ignored by default when the audio file is known in hashes database.
This is easy to implement and is already done for photoDNA.
Maybe the transcription could be executed on the audio files returned by a configurable query.
This could be part of the feature configuration, but the challenges I pointed out at first still stand. I'm open to ideas...
A simple workaround for recorded audios of some chat apps, like WhatsApp, that I already suggested to 2 users in the past, is to define new audio mimetypes, children of the original ones (opus, etc), based on filename patterns, and change AudioTranscriptConfig.txt to run just on those new mime types. That should work for some cases.
Would the above be enough? Which recorded audio name patterns and extensions are most common out there?
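The filename-pattern workaround could look roughly like the sketch below. It is only an illustration of the logic, not the actual mime definition syntax: the pattern assumes WhatsApp voice notes named like `PTT-20220622-WA0001.opus`, and both the parent mime `audio/opus` and the child mime `audio/x-voice-note-opus` are placeholder names.

```python
import re

# (filename pattern, parent mime, child mime). All three values are
# illustrative assumptions, to be replaced by patterns seen in real cases.
RULES = [
    (re.compile(r"^PTT-\d{8}-WA\d+\.opus$", re.IGNORECASE),
     "audio/opus",
     "audio/x-voice-note-opus"),
]


def refine_mime(filename: str, detected_mime: str) -> str:
    """Return the child mime when the name matches a recorded-audio pattern."""
    for pattern, parent, child in RULES:
        if detected_mime == parent and pattern.match(filename):
            return child
    # No rule matched: keep the originally detected mime.
    return detected_mime
```

Transcription would then be configured to run only on the child mime types, so ordinary music files with the same parent mime are left out.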
Would the above be enough? Which recorded audio name patterns and extensions are most common out there?
I think if we can choose file name patterns and mime types for transcription, it would be ok. Combining that with some disableTranscriptionForKnownFiles property would be even better.
As you mentioned, processing files attached to chats would be complicated and a little bit risky to implement in the next 4.0.0 release.
So, which audio file extensions/mimetypes are most used by chat apps? opus, aac, flac?
IPED could have a flag/bookmark, so you could generate your bookmarks.iped and set flags on the files you want to transcribe, then run it to generate your report. Or just flag the files you want to transcribe and reprocess only those files (and update chats). I don't know if IPED could do it incrementally.
IPED could have a flag/bookmark, so you could generate your bookmarks.iped and set flags on the files you want to transcribe, then run it to generate your report.
You might enable transcription before generating the report, so it will run just on bookmarked audios sent to report. Chats won't be updated, since this use case was not the original goal of the feature. There are some challenges I already described in #696. If you could help, contributions are very welcome.
Or just flag the files you want to transcribe and reprocess only those files (and update chats). I don't know if IPED could do it incrementally.
This depends on the non-trivial #24.
Which recorded audio name patterns and extensions are most common out there?
Any suggestions about this?
In cases where there are a lot of MP3 music files, I would like to avoid the audio/mpeg mimetype.
So I tried some mimetypes to cover the voice files sent in chat apps:
mimesToProcess = audio/3gpp; audio/aac; audio/aiff; audio/amr; audio/mp4; audio/ogg; audio/qcelp; audio/wav; audio/webm; audio/x-caf; audio/x-ms-wma; audio/x-opus+ogg
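As a small illustration of how such a list filters items, the sketch below parses the value above and checks a detected mime against it. The function names are made up for the example and the real config parsing may differ.

```python
from typing import Set

MIMES_TO_PROCESS = (
    "audio/3gpp; audio/aac; audio/aiff; audio/amr; audio/mp4; audio/ogg; "
    "audio/qcelp; audio/wav; audio/webm; audio/x-caf; audio/x-ms-wma; "
    "audio/x-opus+ogg"
)


def parse_mimes(value: str) -> Set[str]:
    """Split a ';' separated mime list, ignoring blanks and letter case."""
    return {m.strip().lower() for m in value.split(";") if m.strip()}


def is_selected(mime: str, allowed: Set[str]) -> bool:
    """True when the detected mime is one of the configured mimes."""
    return mime.strip().lower() in allowed


allowed = parse_mimes(MIMES_TO_PROCESS)
```

Here `is_selected("audio/mpeg", allowed)` is false, so MP3 music would be skipped, while `is_selected("audio/amr", allowed)` is true.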
In cases where there are a lot of MP3 music files, I would like to avoid the audio/mpeg mimetype.
So I tried some mimetypes to cover the voice files sent in chat apps:
mimesToProcess = audio/3gpp; audio/aac; audio/aiff; audio/amr; audio/mp4; audio/ogg; audio/qcelp; audio/wav; audio/webm; audio/x-caf; audio/x-ms-wma; audio/x-opus+ogg
Great! I'll use your list then, thank you!
Reopening: some important audio mimes are being missed, and one of the configured mimes is either wrong or an alias. We should use the normalized versions.
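To show what using the normalized version means in practice, here is a minimal sketch that maps aliases to one canonical name before the list is used. The alias table is only an example: which names are canonical depends on the mime registry in use and should be checked there.

```python
from typing import Dict, List

# Example alias table only (assumption); verify against the real mime registry.
ALIASES: Dict[str, str] = {
    "audio/wav": "audio/vnd.wave",
    "audio/x-wav": "audio/vnd.wave",
}


def normalize(mime: str) -> str:
    """Map an alias to its canonical mime; other mimes pass through."""
    cleaned = mime.strip().lower()
    return ALIASES.get(cleaned, cleaned)


def normalize_list(value: str) -> List[str]:
    """Normalize a ';' separated list, dropping duplicates and keeping order."""
    result: List[str] = []
    for part in value.split(";"):
        if not part.strip():
            continue
        mime = normalize(part)
        if mime not in result:
            result.append(mime)
    return result
```

Comparing the configured list and the detected mime after the same normalization avoids silently missing files whose detected mime is the canonical name while the config holds an alias.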