01程序员

Speech Recognition in Python

fullstacker, published on 2021-02-01

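In this tutorial, we will learn more about speech recognition and build a simple speech recognition application that recognizes speech from an audio file.

Before we start, let's first understand what speech is and which tasks a speech processing system performs.

Let's start by defining speech. Speech is one of the most basic ways in which humans communicate. The basic idea of speech processing is to enable interaction between humans and computers/machines.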

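A speech processing system mainly performs three tasks:

First, speech recognition lets a computer/machine capture the words, phrases and even sentences we say.

Second, natural language processing lets the computer/machine understand what we say.

Finally, speech synthesis lets the computer/machine speak.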

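Now let's turn to the main topic, speech recognition: the process of transcribing the words, phrases and sentences spoken by humans into text. Several Python libraries are available for speech recognition; we will use the simplest of them, the SpeechRecognition library.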

Installing the SpeechRecognition Library

To install the SpeechRecognition library, run the following command in a command shell or terminal:

$ pip install SpeechRecognition

Speech Recognition from an Audio File

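Now that the SpeechRecognition library is installed, let's start by converting the speech in an audio file into text. We will use an audio file that can be downloaded from this link. Download it to your local file system, or use an audio file of your own.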

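As a first step, we import the library the project needs, namely the SpeechRecognition library we installed earlier. Although the package is installed as SpeechRecognition, the module it provides is named speech_recognition: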

import speech_recognition as speech_rcgn

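To convert speech into text, we use the Recognizer class from the speech_recognition module. The Recognizer class offers several methods, each backed by a different underlying speech-to-text API: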

No.	Method	Description
1	recognize_google()	Uses the Google Speech API.
2	recognize_google_cloud()	Uses the Google Cloud Speech API.
3	recognize_bing()	Uses the Microsoft Bing Speech API.
4	recognize_ibm()	Uses the IBM Speech to Text API.
5	recognize_houndify()	Uses the Houndify API from SoundHound.
6	recognize_sphinx()	Uses CMU Sphinx through PocketSphinx.

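Of all the methods above, only recognize_sphinx() can convert speech to text offline (it requires the separate pocketsphinx package to be installed).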
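Because recognize_sphinx() works offline, one possible pattern is to try recognize_google() first and fall back to PocketSphinx when the online request fails. The following is a minimal sketch, assuming the pocketsphinx package is installed; transcribe_with_fallback is a hypothetical helper name, and the sketch relies on recognize_google() raising speech_recognition.RequestError when the API cannot be reached, as the library documents.

```python
def transcribe_with_fallback(path):
    """Hypothetical helper: online recognition first, offline PocketSphinx as fallback."""
    # Imported inside the function so this sketch can be loaded even
    # before the package is installed.
    import speech_recognition as speech_rcgn

    rec = speech_rcgn.Recognizer()
    with speech_rcgn.AudioFile(path) as audiofile:
        audiocontent = rec.record(audiofile)  # whole file -> AudioData

    try:
        # Needs an internet connection.
        return rec.recognize_google(audiocontent)
    except speech_rcgn.RequestError:
        # API unreachable or request failed: use the offline engine
        # instead (requires the pocketsphinx package).
        return rec.recognize_sphinx(audiocontent)
```

For example, transcribe_with_fallback('D:/Python/my_audio_f.wav') returns the Google transcript when online and the PocketSphinx transcript (typically less accurate) otherwise.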

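To recognize speech from an audio file, we create an object of the AudioFile class provided by the speech_recognition module, passing the path of the audio file to be transcribed to the AudioFile constructor, as shown in the following script: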

sample = speech_rcgn.AudioFile('D:/Python/my_audio_f.wav')

In the snippet above, replace the path with that of the audio file you want to transcribe.
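According to the SpeechRecognition documentation, AudioFile accepts WAV (PCM/LPCM), AIFF/AIFF-C and native FLAC files. Before transcribing, it can help to check a WAV file's format and length, which also shows which offset and duration values make sense later. A minimal sketch using only the standard-library wave module; describe_wav is a hypothetical helper name:

```python
import wave


def describe_wav(path):
    """Return basic properties of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:  # typically raises wave.Error for unsupported encodings
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frame_rate": rate,
            "duration_s": frames / rate,  # frames divided by frames per second
        }
```

For example, print(describe_wav('D:/Python/my_audio_f.wav')) prints the channel count, sample width, frame rate and length in seconds of the sample file.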

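Next, we will use the recognize_google() method to transcribe the audio file. However, recognize_google() expects an AudioData object (also defined in the speech_recognition module) as its argument. We can convert the audio file into an AudioData object with the record() method of the Recognizer class by passing it the AudioFile object, as shown in the following script: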

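rec = speech_rcgn.Recognizer()  # the Recognizer instance provides record()
with sample as audiofile:
    audiocontent = rec.record(audiofile)  # read the whole file into an AudioData object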

To check the type of the audiocontent variable, we can use the built-in type() function:

print(type(audiocontent))

Output

<class 'speech_recognition.AudioData'>
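An AudioData object keeps the raw PCM bytes in its frame_data attribute, together with sample_rate (frames per second) and sample_width (bytes per sample). To our understanding, AudioFile converts stereo input to mono before record() sees it, so the length of the recorded clip can be estimated from those three attributes alone. A minimal sketch under that mono assumption; audio_seconds is a hypothetical helper name:

```python
def audio_seconds(audio):
    """Estimate the length in seconds of a mono AudioData-like object."""
    # One mono frame is one sample of `sample_width` bytes.
    bytes_per_second = audio.sample_rate * audio.sample_width
    return len(audio.frame_data) / bytes_per_second
```

For instance, print(audio_seconds(audiocontent)) should report roughly the length of the file, or about 15 when record() is later called with duration=15.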

Having confirmed that audiocontent is an AudioData object, we can pass it to the recognize_google() method of the Recognizer class and check whether the audio sample is converted to text.
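Since recognize_google() sends the audio to Google's web service, the call can fail in two documented ways: speech_recognition.UnknownValueError when the speech is unintelligible, and speech_recognition.RequestError when the request fails, the key is invalid or there is no internet connection. A minimal sketch of handling both; safe_recognize is a hypothetical name, and returning None is just one possible design choice:

```python
def safe_recognize(rec, audiocontent):
    """Return the Google transcript, or None if it could not be obtained."""
    import speech_recognition as speech_rcgn  # needed for the exception classes

    try:
        return rec.recognize_google(audiocontent)
    except speech_rcgn.UnknownValueError:
        # The API could not make out any words in the audio.
        print("Speech was unintelligible")
    except speech_rcgn.RequestError as err:
        # Request failed, invalid key, or no internet connection.
        print(f"Could not get results from the Google Speech API: {err}")
    return None
```

Usage: text = safe_recognize(rec, audiocontent), then check whether text is None before using it.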

The complete program is shown below:

Example

import speech_recognition as speech_rcgn

rec = speech_rcgn.Recognizer()                              # recognizer instance
sample = speech_rcgn.AudioFile('D:/Python/my_audio_f.wav')  # audio file to transcribe
with sample as audiofile:
    audiocontent = rec.record(audiofile)                    # read the file into an AudioData object
print(type(audiocontent))
print(rec.recognize_google(audiocontent))                   # transcribe with the Google Speech API

Output

<class 'speech_recognition.AudioData'>
perhaps this is what is PR agency is are their dignity schedule III was much is 50 feet then the choreographer missed arbitrated never go back into acquiescence with things as they find it in misery and isolation around us in this instance such personal purchase for a luxury cases of severe and advisement say he is a horse days Ranjan or he may have a point that contains between fuel prices straight line which symbolises uniqueness the circuit universality of small hole in wall with client has more subtle implications in passport after expiry marketing program manufacturers taking initiative of the costs involved cricket overlapping twisted widely spaced to you always navigate like this

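In the example above, we imported the speech_recognition module and instantiated its Recognizer class. We used the AudioFile class to specify the path of the audio file, and the record() method to read the audio into an AudioData object. We then called recognize_google() to convert that audio into text using the Google API. As the output shows, the audio has been converted into text. The transcription is not 100% accurate, but the accuracy is fairly reasonable.

Setting the Duration and Offset Values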

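The Python SpeechRecognition module also lets developers transcribe a specific segment of an audio file instead of the whole recording. For example, suppose we only want to transcribe the first 15 seconds of the audio sample. In that case, we pass 15 as the value of the duration parameter of the record() method.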

Let's look at the following example:

Example

import speech_recognition as speech_rcgn

rec = speech_rcgn.Recognizer()
sample = speech_rcgn.AudioFile('D:/Python/my_audio_f.wav')
with sample as audiofile:
    audiocontent = rec.record(audiofile, duration=15)  # read only the first 15 seconds
print(rec.recognize_google(audiocontent))

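In the example above, we passed the duration parameter with a value of 15 to the record() method. As a result, the output is a transcription of only the first 15 seconds of the audio sample.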
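Within a single with block, the AudioFile stream keeps its position, so a second record() call continues from where the previous one stopped instead of starting over. Relying on that behaviour, a long file can be transcribed in fixed-size chunks. In this sketch, transcribe_in_chunks and chunk_seconds are hypothetical names; the sketch assumes, based on the library source, that an opened AudioFile exposes its length in seconds as DURATION, and it simply records an empty string for a chunk that cannot be understood.

```python
import math


def transcribe_in_chunks(path, chunk_seconds=15):
    """Transcribe an audio file piece by piece; returns a list of strings."""
    import speech_recognition as speech_rcgn

    rec = speech_rcgn.Recognizer()
    pieces = []
    with speech_rcgn.AudioFile(path) as audiofile:
        # DURATION (file length in seconds) is set when the file is opened.
        n_chunks = math.ceil(audiofile.DURATION / chunk_seconds)
        for _ in range(n_chunks):
            # Each call reads the next chunk_seconds of the same stream.
            audiocontent = rec.record(audiofile, duration=chunk_seconds)
            try:
                pieces.append(rec.recognize_google(audiocontent))
            except speech_rcgn.UnknownValueError:
                pieces.append("")  # nothing intelligible in this chunk
    return pieces
```

Because the file is cut at fixed times, words that straddle a chunk boundary may be split or lost.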

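Similarly, we can skip part of the audio file from the beginning with the offset parameter, another parameter of the record() method. For example, if we do not want to transcribe the first 5 seconds of the audio, we pass 5 as the value of offset. The first 5 seconds are then skipped, and the rest of the audio file is transcribed.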

Consider the following example, which combines both parameters:

import speech_recognition as speech_rcgn

rec = speech_rcgn.Recognizer()
sample = speech_rcgn.AudioFile('D:/Python/my_audio_f.wav')
with sample as audiofile:
    # skip the first 5 seconds, then read the next 15 seconds
    audiocontent = rec.record(audiofile, offset=5, duration=15)
print(rec.recognize_google(audiocontent))

Output

10 matches 50 feet in a choreographer missed arbitrated never settle back into acquiescence with things as they work finds it in an industry and isolation Raunak in this instance such personal purchase for a luxury

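In the example above, we passed offset=5 (together with duration=15) to the record() method. As a result, the transcription skips the first 5 seconds of the audio file and then covers the next 15 seconds.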
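To hear exactly which part of the file offset=5, duration=15 selects, you can cut the same window out with the standard-library wave module and play it back. This is our own illustration of the offset/duration arithmetic (frames = seconds × frame rate), not how record() is implemented; extract_segment and the output path are hypothetical.

```python
import wave


def extract_segment(src_path, dst_path, offset, duration):
    """Copy `duration` seconds of a WAV file, starting `offset` seconds in.

    Returns the number of frames written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        start = int(offset * rate)      # first frame to keep
        count = int(duration * rate)    # number of frames to keep
        src.setpos(min(start, src.getnframes()))
        frames = src.readframes(count)  # returns fewer frames at end of file
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)           # same channels, sample width, rate
        dst.writeframes(frames)         # header frame count is fixed on close
    return len(frames) // (params.sampwidth * params.nchannels)
```

For example, extract_segment('D:/Python/my_audio_f.wav', 'D:/Python/segment.wav', offset=5, duration=15) writes the 15-second window that the example above transcribes.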

Dealing with Noise

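An audio file may contain noise for various reasons, and this noise can degrade the transcription. To help with this, the Recognizer class provides the adjust_for_ambient_noise() method, which takes the audio source (here, the AudioFile opened in the with block) as its argument. Strictly speaking, it does not remove noise from the audio: it reads the first second of the source (adjustable through its duration parameter) and sets the recognizer's energy_threshold according to the level of the background sound.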
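The energy that adjust_for_ambient_noise() measures is essentially the root-mean-square (RMS) amplitude of the audio samples. Below is a simplified standard-library illustration of that measure for 16-bit little-endian mono PCM, and of comparing it with a threshold; it is not the library's own code. rms_16bit and looks_like_speech are hypothetical names, and the threshold 300 is only an example (to our knowledge it is the Recognizer's default energy_threshold).

```python
import math
import struct


def rms_16bit(frame_data):
    """RMS amplitude of 16-bit signed little-endian mono PCM bytes."""
    count = len(frame_data) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame_data[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def looks_like_speech(frame_data, energy_threshold=300):
    """Treat a buffer as speech if its energy exceeds the threshold."""
    return rms_16bit(frame_data) > energy_threshold
```

Calibrating on background sound amounts to raising or lowering energy_threshold so that quiet stretches fall below it and speech rises above it.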

The following example calls adjust_for_ambient_noise() before recording the audio file:

Example

import speech_recognition as speech_rcgn

rec = speech_rcgn.Recognizer()
sample = speech_rcgn.AudioFile('D:/Python/my_audio_f.wav')
with sample as audiofile:
    # calibrate energy_threshold using the first second of the file
    rec.adjust_for_ambient_noise(audiofile)
    audiocontent = rec.record(audiofile)  # records the rest of the file
print(rec.recognize_google(audiocontent))

Output

Kesariya pareshani is are their dignity have you thought it was much is 50 feet then the choreographer missed arbitrated never go back into acquiescence with things as they work finds it in misery and isolation around us in this instance such personal purchase for a luxury cases of severe and advisement say he is a horse days Ranjan or he may have a point that contains between fuel prices straight line which symbolises uniqueness the circuit universality of small hole in wall with client has more subtle implications in passport after expiry marketing program manufacturers take an initiative of the costs involved cricket overlapping twisted widely spaced to you always navigate like this

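In the example above, we called adjust_for_ambient_noise() before record(), and the script then transcribed the audio file and printed the text. The output is very similar to that of the first example because this audio file contains little noise to begin with. The opening words differ, however: by default, adjust_for_ambient_noise() consumes the first second of the file while calibrating, so record() only captures what follows.

Conclusion

Speech recognition has a wide range of applications in human-computer interaction, the most important being automatic transcription of speech. With libraries and services such as PyAudio, SpeechRecognition and the Google Speech API, we can build voice applications along the lines of Alexa, Siri and Google Assistant that transcribe not only audio files but also live speech from a microphone. We can also use methods such as adjust_for_ambient_noise() to calibrate the recognizer for background noise.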
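For live transcription from a microphone, the library provides a Microphone class, which requires the PyAudio package. A minimal sketch, assuming PyAudio is installed and a default input device exists; listen_once is a hypothetical name, and the timeout and phrase_time_limit values are arbitrary examples:

```python
def listen_once(timeout=5, phrase_time_limit=10):
    """Record one phrase from the default microphone and transcribe it."""
    import speech_recognition as speech_rcgn  # Microphone needs PyAudio

    rec = speech_rcgn.Recognizer()
    with speech_rcgn.Microphone() as source:
        # Sample one second of background sound to set energy_threshold.
        rec.adjust_for_ambient_noise(source, duration=1)
        # Wait up to `timeout` s for speech to start; cap the phrase length.
        audio = rec.listen(source, timeout=timeout,
                           phrase_time_limit=phrase_time_limit)
    return rec.recognize_google(audio)
```

If nobody starts speaking within timeout seconds, listen() raises speech_recognition.WaitTimeoutError, so a real application would catch it alongside UnknownValueError and RequestError.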
