Tinkering with Chrome Headless to Handle Mic Input
This pull request consisted of more research for the CodeTalker repository. One of the ongoing issues with the repository is figuring out how to capture microphone audio across a variety of operating systems, and then process it in a way that is easy to set up across multiple users' devices (all of which may use different operating systems).
In an earlier attempt to make the development environment more accessible to potential contributors to the CodeTalker repo, I tried to move all of the scripts that processed microphone audio into a Docker container. The advantage of this was that the Dockerfile, which defines how a Docker image (the template a container is started from) is built, can be configured to install and configure various dependencies. In this case the dependencies would be for processing microphone audio (Python libraries and various microphone hardware handling libraries). This worked great for Unix-style machines, because the microphone hardware (and other hardware) is exposed directly in the operating system's directory structure. Since you can mount directories to Docker containers, the microphone hardware can be made accessible to the Docker container simply by mounting the directory for the microphone.
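As a rough sketch of what that mounting step can look like, the snippet below assembles a docker run command that passes a sound device directory through to the container. It assumes a Linux host where the ALSA sound devices live under /dev/snd, and the image name codetalker-audio is hypothetical; neither comes from the CodeTalker repo itself.

```python
import shlex


def build_docker_command(image, device_path="/dev/snd"):
    """Build a `docker run` command that exposes a host sound device.

    Assumptions (not taken from the CodeTalker repo): the host is Linux
    with ALSA devices under /dev/snd, and `image` is a hypothetical name.
    """
    return [
        "docker", "run",
        "--rm",                    # remove the container when it exits
        "--device", device_path,   # pass the host device through to the container
        image,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch is safe to try.
    print(shlex.join(build_docker_command("codetalker-audio")))
```

Printing the command instead of executing it keeps the sketch harmless on machines without Docker, and makes it obvious that on a host without a device directory to pass through there is nothing sensible to put in device_path.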
However, there are no such provisions for mounting microphone audio from other types of operating systems (especially Windows) that don't mount hardware to their directory structure.
A possible solution for this would be to use a program that is already available across a variety of platforms to capture microphone audio, then pass it to the Docker container. Headless web browsers like Chrome Headless are available across a variety of platforms, and newer versions of Chrome come with Chrome Headless by default. Since Docker containers can be configured to run web servers that the host machine can communicate with, a logical (if somewhat complicated) step towards compatibility would be to capture microphone audio with an HTML page hosted by the Docker container and accessed by a Chrome Headless browser. Once the Docker container received the audio, it could use its existing scripts to process it and return the text (from the speech-to-text conversion) to stdout, as the current implementation of the Docker container already does (see CodeTalker's 'Dockerize' branch).
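To make the container side of that idea more concrete, here is a minimal sketch of a web server that accepts audio posted by a page and holds on to it for later processing. It uses only the Python standard library; the /audio path, the received_chunks list, and the start_server helper are names made up for this illustration, and the real CodeTalker scripts may be wired up quite differently.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Audio payloads collected from the browser, waiting to be processed.
# (Hypothetical holding area; a real setup would hand these to the
# existing speech-to-text scripts.)
received_chunks = []


class AudioHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the (hypothetical) /audio endpoint accepts uploads.
        if self.path != "/audio":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        received_chunks.append(self.rfile.read(length))
        self.send_response(204)  # accepted, nothing to send back
        self.end_headers()

    def log_message(self, format, *args):
        # Silence request logging so stdout stays free for transcribed text.
        pass


def start_server(host="0.0.0.0", port=8000):
    """Start the server in a background thread and return it.

    Binding to 0.0.0.0 is what lets the host machine reach a server
    running inside a container once the port is published.
    """
    server = HTTPServer((host, port), AudioHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The page served to the browser would then send its recorded audio to /audio with an HTTP POST, and the container would pick the bytes up from received_chunks.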
While I managed to do some research and write a simple webpage that would capture microphone audio and play it back through the user's speakers using the GUI version of Chrome, I ran into a problem when trying to get the Chrome Headless browser to run the page. Chrome does not allow webpages to access the microphone without permission by default, and in Chrome Headless there is no GUI in which to click the pop-up that grants access to the microphone. One hacky way to try to give permissions was to edit Chrome's Preferences file to allow microphone access to specific website URLs (or so I thought). When I edited the Preferences file to allow microphone access for the little test webpage I had made, Chrome would consistently report on launch that I had corrupted the Preferences file. Unfortunately, I have not found another way to manually set microphone permissions. You can see which sites are allowed access in Chrome's advanced settings in the GUI, but it doesn't seem like you can manually add new URLs or files.
This is a problem to tinker with over the holidays. At the very least, I created some documentation to illustrate how to set up a basic microphone testing webpage.
You can see the full details of my misadventures in the chromeHeadlessMicInput.md file here