RFP: Voice to Text in Maryland, United States

The Social Security Administration (SSA) is conducting market research/sources sought to help determine the availability and technical capability of qualified businesses providing a solution that can produce an indexed and tagged transcript of audio files. This is not a request for quotations or proposals, and we do not guarantee the issuance of a solicitation as a result of this notice. We will not pay for any costs incurred in the preparation of responses to this market survey. We will not pay for the Government's use of the information you provide. If you provide any proprietary information, please clearly identify it as such.

Background
The Office of Hearings Operations (OHO) is responsible for holding hearings as part of the agency's administrative appeals process. Currently, all hearings must be recorded digitally because a certified record of the hearing is required if a claimant files a civil action in the United States District Court. Once recorded, the audio of the hearing becomes part of SSA's official claim file. Hearings are recorded using SSA's Digital Recording and Processing system (DRAP).

The following is a list of individuals who may participate in a hearing:
• Administrative Law Judge (ALJ) - SSA employee conducting the hearing.
• Claimant - Member of the public requesting disability benefits.
• Representative - Attorney representing the claimant.
• Vocational Expert - A paid contractor who attends the hearing to speak about the job market.
• Medical Expert - A paid contractor who attends the hearing to speak about the claimant's medical history and ability to work. There can be multiple medical experts at a hearing.
• Verbatim Hearing Reporter (VHR) - A paid contractor who controls the DRAP equipment and takes notes during the hearing.
• Claimant Witness - A relative or friend of the claimant who attends to speak about the claimant's allegations.
• Interpreter - A paid contractor who translates for a non-English-speaking claimant, or a sign language interpreter for a claimant with a hearing impairment.

Each individual may participate in a hearing via video, telephone, or in person at the hearing site. DRAP currently captures testimony from all participants regardless of audio source. Audio is recorded in Ogg Vorbis format. DRAP software resides on individual computers within the hearing rooms. The audio is recorded locally, copied to a mid-tier server, and then copied to a central storage location within the agency's UNIX mainframe direct-access storage device (DASD). Audio at local and mid-tier locations is not permanent and is deleted after an interval of no less than 6 months.
The current DRAP system supports approximately 1,500 hearing sites producing on average 2,600 recordings per business day.
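
For respondents sizing compute and storage, the figures above can be turned into a rough workload estimate. The Python sketch below is illustrative only: the number of business days per year and the average recording length are assumptions that do not appear in this notice, and respondents should substitute their own values.

```python
# Figures stated in this notice.
HEARING_SITES = 1_500
RECORDINGS_PER_BUSINESS_DAY = 2_600

# Assumptions for illustration only; neither value appears in the notice.
ASSUMED_BUSINESS_DAYS_PER_YEAR = 250
ASSUMED_MINUTES_PER_RECORDING = 45


def estimate_volume(recordings_per_day, sites, business_days, minutes_per_recording):
    """Return rough daily and yearly workload figures as a dictionary."""
    audio_hours_per_day = recordings_per_day * minutes_per_recording / 60
    return {
        "recordings_per_site_per_day": recordings_per_day / sites,
        "recordings_per_year": recordings_per_day * business_days,
        "audio_hours_per_day": audio_hours_per_day,
        "audio_hours_per_year": audio_hours_per_day * business_days,
    }


estimate = estimate_volume(
    RECORDINGS_PER_BUSINESS_DAY,
    HEARING_SITES,
    ASSUMED_BUSINESS_DAYS_PER_YEAR,
    ASSUMED_MINUTES_PER_RECORDING,
)

for name, value in estimate.items():
    print(f"{name}: {value:,.2f}")
```

Changing the two assumed constants shows how sensitive the transcription workload is to hearing length, which is the main unknown in this estimate.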

What We Need
During recording, a VHR types notes within DRAP. These notes are later used for reference by Decision Writers, ALJs, and other hearing office employees. We envision an automated process that will produce a text transcript of either the live audio stream or previously recorded audio files. The solution must allow easy navigation of the tagged portions of the audio recording. The index/tag/transcript information must be delivered in either XML or JSON. If the solution creates audio files, they must be in an open format (i.e., non-proprietary).
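
To make the index/tag/transcript requirement concrete, the Python sketch below shows one possible JSON layout and how a tag index could support navigation to specific points in the audio. Every field name, tag, role label, and identifier in it is a hypothetical placeholder chosen for illustration; this notice does not prescribe a schema, and the segment text is dummy content.

```python
import json

# Hypothetical transcript layout; all field names and values are assumptions.
transcript = {
    "recording_id": "example-0001",
    "audio_format": "Ogg Vorbis",
    "segments": [
        {"start_sec": 0.0, "end_sec": 12.5, "speaker_role": "ALJ",
         "tags": ["opening"], "text": "(placeholder text)"},
        {"start_sec": 12.5, "end_sec": 95.0, "speaker_role": "Claimant",
         "tags": ["claimant_testimony"], "text": "(placeholder text)"},
        {"start_sec": 95.0, "end_sec": 140.0, "speaker_role": "Vocational Expert",
         "tags": ["vocational_testimony"], "text": "(placeholder text)"},
    ],
}


def build_tag_index(doc):
    """Map each tag to the start times (in seconds) of the segments carrying it."""
    index = {}
    for segment in doc["segments"]:
        for tag in segment["tags"]:
            index.setdefault(tag, []).append(segment["start_sec"])
    return index


def segments_by_role(doc, role):
    """Return all segments spoken by the given participant role."""
    return [s for s in doc["segments"] if s["speaker_role"] == role]


# Round-trip through JSON to confirm the structure serializes cleanly.
serialized = json.dumps(transcript, indent=2)
restored = json.loads(serialized)

tag_index = build_tag_index(restored)
print(tag_index["vocational_testimony"])  # start offsets a player could seek to
print(len(segments_by_role(restored, "ALJ")))
```

An equivalent XML layout would carry the same information; the point of the sketch is that each segment pairs a time range with a speaker role and tags so that a reader can jump directly to the matching audio.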

As part of the AI capability, we are looking for a solution that can process live audio streams or previously recorded audio files without requiring "voice training" by hearing participants.

In your response to this sources sought notice, please provide the following information:
1. Describe your potential solution. Include information regarding accuracy for different regional dialects. Provide detailed information regarding the files generated by your solution (e.g., file type, size, tagging structure).
2. Is your solution a commercial-off-the-shelf product?
3. What should we be thinking about to give us maximum value/benefit for what we are trying to achieve? What should we not be focusing on?
4. What do you think a minimum viable product (MVP) would look like?
5. Have you successfully implemented your solution? If so, please provide details on the number of installations, size of the installations, etc. Indicate if the implementation is currently in production or in a testing environment.
6. Can you provide a demo of your potential solution?
7. Do you have any small business designations? Please list if yes.
8. Do you have this solution on a GSA Schedule or other GWAC?
9. Have you provided this solution in a FIPS 199 Moderate or higher environment?
10. Does the solution retain any portion of the original record, either in audio or transcribed format outside of the delivered transcript? (i.e., is data retained for training, analysis, or other uses outside of the primary use?)
11. If the solution is delivered in a cloud environment (SaaS or other model), is the solution FedRAMP Authorized? If so, how many authorizations currently exist? Does the solution use encryption at rest and in transit? What computing resources, storage resources, and network resources should we be planning for this service/item?
12. What sort of response time should we expect from this type of solution?
13. Additional Information/Questions

Katherine B. Medeiros, Phone 410-965-1067, Email katherine.medeiros@ssa.gov