ABOUT
Voice-Controlled Smart Home Prototype Using AI-Driven Speech and IoT Automation
This report documents the work completed during the Winter Robotics & AI Internship: a voice-controlled smart home prototype that combines an AI-style speech interface with an Arduino-based Internet of Things (IoT) automation system. The project was built as a safe, low-voltage educational prototype that demonstrates the complete pipeline from spoken input to physical actuation.
The prototype listens for a wake word ("assistant"), converts speech to text using the Python SpeechRecognition library (Google Web Speech API backend), interprets the user’s intent using a simple rule-based natural language processing (NLP) layer, and then sends compact ASCII commands over Bluetooth (HC-05) to an Arduino Uno. The Arduino controls multiple actuators (two lights, a DC motor fan, a servo door, and an alarm buzzer + LED) and provides real-time feedback on a 16×2 I2C LCD display.
TEAM
Shameer Kashif Khan
Project Designer, Developer & Builder
Shameer independently designed, coded, assembled, and wired the entire project.
His all-round technical skills and dedication were key to the project’s successful completion.
INTRODUCTION
Smart home systems are designed to make everyday tasks easier by automating household devices and enabling remote or hands-free control. Typical examples include controlling lights, fans, door locks, security alarms, and other appliances through mobile apps or voice assistants. While commercial platforms such as Alexa and Google Home provide polished user experiences, they often hide the internal mechanics and require proprietary hardware ecosystems.
For learning and prototyping, it is valuable to create an end-to-end system that is transparent: the developer can clearly see how audio becomes text, how text becomes a command, how the command is transmitted to an embedded controller, and how the controller drives physical outputs. This internship project focuses on building that transparent pipeline using widely available tools: Python on a PC, Bluetooth serial communication, and an Arduino Uno.
The final prototype behaves like a simplified voice assistant: the user speaks a wake word followed by a command, for example: "assistant, turn on the first light" or "assistant, open the door". The system translates this speech into actions on the physical model, and the LCD display acts as a local dashboard showing the current device states.
PROBLEM STATEMENT
Manual control of home devices using wall switches, individual remotes, or separate control apps can be inconvenient, especially when users want hands-free interaction or need to control multiple devices quickly. Commercial voice assistants solve this problem, but they are not ideal for learning because most of the important components (speech recognition models, intent parsing, device protocols, and device control logic) are hidden behind closed systems.
The internship challenge was to design an educational smart home prototype that demonstrates the same fundamental idea, voice-driven control, while remaining safe and affordable. The system had to accept spoken commands, interpret intent with reliable rules, coordinate multiple devices, and provide clear feedback to the user.
OBJECTIVES
The project objectives were defined to ensure a complete, demonstrable pipeline and to keep the prototype practical for a student-level internship build:
● Create a voice assistant in Python capable of capturing microphone input and converting speech to text using a speech recognition backend.
● Implement a wake word mechanism to prevent accidental triggers and to mimic commercial assistants.
● Design a lightweight, rule-based NLP layer that can map common phrases to device actions (ON/OFF) and specific targets (light 1, light 2, fan, door, scenes).
● Maintain an internal device-state model so the software always knows the current status of each actuator.
● Implement wireless command delivery to the Arduino using HC-05 Bluetooth serial communication.
● Develop Arduino firmware that reads line-based commands, controls actuators, and updates an I2C LCD status display.
● Integrate and validate the full workflow: voice → text → intent → device command → physical action → LCD feedback.
EXISTING SYSTEMS
Existing smart home ecosystems generally fall into three categories: (1) cloud-centric voice assistant platforms (Alexa, Google Home), (2) hub-based local automation (Home Assistant, SmartThings), and (3) direct device control through smartphone apps (manufacturer-specific apps). These systems typically include device discovery, user authentication, encrypted communication, and advanced natural language understanding.
In comparison, this internship prototype intentionally simplifies many of these features to focus on the learning value and the demonstration of core concepts:
● Speech recognition is handled through a ready-made service via Python’s SpeechRecognition library rather than training a model.
● Intent recognition is rule-based (keyword/phrase matching) rather than a statistical or transformer-based NLU system.
● Bluetooth serial is used for quick, low-cost wireless communication rather than Wi-Fi + MQTT or Zigbee/Z-Wave.
● A local LCD provides immediate feedback rather than a mobile app dashboard.
This simplification is not a weakness for an internship prototype; it makes the full pipeline visible and modifiable. The design also keeps the system safe by avoiding mains electricity and using low-voltage components.
SYSTEM OVERVIEW
The system is organized into four layers. Each layer has a clear responsibility, making the system easier to develop, debug, and extend.
Layered architecture
● Voice Capture and Speech Recognition (PC/Python): records audio from a microphone and produces a text transcript.
● Wake Word + Command Extraction (PC/Python): checks for the wake word "assistant" and extracts the actual command phrase.
● Command Interpretation + Automation (PC/Python): maps commands to actions and scenes, and updates internal device states.
● Hardware Control + Feedback (Arduino): receives compact commands, drives actuators, and updates the LCD display.
At runtime, the system behaves like a conversation loop: it listens, transcribes, filters by wake word, interprets the command, and sends a device-level instruction to the Arduino. The Arduino executes the instruction and updates the LCD so users can confirm the current system status even without looking at the PC console.
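The loop below is a minimal sketch of this conversation loop. The helper names (listen_once, parse_command, send_command) are illustrative placeholders for the corresponding parts of voice_smart_home.py, not its exact functions.

WAKE_WORD = "assistant"

def run_assistant(recognizer, microphone, serial_port):
    # Conversation loop: listen, transcribe, filter by wake word, interpret, send.
    # listen_once / parse_command / send_command are illustrative placeholders.
    while True:
        transcript = listen_once(recognizer, microphone)      # speech -> lowercase text (or None)
        if not transcript or WAKE_WORD not in transcript:
            continue                                          # ignore speech without the wake word
        command = transcript.split(WAKE_WORD, 1)[1].strip()   # keep the text after the wake word
        for token in parse_command(command):                  # e.g. ["LIGHT1_ON"] or a scene list
            send_command(serial_port, token)                  # newline-terminated ASCII to the HC-05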
System architecture diagram
Host PC (Python):
● Microphone input
● Speech recognition (Google Web Speech API)
● Wake word filter
● Rule-based NLP + scenes
● Device state dictionary
● Serial sender (pyserial)
↓ Bluetooth serial link (HC-05) ↓
Arduino Uno:
● Command parser (line-based)
● Actuators: Light 1 LED, Light 2 LED, DC motor (fan) via L298N, servo (door), buzzer + alarm LED
● 16×2 I2C LCD status display
MECHANICAL DESIGN
The mechanical design in this project refers to the physical prototype structure and how the components are arranged to simulate a household environment. The prototype was intentionally built with low-voltage electronics to maintain safety while still representing real smart home behaviors.
A small enclosure (optionally a 3D-printed house model) can be used to mount the Arduino, LCD, and actuators. The layout should keep wiring tidy, make the LCD visible, and allow the door servo and fan motor to move freely without interference. When designing the enclosure, the following practical factors were considered:
● Accessibility: components should be easy to reach for debugging and re-wiring.
● Visibility: lights and LCD must be clearly visible to demonstrate state changes.
● Stability: the motor and servo can create vibration; mounting points should be secure.
● Cable routing: wires should be organized to avoid disconnections and short circuits.
● Power separation: the DC motor supply should be separate from the Arduino’s logic supply, while sharing a common ground.
Although the build does not control mains appliances directly, the same mechanical principles apply to real systems: safe mounting of electronics, wire management, and separating high-power and low-power domains.
Initial Design
Final Design
SOFTWARE AND PROGRAMMING
The software stack is split into two programs: a Python application running on the host PC and an Arduino firmware sketch running on the microcontroller. This separation keeps the AI-style functionality (speech recognition and command interpretation) on a device with more computing power, while the Arduino focuses on real-time hardware control.
Python voice assistant (PC side)
The Python application performs six major responsibilities: (1) initialize microphone calibration to improve recognition reliability, (2) listen for speech input, (3) convert audio to text, (4) detect the wake word and extract the command, (5) interpret the command (including scenes), and (6) send device-level commands to Arduino over Bluetooth serial.
Speech is recognized using the SpeechRecognition library. In this build, recognize_google() is used, which sends audio to Google’s Web Speech API and returns a transcript. The transcript is then normalized (lowercasing, trimming whitespace) and filtered by the wake word. The design choice to include a wake word reduces accidental triggers and allows the system to ignore background conversation.
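The sketch below shows how this capture-and-transcribe step can be written with the SpeechRecognition library. The function name, timeout values, and wake-word handling are illustrative rather than the exact code of voice_smart_home.py.

import speech_recognition as sr

WAKE_WORD = "assistant"

def listen_once(recognizer, microphone, timeout=5, phrase_limit=6):
    # Capture one utterance and return a lowercase transcript, or None if nothing usable was heard.
    try:
        with microphone as source:
            audio = recognizer.listen(source, timeout=timeout, phrase_time_limit=phrase_limit)
        return recognizer.recognize_google(audio).lower().strip()
    except (sr.WaitTimeoutError, sr.UnknownValueError, sr.RequestError):
        return None

recognizer = sr.Recognizer()
microphone = sr.Microphone()
with microphone as source:
    recognizer.adjust_for_ambient_noise(source, duration=1)    # one-time calibration at startup

text = listen_once(recognizer, microphone)
if text and WAKE_WORD in text:
    command = text.split(WAKE_WORD, 1)[1].strip()               # e.g. "turn on the first light"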
Command interpretation is rule-based for clarity and predictability. The program uses phrase lists (for example, "turn on" and "turn off") to infer actions, and keyword patterns (for example, "first light" or "light 1") to identify target devices. In addition to direct device commands, the program supports scene commands such as "good night" (turn off lights, close door, fan on, alarm off) and "i am leaving" (turn off devices, close door, alarm off).
Python Code: Voice Assistant (voice_smart_home.py)
Arduino firmware (microcontroller side)
The Arduino sketch implements a simple command protocol over Bluetooth. The Python side sends newline-terminated ASCII commands (for example, LIGHT1_ON). The Arduino reads incoming characters from SoftwareSerial connected to the HC-05 module, buffers the characters until a newline is received, and then runs a command handler that updates output pins and refreshes the LCD.
State tracking is maintained using boolean variables such as light1On, fanOn, and doorOpen. The LCD display uses these variables to print a compact snapshot of the system status, providing immediate feedback that is independent of the PC console output.
Arduino Sketch Code:
SYSTEM INTEGRATION
System integration focused on making the Python and Arduino components communicate reliably, ensuring that each spoken command results in the correct physical action and an accurate LCD status update.
Communication protocol
Commands are transmitted as human-readable ASCII tokens, each ending with a newline ("\n"). This design makes debugging easier because commands can be logged and tested with serial monitors. The protocol used in this project includes:
● LIGHT1_ON, LIGHT1_OFF
● LIGHT2_ON, LIGHT2_OFF
● MOTOR_ON, MOTOR_OFF
● DOOR_OPEN, DOOR_CLOSE
● ALARM_ON, ALARM_OFF
The Arduino firmware treats each newline as the end of a command. This approach avoids partial reads and allows multiple commands to be sent in sequence. In scene commands that send multiple device actions quickly (for example, "i am leaving"), the Python code includes a short delay between commands to reduce the chance of Bluetooth buffer overload or dropped messages.
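A minimal sender sketch is shown below, assuming the HC-05 is already paired and exposed as a standard serial port through pyserial. The port name, baud rate, and delay value are placeholders to be adjusted for the actual setup.

import time
import serial   # pyserial

BLUETOOTH_PORT = "COM5"      # placeholder; use the outgoing COM port assigned to the HC-05
BAUD_RATE = 9600             # typical HC-05 default

def send_command(port, token, inter_command_delay=0.3):
    # Send one newline-terminated ASCII command and pause briefly before the next one.
    port.write((token + "\n").encode("ascii"))
    port.flush()
    time.sleep(inter_command_delay)   # avoids flooding the HC-05 buffer during scene sequences

with serial.Serial(BLUETOOTH_PORT, BAUD_RATE, timeout=1) as bt:
    time.sleep(2)                     # give the link a moment to settle after opening the port
    for token in ("LIGHT1_ON", "MOTOR_ON", "DOOR_OPEN"):
        send_command(bt, token)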
Hardware pin mapping and wiring
A consistent mapping between Arduino pins and physical components was defined so the software and hardware stayed aligned. The key mapping used is:
● Light 1 LED: digital pin 4 (with series resistor to protect LED).
● Light 2 LED: digital pin 7 (with series resistor).
● Alarm LED: digital pin 8.
● Buzzer: digital pin 9.
● Motor driver (L298N): IN1 pin 5, IN2 pin 6.
● Door servo signal: pin 11 (PWM-capable).
● HC-05 Bluetooth: RX on pin 2, TX on pin 3 via SoftwareSerial.
● I2C LCD: SDA to A4, SCL to A5, VCC to 5V, GND to GND.
For stable motor operation, the DC motor is powered from a separate supply connected to the L298N driver. The Arduino and motor supply grounds must be connected together to ensure a common reference. This prevents erratic behavior caused by floating grounds.
RESULT AND PERFORMANCE EVALUATION
The integrated prototype achieves the primary goal: spoken commands are converted into correct physical actions on the model smart home, with real-time status feedback on the LCD. Evaluation was performed through staged testing and end-to-end trials.
Testing approach
Testing was performed in increasing levels of complexity to isolate faults quickly:
● Logic-only tests: run handle_command() with manual text strings to confirm device mapping and scene behavior (a minimal example follows this list).
● Serial communication tests: send known commands from Python to Arduino to confirm Bluetooth pairing, COM port selection, and command parsing.
● Actuator tests: test each output independently (light pins, motor control, servo angles, buzzer + LED alarm pattern).
● End-to-end tests: speak commands using the wake word and observe PC logs, actuator behavior, and LCD status updates.
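The snippet below illustrates the logic-only level of testing. It reuses the parse_command() sketch from the software section as a stand-in for the project's handle_command(); the test phrases and expected tokens follow the command protocol described earlier.

# Logic-only checks: no microphone or Arduino required.
TEST_CASES = {
    "turn on the first light": ["LIGHT1_ON"],
    "turn off the fan":        ["MOTOR_OFF"],
    "open the door":           ["DOOR_OPEN"],
    "good night":              ["LIGHT1_OFF", "LIGHT2_OFF", "DOOR_CLOSE", "MOTOR_ON", "ALARM_OFF"],
}

for phrase, expected in TEST_CASES.items():
    result = parse_command(phrase)
    status = "OK  " if result == expected else "FAIL"
    print(f"{status} {phrase!r} -> {result}")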
Observed performance
In typical quiet indoor conditions, the system responds consistently. The end-to-end response time is influenced mainly by speech-to-text processing because recognize_google() relies on network communication. Once a transcript is produced, command interpretation and Bluetooth transmission are near-instant from the user’s perspective. The Arduino executes each command immediately after the full line is received.
Recognition reliability decreases when there is strong background noise, multiple speakers, or unclear pronunciation. To improve robustness, the system includes microphone calibration at startup and a wake word requirement. Additionally, explicit fan handling checks for common misrecognitions such as "fun" instead of "fan".
Performance metrics (recommended)
If more formal evaluation is required, the following metrics can be recorded during trials: speech recognition success rate (percentage of spoken commands transcribed correctly), intent accuracy (percentage of correctly transcribed commands mapped to the correct device/action), end-to-end latency (time from end of speech to actuator response), and command reliability (percentage of commands received by Arduino without loss during rapid scene execution). These metrics can be collected by logging timestamps in Python and confirming Arduino reception via debug serial logs.
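As an illustration, the sketch below shows one way such per-trial timestamps could be logged. It reuses the illustrative helpers from the earlier sketches (listen_once, parse_command, send_command) and is not part of the delivered code.

import csv
import time

def run_trial(recognizer, microphone, port, log_path="trials.csv"):
    # Run one recognition + send cycle and append timing data to a CSV log.
    t_start = time.monotonic()
    transcript = listen_once(recognizer, microphone)         # from the earlier sketch
    t_transcribed = time.monotonic()
    tokens = parse_command(transcript or "")
    for token in tokens:
        send_command(port, token)
    t_sent = time.monotonic()
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([
            transcript or "",                                # inspect later for recognition success
            ";".join(tokens),                                # inspect later for intent accuracy
            round(t_transcribed - t_start, 3),               # speech-to-text time (s)
            round(t_sent - t_transcribed, 3),                # interpretation + transmission time (s)
        ])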
ADVANTAGES & LIMITATIONS
Advantages
The prototype offers several strong advantages for an internship deliverable and educational demonstration:
● Clear end-to-end pipeline: every step (voice, text, logic, hardware) is visible and modifiable.
● Low-cost and accessible parts: Arduino Uno, HC-05, LEDs, basic actuators, and common libraries.
● Hands-free interaction: voice commands mimic real consumer smart home use.
● Scene support: grouped automation (good night, leaving, alert) demonstrates realistic convenience features.
● Local feedback: LCD display confirms system state without needing a phone app.
Limitations
The simplified design introduces limitations that would need to be addressed for a real deployment:
● Cloud dependency: recognize_google() requires internet connectivity; offline operation is not supported in the current implementation.
● Security: Bluetooth serial communication is not authenticated or encrypted at the application level in this prototype.
● Scalability: the system controls a single Arduino node; larger homes typically require multiple nodes and a network protocol (MQTT/Zigbee).
● No device discovery: devices are hard-coded; commercial systems usually support dynamic discovery and configuration.
● Low-voltage model only: the prototype does not switch mains loads; safe relays, certified smart switches, and electrical standards would be required for real appliances.
● Limited language understanding: rule-based parsing works for predefined phrases but does not generalize as well as modern NLU models.
POTENTIAL IMPROVEMENTS
The following improvements can upgrade the prototype while keeping the architecture similar:
● Offline speech recognition: replace cloud recognition with an offline engine (for example, Vosk or Whisper running locally) to reduce latency and remove internet dependency.
● More flexible NLP: expand phrase matching, add synonym dictionaries, or integrate a lightweight intent classifier for better natural language coverage.
● User feedback loop: add voice responses (text-to-speech) so the assistant can confirm actions and request clarification when commands are ambiguous.
● Device configuration file: move device definitions, aliases, and pin mappings to a JSON/YAML configuration to allow quick customization without code changes.
● Reliability features: implement acknowledgements (ACK) from Arduino to Python so the PC can confirm successful command execution.
● Security: add pairing restrictions, session keys, or migrate to secure Wi-Fi communication for stronger protection.
FUTURE SCOPE
Beyond incremental improvements, the project has strong potential for expansion into a more realistic smart home platform. Possible future scope directions include:
● Move the host software to a Raspberry Pi to create a dedicated always-on home hub.
● Replace Bluetooth with Wi-Fi and MQTT to support multiple nodes and scalable device networks.
● Add sensors (motion, temperature, light level, door contact) to enable automatic triggers and smart rules (for example, lights on when motion detected at night).
● Implement schedules and routines (for example, fan on at specific times, lights off after midnight).
● Build a web dashboard to visualize device states, view logs, and trigger scenes from a phone or laptop.
● Integrate real smart relays/switches using safe electrical isolation and compliance with local standards.
CONCLUSION
This internship project successfully delivered a working voice-controlled smart home prototype that demonstrates how an AI-style voice interface can be integrated with embedded hardware to automate devices. The system uses Python for speech recognition and command interpretation, Bluetooth for wireless communication, and an Arduino Uno to control lights, a fan motor, a servo door, and an alarm, with state feedback shown on an I2C LCD.
The most valuable outcome of the project is the clear, educational structure: each layer is understandable and can be improved independently. While the prototype is intentionally simplified (rule-based NLP, cloud speech backend, minimal security), it provides a strong foundation for further development into a more scalable and secure system.
FINAL OUTPUT
At the end of the internship phase documented in this report, the following deliverables were completed:
● A functional Python voice assistant script (voice_smart_home.py) with wake word detection, device mapping, and scene automation.
● A working Bluetooth serial link to Arduino via an HC-05 module, using newline-terminated ASCII command messages.
● An Arduino firmware sketch that controls two lights, a DC motor fan via L298N, a servo door, and an alarm buzzer + LED pattern.
● A 16×2 I2C LCD interface that displays the current status of lights, fan, door, and alarm.
● Demonstrated end-to-end operation through spoken commands: turning devices on/off, opening/closing the door, and executing scenes (good night, leaving, alert).
The included source code in the appendices represents the complete implementation as of the current internship progress.
KHDA CERTIFICATE
