Senior Site Reliability Engineer

Tokyo
Partial Remote
Full-time
October 4, 2024

【Job Description】

As a Senior Site Reliability Engineer, you will be responsible for developing solutions, implementing requirements, and helping to create key processes and procedures that facilitate product planning, execution, and delivery. We aim to solve society's issues with AI, so our mission is to solve the Engineering Department's issues!

Lead the design, implementation, and management of scalable and reliable infrastructure solutions in public cloud environments (e.g., AWS).
Lead the development and maintenance of Kubernetes clusters, ensuring optimal performance, availability, and security.
Collaborate with development teams as a trusted advisor: provide expertise in architecture design, consult on infrastructure-related matters, and guide teams toward effective and scalable solutions.
Monitor system performance, troubleshoot complex issues, and implement proactive measures to ensure high availability and reliability.
Lead incident response and resolution, conducting post-mortem analyses to identify areas for improvement.
Lead the professional development initiatives within the team by mentoring junior members, conducting comprehensive code reviews to uphold quality and best practices, and orchestrating training and workshops to enhance overall skill sets.

シニアSREとして、製品の企画、実行、およびデリバリーを円滑にするための主要なプロセスと手順の開発、要件の実装に責任を持ちます。AIを使用して社会の課題を解決することを目指しているため、エンジニアリング部門の問題を解決する使命を担います！

AWSなどのパブリッククラウド環境でスケーラブルで信頼性のあるインフラソリューションの設計、実装、および管理をリードします。
Kubernetesクラスターの開発とメンテナンスをリードし、最適なパフォーマンス、可用性、およびセキュリティを確保します。
開発チームと協力してアーキテクチャの設計に専門知識を提供し、開発チームに対して信頼できるアドバイザーとして機能し、インフラに関連する問題に対するコンサルテーションを行い、効果的かつスケーラブルなソリューションに導きます。
システムのパフォーマンスを監視し、複雑な問題のトラブルシューティングを行い、高い可用性と信頼性を確保するための積極的な対策を実施します。
インシデントの対応と解決をリードし、事後分析を実施して改善の余地を特定します。
チーム内のプロフェッショナルな成長イニシアチブをリードし、ジュニアメンバーへのメンタリング、コードの総合的なレビューによる品質とベストプラクティスの維持、全体的なスキルセットの向上を図るためのトレーニングとワークショップを主催します。

【Required Qualifications】

Extensive expertise in at least one cloud platform (e.g., AWS, Azure, GCP) and experience in designing and leading the management of scalable cloud-based infrastructure
Strong expertise in infrastructure-as-code solutions such as Terraform
Strong operational expertise in containerization technologies, especially Kubernetes
In-depth knowledge of source control, CI/CD, infrastructure automation, orchestration, deployment automation and configuration management
Solid understanding of networking and security best practices
Excellent problem-solving skills and the ability to lead collaboratively in a team-oriented environment
While our team is mostly English-speaking, you should be comfortable speaking Japanese with other internal stakeholders

少なくとも1つのクラウドプラットフォーム（例：AWS、Azure、GCPなど）における幅広い専門知識と、スケーラブルなクラウドベースのインフラストラクチャの設計および管理のリーダーシップ経験
Terraformなどのインフラストラクチャのコード化ソリューションにおける強力な専門知識
特にKubernetesにおける強力な運用の専門知識
ソースコントロール、CI/CD、インフラストラクチャの自動化、オーケストレーション、デプロイメントの自動化、および構成管理に関する深い知識
ネットワーキングおよびセキュリティのベストプラクティスに対する確かな理解
優れた問題解決能力およびチーム指向の環境で協力的にリーダーシップを発揮できる能力
チーム内言語は主に英語となりますが、社内関係部門と日本語でコミュニケーションをとることが求められます。

【Preferred Qualifications】

AWS Solutions Architect certification or equivalent knowledge
Certified Kubernetes Administrator certification or equivalent knowledge
Familiarity with scripting and programming languages (Shell, Python, Golang)
Familiarity with extended infrastructure-related tooling such as Ansible or Chef
Experience working with large software systems developed on Unix/Linux
Experience working with monitoring and metrics systems (e.g., Grafana, Datadog)
Experience leading teams through incident response and post-mortem analysis
Experience working closely with development, product, and business teams
Bilingual (business-level English and daily-conversation-level Japanese, or daily-conversation-level English and native-level Japanese)

AWS Solutions Architectの認定資格または同等の知識
Certified Kubernetes Administratorの認定資格または同等の知識
Shell, Python, Golang などのプログラミング言語の経験
Ansible, Chefなどの拡張インフラ関連ツールの経験
Unix/Linux上で開発された大規模なソフトウェアシステムの運用経験
モニタリングおよびメトリクスシステム（例：Grafana、Datadogなど）の管理運用経験
インシデント対応と事後分析を通じてチームをリードした経験
Engineering, Product, Businessチームなどと連携しながら業務を進めた経験
バイリンガル（ビジネス英語レベル＆日本語の日常会話レベルまたは英語の日常会話レベル＆日本語のネイティブレベル）

【Ideal Candidate】

You are comfortable explaining complex recommendations to engineering and infrastructure teams while discussing technical trade-offs in product development with colleagues.
You are highly resourceful and analytical, and you combine focus, flexibility, self-motivation, and integrity.
Our team values communication with candor (openness, frankness, honesty) and the 4 Agile Values, so that everybody can grow and progress together while supporting the company's CREDO and values. You are comfortable working in such an environment.

Engineering, Infrastructureチームなどとプロダクト開発における技術的課題を議論しながら、複雑な提案を分かりやすく説明できる方
柔軟性, 論理的思考, 自発性, 誠実さを持っている方
私たちのチームは、会社の理念やバリューに貢献しながら、チーム全員が切磋琢磨し成長していくため、オープンで率直なコミュニケーションとアジャイルの価値観を重視しています。とても働きやすい環境です。


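About ExaWizards

With the mission of "realizing a happy society by solving social issues with AI," ExaWizards is an AI startup that works across multiple domains, including nursing care, human resources, finance, healthcare, manufacturing, and distribution, and aims to address the root causes of social issues in a combined way. Its business model is divided mainly into a project type and a product type. In the project type, machine learning engineers and consultants handle everything from discovering business problems to solving them with trained models. In the product type, the company develops SaaS products that deliver more universal, wide-reaching solutions built on the problems it has discovered and the trained models that solve them.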

Japanese Language Policy
Japanese skills are not strictly required. They also offer some support for Japanese language study, but those who don’t speak Japanese may have a more limited set of teams where they can work. They recommend that employees be able to hold daily conversations in Japanese.
Paid Leave
At the start of the fiscal year (April 1st), employees receive their paid time off days for the year.

In the first year of employment, the number of days received will be relative to the number of months worked.

In the year after joining, employees receive 15 days of paid time off, followed by 16 days the next year, and an additional two days beyond that for each year that passes.

Also, members from outside of Japan get two extra “return vacation” days in addition to this.
Benefits
💹 Stock options

👩‍🎓 Support for PhD studies

💜 Support for attending conferences, purchasing books etc.

👨‍🏫 Japanese lessons

🌎 Translation/interpretation support (EN<>JP)
(Simultaneous interpretation during all-hands meetings, announcements in English)
Engineer Interview Process
Software Engineers & Mobile Engineers

1. Online Coding Test (30 min)
2. 1st Technical Interview (Whiteboard Coding)
3. 2nd Technical Interview (Overall Experience, etc.)
4. Final Interview
+ Depending on the individual case, we may ask the candidate to attend a team matching interview.

Platform Engineers

1. Online Coding Test (30 min)
2. 1st Technical Interview (Team Interview)
3. 2nd Technical Interview (Leader Interview)
4. Final Interview
+ For some Platform Engineer positions (ML Infrastructure, etc.), the coding test step may be skipped.
Remote Work
Partial Remote Work

ExaWizards offers partial remote work and does not currently offer full remote work.

All employees are required to come to the Tokyo head office once a week. There is a possibility that the frequency of office attendance may increase in the future.
Work style
For engineers and designers, ExaWizards uses a fully discretionary labor system (裁量労働制).
They don’t have a specific start time, nor do they have core hours.

Members with children are able to use this system to make time to drop off children at school or fulfill any other obligations.
Doctorate Support Program

ExaWizards has a special support program for people who want to continue their doctoral studies while working. They even subsidize the costs (tuition, entrance fees, etc.) up to 2 million yen until graduation.

Care Tech

We develop evidence-based care to support the independence and coexistence of care recipients, and we use technology to make care more accessible.

Med Tech

We aim to develop value-based healthcare for every individual and create a world in which everyone can live the best possible life.

We work on the early detection of individual disease risks and prevention of aggravation using ICT/AI.

Fin Tech

We make financial services for people who have been left behind because of the digital divide.
1. Support for consumption activities of the elderly
2. Support for continuation and growth of SMEs
3. Development of more convenient financial services
HR Tech

Through Jinjirality, a technological singularity in the HR industry (“Jinji” in Japanese), we will pass on a better society to future generations despite the working-age population decreasing by half.

※ Business units are generally determined by individual choice, and you can move between different domains.