thecrowler

Projects that follow the best practices below can voluntarily self-certify and show that they've achieved an Open Source Security Foundation (OpenSSF) best practices badge.

Badge level for project 8344 is passing.

These are the Passing level criteria.

        

 Basics 13/13

  • Identification

    The CROWler is a specialized web crawler developed to efficiently navigate and index web pages. This tool leverages the robust capabilities of Selenium and Google Chrome (to covertly crawl a site), offering a reliable and precise crawling experience. It is designed with user customization in mind, allowing users to specify the scope and targets of their crawling tasks.

    What programming language(s) are used to implement the project?
  • Basic project website content


    The project website MUST succinctly describe what the software does (what problem does it solve?). [description_good]

    On the repository README.md we have:

    What is it? The CROWler is a specialized web crawler developed to efficiently navigate and index web pages. This tool leverages the robust capabilities of Selenium and Google Chrome (to covertly crawl a site), offering a reliable and precise crawling experience. It is designed with user customization in mind, allowing users to specify the scope and targets of their crawling tasks.

    To enhance its functionality, CROWler includes a suite of command-line utilities. These utilities facilitate seamless management of the crawler's database, enabling users to effortlessly add or remove websites from the Sources list. Additionally, the system is equipped with an API, providing a streamlined interface for database queries. This feature ensures easy integration and access to indexed data for various applications.

    Please check: https://github.com/pzaino/thecrowler#the-crowler



    The project website MUST provide information on how to: obtain, provide feedback (as bug reports or enhancements), and contribute to the software. [interact]

    The CONTRIBUTING.md file has this section:

    "Report bugs using GitHub's issues: We use GitHub issues to track public bugs. Report a bug by opening a new issue; it's that easy!"

    Please check: https://github.com/pzaino/thecrowler/blob/main/CONTRIBUTING.md



    The information on how to contribute MUST explain the contribution process (e.g., are pull requests used?) (URL required) [contribution]

    Non-trivial contribution file in repository: https://github.com/pzaino/thecrowler/blob/main/CONTRIBUTING.md.



    The information on how to contribute SHOULD include the requirements for acceptable contributions (e.g., a reference to any required coding standard). (URL required) [contribution_requirements]

    The CONTRIBUTING.md file explains the coding style we require:

    Use a Consistent Coding Style
    - 4 spaces for indentation rather than tabs
    - You can try running gofmt for style unification

    Please check: https://github.com/pzaino/thecrowler/blob/main/CONTRIBUTING.md


  • FLOSS license

    What license(s) is the project released under?



    The software produced by the project MUST be released as FLOSS. [floss_license]

    The Apache-2.0 license is approved by the Open Source Initiative (OSI).



    It is SUGGESTED that any required license(s) for the software produced by the project be approved by the Open Source Initiative (OSI). [floss_license_osi]

    The Apache-2.0 license is approved by the Open Source Initiative (OSI).



    The project MUST post the license(s) of its results in a standard location in their source repository. (URL required) [license_location]

    Non-trivial license location file in repository: https://github.com/pzaino/thecrowler/blob/main/LICENSE.


  • Documentation


    The project MUST provide basic documentation for the software produced by the project. [documentation_basics]

    Some documentation basics file contents found.



    The project MUST provide reference documentation that describes the external interface (both input and output) of the software produced by the project. [documentation_interface]

    The project provides comprehensive documentation on how to install, build from source and use.

    The README.md explains how to build, install, and configure the software, while the doc section shows how to use it.

    Link to README.md: https://github.com/pzaino/thecrowler#the-crowler
    Link to full documentation: https://github.com/pzaino/thecrowler/tree/main/doc
    Specifically on how to use: https://github.com/pzaino/thecrowler/blob/main/doc/usage.md


  • Other


    The project sites (website, repository, and download URLs) MUST support HTTPS using TLS. [sites_https]

    Given only https: URLs.



    The project MUST have one or more mechanisms for discussion (including proposed changes and issues) that are searchable, allow messages and topics to be addressed by URL, enable new people to participate in some of the discussions, and do not require client-side installation of proprietary software. [discussion]

    GitHub supports discussions on issues and pull requests.



    The project SHOULD provide documentation in English and be able to accept bug reports and comments about code in English. [english]

    All the documentation linked above is in English and has been reviewed for clarity to the best of our abilities. Bug reports are accepted through the usual GitHub Issues.



    The project MUST be maintained. [maintained]





  • Public version-controlled source repository


    The project MUST have a version-controlled source repository that is publicly readable and has a URL. [repo_public]

    Repository on GitHub, which provides public git repositories with URLs.



    The project's source repository MUST track what changes were made, who made the changes, and when the changes were made. [repo_track]

    Repository on GitHub, which uses git. git can track the changes, who made them, and when they were made.



    To enable collaborative review, the project's source repository MUST include interim versions for review between releases; it MUST NOT include only final releases. [repo_interim]

    The project publishes interim releases (and pre-releases for testing) and maintains an official Develop branch for code that needs human testing in addition to automated testing. The repository also runs many automated quality tests, the code comes with unit tests (which are mandatory for new features), and coding style is checked automatically.



    It is SUGGESTED that common distributed version control software be used (e.g., git) for the project's source repository. [repo_distributed]

    Repository on GitHub, which uses git. git is distributed.


  • Unique version numbering


    The project results MUST have a unique version identifier for each release intended to be used by users. [version_unique]

    The project uses unique version numbers, and release candidates (RCs) are versioned as well when available.



    It is SUGGESTED that the Semantic Versioning (SemVer) or Calendar Versioning (CalVer) version numbering format be used for releases. It is SUGGESTED that those who use CalVer include a micro level value. [version_semver]


    It is SUGGESTED that projects identify each release within their version control system. For example, it is SUGGESTED that those using git identify each release using git tags. [version_tags]

    Developers and contributors open their PRs against the Develop branch, not Main (Main is protected and requires an approved PR from Develop). PRs against Develop are tested automatically and reviewed by humans before the code is merged into Develop; the same applies to code merged from Develop into Main. Developers and contributors are also required to install pre-commit, which is configured to run many checks (including all unit tests) at every git commit. Once code has passed ALL automated tests and human reviews and has been merged into Main, an RC tag is issued for testing on third-party systems. An RC lasts one month, to give users time to test. If no issues are reported, Main is released again with the final release number.

    Checks in pre-commit: https://github.com/pzaino/thecrowler/blob/main/.pre-commit-config.yaml
    Tags: https://github.com/pzaino/thecrowler/tags

    The project is new, so no official releases have been issued yet.


  • Release notes


    The project MUST provide, in each release, release notes that are a human-readable summary of major changes in that release to help users determine if they should upgrade and what the upgrade impact will be. The release notes MUST NOT be the raw output of a version control log (e.g., the "git log" command results are not release notes). Projects whose results are not intended for reuse in multiple locations (such as the software for a single website or service) AND employ continuous delivery MAY select "N/A". (URL required) [release_notes]

    Each release (and pre-release) has fully detailed release notes: https://github.com/pzaino/thecrowler/releases/tag/v0.9.3

    Full changelog example: https://github.com/pzaino/thecrowler/compare/v0.9.2...v0.9.3



    The release notes MUST identify every publicly known run-time vulnerability fixed in this release that already had a CVE assignment or similar when the release was created. This criterion may be marked as not applicable (N/A) if users typically cannot practically update the software themselves (e.g., as is often true for kernel updates). This criterion applies only to the project results, not to its dependencies. If there are no release notes or there have been no publicly known vulnerabilities, choose N/A. [release_notes_vulns]

    The release notes will include all publicly known run-time vulnerabilities once we start releasing production versions.


  • Bug-reporting process


    The project MUST provide a process for users to submit bug reports (e.g., using an issue tracker or a mailing list). (URL required) [report_process]

    The project SHOULD use an issue tracker for tracking individual issues. [report_tracker]

    We use GitHub issue numbers to track each individual issue.



    The project MUST acknowledge a majority of bug reports submitted in the last 2-12 months (inclusive); the response need not include a fix. [report_responses]

    We aim to keep ALL reported issues public (users report them through GitHub), and we acknowledge each one as soon as we are able to reproduce it, which can be within minutes.



    The project SHOULD respond to a majority (>50%) of enhancement requests in the last 2-12 months (inclusive). [enhancement_responses]

    We do our best to respond to enhancement requests as fast as possible, but the project is fully free, so the people involved must attend to their jobs first before they have time for the project.



    The project MUST have a publicly available archive for reports and responses for later searching. (URL required) [report_archive]

    We use GitHub issues and Discussions, so everything is public. https://github.com/pzaino/thecrowler/issues?q=is%3Aissue+is%3Aclosed


  • Vulnerability report process


    The project MUST publish the process for reporting vulnerabilities on the project site. (URL required) [vulnerability_report_process]

    If private vulnerability reports are supported, the project MUST include how to send the information in a way that is kept private. (URL required) [vulnerability_report_private]

    Private vulnerability reports are supported and they can be submitted via email to the project author.



    The project's initial response time for any vulnerability report received in the last 6 months MUST be less than or equal to 14 days. [vulnerability_report_response]

    We check the project's GitHub page and the emails every day, so every reported vulnerability will certainly be checked within 14 working days.


  • Working build system


    If the software produced by the project requires building for use, the project MUST provide a working build system that can automatically rebuild the software from source code. [build]

    The project provides a working build system to rebuild the entire set of micro-services at once:

    https://github.com/pzaino/thecrowler/blob/main/docker-rebuild.sh



    It is SUGGESTED that common tools be used for building the software. [build_common_tools]

    The project uses Docker and Docker Compose to create containerized builds of all the required components. It also offers build scripts that add platform detection to help Docker pull or build containers appropriately.

    https://github.com/pzaino/thecrowler/blob/main/docker-compose.yml

    https://github.com/pzaino/thecrowler/blob/main/docker-build.sh



    The project SHOULD be buildable using only FLOSS tools. [build_floss_tools]

    The project IS buildable using ONLY FLOSS tools; building it requires only Go. The database is based on PostgreSQL and SQLite, and components are packaged into Docker images at build time using freely available tools. It works fine with the Docker packages provided by Linux distributions (that is, it does not require installing Docker from docker.io).


  • Automated test suite


    The project MUST use at least one automated test suite that is publicly released as FLOSS (this test suite may be maintained as a separate FLOSS project). The project MUST clearly show or document how to run the test suite(s) (e.g., via a continuous integration (CI) script or via documentation in files such as BUILD.md, README.md, or CONTRIBUTING.md). [test]

    A test suite SHOULD be invocable in a standard way for that language. [test_invocation]

    A user can run go test to run unit tests locally on their system and at any time.



    It is SUGGESTED that the test suite cover most (or ideally all) the code branches, input fields, and functionality. [test_most]

    The entire test suite covers everything, and we are also adding system tests to measure performance under stress.



    It is SUGGESTED that the project implement continuous integration (where new or changed code is frequently integrated into a central code repository and automated tests are run on the result). [test_continuous_integration]

    The project supports continuous integration. Containers can be built offline and can replace existing containers too.


  • New functionality testing


    The project MUST have a general policy (formal or not) that as major new functionality is added to the software produced by the project, tests of that functionality should be added to an automated test suite. [test_policy]

    From the CONTRIBUTING.md file:

    "If you've added code that should be tested, add tests."

    link here: https://github.com/pzaino/thecrowler/blob/main/CONTRIBUTING.md



    The project MUST have evidence that the test_policy for adding tests has been adhered to in the most recent major changes to the software produced by the project. [tests_are_added]

    It is SUGGESTED that this policy on adding tests (see test_policy) be documented in the instructions for change proposals. [tests_documented_added]

    It is:

    "If you've added code that should be tested, add tests. For more information on testing, see Test Policy for TheCROWler."

    From the CONTRIBUTING.MD file, link here: https://github.com/pzaino/thecrowler/blob/main/CONTRIBUTING.md


  • Warning flags


    The project MUST enable one or more compiler warning flags, a "safe" language mode, or use a separate "linter" tool to look for code quality errors or common simple mistakes, if there is at least one FLOSS tool that can implement this criterion in the selected language. [warnings]

    The project is written in Go and formatted with gofmt, and pre-commit prevents a developer from pushing code to the Develop branch when its checks fail. Tests are executed even when the code compiles cleanly, and all Go warnings are enabled.



    The project MUST address warnings. [warnings_fixed]

    As mentioned, we do address warnings.



    It is SUGGESTED that projects be maximally strict with warnings in the software produced by the project, where practical. [warnings_strict]

    We use Go and gofmt, as well as these pre-commit hooks:
    - go-fmt
    - no-go-testing
    - golangci-lint
    - go-unit-tests

    https://github.com/pzaino/thecrowler/blob/main/.pre-commit-config.yaml


  • Secure development knowledge


    The project MUST have at least one primary developer who knows how to design secure software. (See ‘details’ for the exact requirements.) [know_secure_design]

    Our primary developer has completed multiple courses and has more than 30 years of experience in coding and secure coding.



    At least one of the project's primary developers MUST know of common kinds of errors that lead to vulnerabilities in this kind of software, as well as at least one method to counter or mitigate each of them. [know_common_errors]

    Our primary developer works in the cybersecurity field, has experience in designing and developing IDS, IPS, firewall, and antivirus software, and deals daily with vulnerabilities, CVEs, CWEs, and CPEs.


  • Use basic good cryptographic practices

    Note that some software does not need to use cryptographic mechanisms. If your project produces software that (1) includes, activates, or enables encryption functionality, and (2) might be released from the United States (US) to outside the US or to a non-US-citizen, you may be legally required to take a few extra steps. Typically this just involves sending an email. For more information, see the encryption section of Understanding Open Source Technology & US Export Controls.

    The software produced by the project MUST use, by default, only cryptographic protocols and algorithms that are publicly published and reviewed by experts (if cryptographic protocols and algorithms are used). [crypto_published]


    If the software produced by the project is an application or library, and its primary purpose is not to implement cryptography, then it SHOULD only call on software specifically designed to implement cryptographic functions; it SHOULD NOT re-implement its own. [crypto_call]

    The software uses standard HTTPS for its few cryptographic elements and relies on the standard Go libraries for that.



    All functionality in the software produced by the project that depends on cryptography MUST be implementable using FLOSS. [crypto_floss]


    The security mechanisms within the software produced by the project MUST use default keylengths that at least meet the NIST minimum requirements through the year 2030 (as stated in 2012). It MUST be possible to configure the software so that smaller keylengths are completely disabled. [crypto_keylength]


    The default security mechanisms within the software produced by the project MUST NOT depend on broken cryptographic algorithms (e.g., MD4, MD5, single DES, RC4, Dual_EC_DRBG), or use cipher modes that are inappropriate to the context, unless they are necessary to implement an interoperable protocol (where the protocol implemented is the most recent version of that standard broadly supported by the network ecosystem, that ecosystem requires the use of such an algorithm or mode, and that ecosystem does not offer any more secure alternative). The documentation MUST describe any relevant security risks and any known mitigations if these broken algorithms or modes are necessary for an interoperable protocol. [crypto_working]


    The default security mechanisms within the software produced by the project SHOULD NOT depend on cryptographic algorithms or modes with known serious weaknesses (e.g., the SHA-1 cryptographic hash algorithm or the CBC mode in SSH). [crypto_weaknesses]


    The security mechanisms within the software produced by the project SHOULD implement perfect forward secrecy for key agreement protocols so a session key derived from a set of long-term keys cannot be compromised if one of the long-term keys is compromised in the future. [crypto_pfs]


    If the software produced by the project causes the storing of passwords for authentication of external users, the passwords MUST be stored as iterated hashes with a per-user salt by using a key stretching (iterated) algorithm (e.g., Argon2id, Bcrypt, Scrypt, or PBKDF2). See also OWASP Password Storage Cheat Sheet. [crypto_password_storage]


    The security mechanisms within the software produced by the project MUST generate all cryptographic keys and nonces using a cryptographically secure random number generator, and MUST NOT do so using generators that are cryptographically insecure. [crypto_random]

  • Secured delivery against man-in-the-middle (MITM) attacks


    The project MUST use a delivery mechanism that counters MITM attacks. Using https or ssh+scp is acceptable. [delivery_mitm]


    A cryptographic hash (e.g., a sha1sum) MUST NOT be retrieved over http and used without checking for a cryptographic signature. [delivery_unsigned]

  • Publicly known vulnerabilities fixed


    There MUST be no unpatched vulnerabilities of medium or higher severity that have been publicly known for more than 60 days. [vulnerabilities_fixed_60_days]

    We also use a vulnerability bot that automatically creates pull requests with updated dependencies when a dependency is found to have a vulnerability.



    Projects SHOULD fix all critical vulnerabilities rapidly after they are reported. [vulnerabilities_critical_fixed]

  • Other security issues


    The public repositories MUST NOT leak a valid private credential (e.g., a working password or private key) that is intended to limit public access. [no_leaked_credentials]

    We check for private credentials on the developer's local system using pre-commit and re-check on GitHub as well.
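
    To illustrate the kind of check a credential scan performs, the Go sketch below flags text that contains a PEM private key header. It is a simplified, hypothetical example: the name containsPrivateKey and the regular expression are assumptions, and the project's actual pre-commit hooks and GitHub checks are not shown here and may work differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// privateKeyHeader matches PEM headers such as
// "-----BEGIN RSA PRIVATE KEY-----" or "-----BEGIN PRIVATE KEY-----".
// The pattern is an assumption for this sketch, not the project's rule set.
var privateKeyHeader = regexp.MustCompile(`-----BEGIN ([A-Z0-9]+ )*PRIVATE KEY-----`)

// containsPrivateKey reports whether text includes a PEM private key header.
func containsPrivateKey(text string) bool {
	return privateKeyHeader.MatchString(text)
}

func main() {
	samples := []string{
		"-----BEGIN OPENSSH PRIVATE KEY-----",
		"-----BEGIN PUBLIC KEY-----",
	}
	for _, s := range samples {
		fmt.Printf("%q flagged: %v\n", s, containsPrivateKey(s))
	}
}
```

    A real scanner would also look for API tokens and passwords; this sketch covers only one easily recognized pattern.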


  • Static code analysis


    At least one static code analysis tool (beyond compiler warnings and "safe" language modes) MUST be applied to any proposed major production release of the software before its release, if there is at least one FLOSS tool that implements this criterion in the selected language. [static_analysis]

    SonarQube (locally), CodeQL, Codacy on GitHub.com



    It is SUGGESTED that at least one of the static analysis tools used for the static_analysis criterion include rules or approaches to look for common vulnerabilities in the analyzed language or environment. [static_analysis_common_vulnerabilities]


    All medium and higher severity exploitable vulnerabilities discovered with static code analysis MUST be fixed in a timely way after they are confirmed. [static_analysis_fixed]


    It is SUGGESTED that static source code analysis occur on every commit or at least daily. [static_analysis_often]

    We also use SonarLint in Visual Studio Code, so security issues are detected in real time as we write code. We check again with sonar-scanner before each commit, and with CodeQL and Codacy at every commit on GitHub.


  • Dynamic code analysis


    It is SUGGESTED that at least one dynamic analysis tool be applied to any proposed major production release of the software before its release. [dynamic_analysis]


    It is SUGGESTED that if the software produced by the project includes software written using a memory-unsafe language (e.g., C or C++), then at least one dynamic tool (e.g., a fuzzer or web application scanner) be routinely used in combination with a mechanism to detect memory safety problems such as buffer overwrites. If the project does not produce software written in a memory-unsafe language, choose "not applicable" (N/A). [dynamic_analysis_unsafe]

    The project is written entirely in Go, so it contains no memory-unsafe code.



    It is SUGGESTED that the project use a configuration for at least some dynamic analysis (such as testing or fuzzing) which enables many assertions. In many cases these assertions should not be enabled in production builds. [dynamic_analysis_enable_assertions]


    All medium and higher severity exploitable vulnerabilities discovered with dynamic code analysis MUST be fixed in a timely way after they are confirmed. [dynamic_analysis_fixed]

    Not only do we fix them, we also build hardened containers, and code runs under low-privilege accounts inside the containers. Where applicable, containers also have a read-only filesystem enabled.



This data is available under the Creative Commons Attribution version 3.0 or later license (CC-BY-3.0+). All are free to share and adapt the data, but must give appropriate credit. Please credit Paolo Fabio Zaino and the OpenSSF Best Practices badge contributors.

Project badge entry owned by: Paolo Fabio Zaino.
Entry created on 2024-01-25 23:46:04 UTC, last updated on 2024-06-04 01:29:01 UTC. Last achieved passing badge on 2024-06-04 01:29:01 UTC.
