FAForever Forums

    Replay vault download

    20 Posts 6 Posters 661 Views
    • Nooby

      How would one go about downloading the entire replay vault and replay vault database / search tools? What sort of size is it?

      It is very slow for me to navigate, and a local copy may be much faster

      • Brutus5000 FAF Server Admin

        The whole set of all replays takes around 500 GB (downloading all of it doesn't really make sense) and is not available as a single download.
        The search tools are not available for offline use either.

        "Nerds have a really complicated relationship with change: Change is awesome when WE'RE the ones doing it. As soon as change is coming from outside of us it becomes untrustworthy and it threatens what we think of is the familiar."
        – Benno Rice

        • Nooby

          How are replays searched? Can metadata about the game be easily extracted? Is that metadata currently extracted and put into a database? I am interested in improving this.

          • BlackYps @Nooby

            @nooby said in Replay vault download:

            It is very slow for me to navigate

            What does very slow mean and what are you trying to do? For the normal replay searches (searching for a player, searching for a map, etc.) it always felt fast enough for me.

            • Nooby @BlackYps

              @blackyps What client are you using? That has not been my experience: the search hangs and needs a reFAF to fix. I was searching by player, rating, map, and number of players with the advanced search options.

              • BlackYps

                Typically the latest client, but it shouldn't matter too much, because I don't think there were any client changes to the searches recently. Are you using Player Name "contains" instead of "is"? I just tested different stuff and it was all reasonably fast except that one. The database of all replays is really huge and checking for substrings in the players is really complicated. So if you know the player name you can use "is" and massively speed up the search. If you use the filter search, it already does this for you.
                If you absolutely have to run a complicated query you can limit the time range of the replays to speed it up. You are probably not that interested in games a long time ago anyway.
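
To illustrate why "is" tends to be so much faster than "contains", here is a minimal sketch using an in-memory SQLite database. The table, column, and index names are hypothetical, and the real FAF database is a different system with a different schema, so this only shows the general principle: an equality test can seek into an index, while a substring match has to examine every row.

```python
import sqlite3

# Hypothetical single-table schema; the real FAF database layout is not shown in this thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_player_name ON player (name)")
conn.executemany(
    "INSERT INTO player (name) VALUES (?)",
    [(f"player{i}",) for i in range(1000)],
)


def query_plan(sql, params):
    """Return SQLite's plan description (last column of EXPLAIN QUERY PLAN)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# "is": an equality test can seek directly into the index.
plan_is = query_plan("SELECT id FROM player WHERE name = ?", ("player42",))

# "contains": a leading wildcard means every row has to be examined.
plan_contains = query_plan("SELECT id FROM player WHERE name LIKE ?", ("%layer4%",))

print("is:      ", plan_is)
print("contains:", plan_contains)
```

The first plan is reported as a SEARCH using the index and the second as a SCAN. That matches the advice above: use "is" when the exact name is known, and narrow the time range when a substring search is unavoidable.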

                • Nooby @BlackYps

                  @blackyps Ah, I was using "map contains" and "player contains", so that must be why. I need to optimise the query.

                  I would still like a way to download, for archival purposes, every replay with, say:
                  • at least two players
                  • at least one player over 1000 rating
                  • no astro craters

                  • KaletheQuick

                    Would it be possible to "cull" the replay vault? Replays are small, but I would wager games with one player in sandbox mode could stand to be removed. Perhaps a separate long term archive with things over a few years old would be helpful.

                    You must deceive the enemy, sometimes your allies, but you must always deceive yourself!

                    • Brutus5000 FAF Server Admin

                      We discussed that a few times. It's technically very challenging for no real benefit. So while it is technically possible, it won't happen.

                      • MazorNoob

                        I wouldn't call it very challenging, it's just some work. We have enough space on the disk to last us a few more years, so it's not an urgent issue.

                        • Nooby

                          https://replay.faforever.com/15487505

                          So one could run an incremental wget script against https://replay.faforever.com over a month or so to download them all, rate-limited to avoid DDoSing the server.
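
A minimal sketch of such an incremental, rate-limited downloader, written in Python rather than as a wget script. The only thing taken from this thread is the URL pattern `https://replay.faforever.com/<id>`. The function names, the `.fafreplay` file naming, and the two-second delay are assumptions for illustration, not a recommended or tested configuration.

```python
import time
import urllib.error
import urllib.request
from pathlib import Path

BASE_URL = "https://replay.faforever.com"  # URL pattern taken from the post above


def fetch_replay(replay_id):
    """Download one replay by ID; return its bytes, or None on an HTTP error."""
    try:
        with urllib.request.urlopen(f"{BASE_URL}/{replay_id}", timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError:
        return None


def grab_range(first_id, last_id, out_dir, delay_s=2.0, fetch=fetch_replay, sleep=time.sleep):
    """Download replays first_id..last_id, skipping files that already exist."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for replay_id in range(first_id, last_id + 1):
        target = out_dir / f"{replay_id}.fafreplay"  # assumed file naming
        if target.exists():  # incremental: a rerun resumes where it stopped
            continue
        data = fetch(replay_id)
        if data is not None:
            target.write_bytes(data)
            saved.append(replay_id)
        sleep(delay_s)  # fixed pause between requests to keep the load low
    return saved
```

A call such as `grab_range(15487505, 15487600, "replays")` would fetch that ID range into a local folder. The `fetch` and `sleep` parameters exist only so the loop can be tried out without touching the network.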

                          Header of that replay:
                          {"uid": 15487505, "complete": true, "state": "PLAYING", "featured_mod": "faf", "game_type": "0", "recorder": "Kekomander", "host": "Kekomander", "launched_at": 1633885469.0, "game_end": 1633887433.0, "title": "1.3k pain", "mapname": "scmp_009", "num_players": 8, "teams": {"3": ["ZmeiGorinich", "Kekomander", "PhantomSamurai", "Greedyscoobs"], "2": ["JT_", "AlphaNoob", "DEVOTION", "Nooby"]}, "featured_mod_versions": {"1": 3724, "2": 3724, "3": 3634, "4": 3709, "5": 1, "6": 3724, "8": 1, "9": 1, "11": 3724, "12": 3724, "13": 3724, "14": 3724, "15": 3724, "17": 3677, "18": 3724, "19": 3724, "20": 3724, "21": 3724, "22": 3724}, "version": 2, "compression": "zstd"}

                          So just parse the header line into your PostgreSQL database and you're good: you've got your own offline searchable vault.
                          How does it figure out the victory condition?

                          Suggestions for improving the search:

                          • Search by elapsed game time
                          • Search by map size
                          • Saveable search profiles for the advanced search
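
As a rough sketch of the "parse the header line into a database" idea, the following loads a shortened copy of the header quoted above into a local index and derives the elapsed game time from `game_end` and `launched_at`. It uses SQLite instead of PostgreSQL purely so the example is self-contained. The table layout and the `index_header` function are hypothetical.

```python
import json
import sqlite3

# Shortened copy of the header line quoted above (some fields omitted for brevity).
HEADER_LINE = (
    '{"uid": 15487505, "featured_mod": "faf", "host": "Kekomander", '
    '"launched_at": 1633885469.0, "game_end": 1633887433.0, "title": "1.3k pain", '
    '"mapname": "scmp_009", "num_players": 8, '
    '"teams": {"3": ["ZmeiGorinich", "Kekomander", "PhantomSamurai", "Greedyscoobs"], '
    '"2": ["JT_", "AlphaNoob", "DEVOTION", "Nooby"]}}'
)


def index_header(conn, header_line):
    """Insert one replay header and its players into the local index."""
    h = json.loads(header_line)
    duration_s = int(h["game_end"] - h["launched_at"])  # elapsed game time in seconds
    conn.execute(
        "INSERT OR REPLACE INTO replay VALUES (?, ?, ?, ?, ?, ?)",
        (h["uid"], h["title"], h["mapname"], h["num_players"], h["launched_at"], duration_s),
    )
    conn.execute("DELETE FROM replay_player WHERE uid = ?", (h["uid"],))
    for team, players in h["teams"].items():
        conn.executemany(
            "INSERT INTO replay_player VALUES (?, ?, ?)",
            [(h["uid"], team, name) for name in players],
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE replay (uid INTEGER PRIMARY KEY, title TEXT, mapname TEXT,
                         num_players INTEGER, launched_at REAL, duration_s INTEGER);
    CREATE TABLE replay_player (uid INTEGER, team TEXT, name TEXT);
    """
)
index_header(conn, HEADER_LINE)

# Example search: games of at least 30 minutes that Nooby played in.
rows = conn.execute(
    "SELECT r.uid, r.mapname, r.duration_s FROM replay r "
    "JOIN replay_player p ON p.uid = r.uid "
    "WHERE p.name = ? AND r.duration_s >= ?",
    ("Nooby", 30 * 60),
).fetchall()
print(rows)
```

For this header the elapsed time comes out at 1964 seconds, a little under 33 minutes, so the replay matches the query. Ratings and map size do not appear in this header, so filters on those would need data from somewhere else.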

                          • Askaholic

                            There are actually 2 different headers. The one added by FAF that you see there, and the original one included in the replay data. The original gpg header will have pretty much all metadata that you could want about the replay including game options, players, mods, etc. To get it though you’ll have to use a parser that knows the binary format of the header. I’ve written one and made a post about it here: https://forum.faforever.com/topic/1551/faf-scfa-replay-parser-library but there are a number of different implementations out there in a variety of languages.
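
A small sketch of separating the two parts before handing the original replay data to a parser. It assumes the FAF header is a single JSON line at the start of the file and that everything after the first newline is the (compressed) original replay data. That layout is an assumption based on the "header line" shown earlier in the thread, and `split_faf_header` is a hypothetical helper, not part of any existing library. Decompressing and parsing the binary part is left to a real parser such as the one linked above.

```python
import json


def split_faf_header(raw):
    """Split raw replay-file bytes into (FAF header dict, remaining body bytes)."""
    header_line, _, body = raw.partition(b"\n")
    return json.loads(header_line), body


# Made-up placeholder standing in for a real replay file, for illustration only.
sample = b'{"uid": 1, "version": 2, "compression": "zstd"}\n<compressed bytes>'
header, body = split_faf_header(sample)
print(header["compression"], len(body))
```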

                            • Nooby @Askaholic

                              @askaholic Thank you, this is relevant and cool, and something I am going to have a play with.

                              • Brutus5000 FAF Server Admin @MazorNoob

                                @mazornoob said in Replay vault download:

                                I wouldn't call it very challenging, it's just some work. We have enough space on the disk to last us a few more years, so it's not an urgent issue.

                                I do - in terms of FAF complexity that is what I call very challenging. It's not just moving around files. Just a few things that come up:

                                • Identifying proper criteria (it must catch enough replays to make a difference, but not remove any "important" cases). There are a few dozen different opinions and once you settle with a common understanding you've got to check that your database is holding that data consistently (which it usually doesn't) + filtering them in a way that it doesn't overload the server
                                • What do you do with table entries. You can't delete them, foreign key constraints don't allow that (review, moderation reports, ...). Maybe you can partition them, but again you're playing with fire on a live system in the 2 biggest tables you have. Then you need to make use of indices in the client so that the partitioning actually has an effect. Or you add some more flags or whatever
                                • In case you move them elsewhere, make sure they are available for download there. Suddenly you need to resolve URLs by business logic (right now it's just a redirect into the weird folder structure)
                                • How do you mount additional/external storage on the server

                                It's a problem that covers almost all areas of FAF: Database, API, Client, Server structure. These are the worst.

                                • MazorNoob

                                  Picking good criteria is hard, sure, but for the database we have an 'is replay available' flag, don't we? We can just flip it and that's it, database entries can stay as they are. There's also no "move elsewhere" question if we choose to delete them 🙂
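
A minimal sketch of the difference between deleting a row and flipping an availability flag, using an in-memory SQLite database. The tables and the `replay_available` column are hypothetical stand-ins and not the real FAF schema. The point is only that a delete is rejected while other rows still reference the game, whereas a flag update leaves those rows untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked to
conn.executescript(
    """
    -- Hypothetical tables; not the real FAF schema.
    CREATE TABLE game (id INTEGER PRIMARY KEY, replay_available INTEGER NOT NULL DEFAULT 1);
    CREATE TABLE review (id INTEGER PRIMARY KEY,
                         game_id INTEGER NOT NULL REFERENCES game (id),
                         text TEXT);
    INSERT INTO game (id) VALUES (1);
    INSERT INTO review (game_id, text) VALUES (1, 'good game');
    """
)

# Deleting the game row is rejected because a review still points at it.
try:
    conn.execute("DELETE FROM game WHERE id = 1")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

# Flipping the flag leaves the row, and everything referencing it, in place.
conn.execute("UPDATE game SET replay_available = 0 WHERE id = 1")
available = conn.execute("SELECT replay_available FROM game WHERE id = 1").fetchone()[0]
reviews = conn.execute("SELECT COUNT(*) FROM review WHERE game_id = 1").fetchone()[0]
print(delete_blocked, available, reviews)
```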

                                  • Brutus5000 FAF Server Admin

                                    The point in discussion was "Perhaps a separate long term archive with things over a few years old would be helpful."
                                    Just dropping them was heavily opposed when discussed a few months ago.

                                    • MazorNoob

                                      Alright, I misunderstood what the "that" was, sorry.

                                      • Nooby

                                        The replay parser that @askaholic linked can extract a bunch of useful information that could be added to the database and used for filter queries, for example whether a replay desynced and, if so, at what exact time.

                                        Also, for moderation, the chat of every replay could be automatically extracted into another database.

                                        • Askaholic

                                          It can also be done on demand with the OG web based parser https://fafafaf.github.io/. So for moderation purposes there isn’t a need to extract everything ahead of time. That would mostly be a ton of data that nobody looks at.

                                          • Nooby

                                            I have a python3 wget script that grabs the direct link and saves the replays incrementally. It is rate-limited to avoid DDoSing the server. If anyone would like it, here it is:

                                            replaygrabber.zip
